*This is a contract role open only to candidates located in Brazil.*
Company Overview:
Talent Systems, LLC is the leading technology solution provider for casting and auditioning to the entertainment industry. Casting directors and agents worldwide use Talent Systems’ portfolio of products to source and manage talent across film, television, commercials, theater, and digital projects, powering an unparalleled, global casting software ecosystem.
We are headquartered in Los Angeles and operate in the US, Canada, Mexico, UK, Australia, and India. Our portfolio brands include Casting Networks, Spotlight, Cast It Systems, Staff Me Up, Tagmin, Casting Frontier, and Cast It Reach.
Job Purpose:
We are seeking an experienced Senior Manager, Engineering Operations to lead our engineering operations, including DevOps, Site Reliability Engineering (SRE), CI/CD, and release management, for our cloud-based systems and applications. This role is pivotal in ensuring the reliability, security, scalability, and availability of our systems while driving innovation in automation, CI/CD pipelines, and operational efficiency. You will be responsible for crisis management, improving system performance and cost efficiency, and fostering a culture of operational excellence.
Duties & Responsibilities:
Leadership & Strategy
- Lead and mentor teams in DevOps, SRE, and Engineering Operations, fostering a culture of collaboration, ownership, and innovation.
- Develop and execute the strategic roadmap for engineering operations, aligning with business goals and product requirements.
- Advocate for and implement industry best practices in system reliability, DevOps, and automation.
Reliability & Availability
- Drive initiatives to improve the reliability, availability, and performance of cloud-based applications and infrastructure.
- Establish performance measurements for key system health metrics.
- Ensure robust incident management and crisis response processes to minimize downtime and customer impact.
DevOps & CI/CD
- Oversee the design, implementation, and optimization of CI/CD pipelines to enable seamless and automated deployment processes.
- Leverage automation tools and practices to reduce manual interventions and improve operational efficiency.
- Collaborate with product and engineering teams to enable rapid and reliable feature delivery.
Monitoring & Observability
- Implement and maintain advanced monitoring, logging, and alerting systems to gain deep insights into system health and performance.
- Use observability tools (e.g., Grafana) to proactively identify and resolve issues before they impact customers.
Crisis & Incident Management
- Lead crisis management efforts during high-severity incidents, ensuring quick resolution and effective communication with stakeholders.
- Conduct root cause analyses and drive post-mortem reviews to identify and address operational gaps.
Team Development & Collaboration
- Build, grow, and retain a high-performing engineering operations team with expertise in DevOps and SRE practices across multiple locations.
- Foster close collaboration with development, data, and product teams to align engineering operations with overall business objectives.
- Promote a blameless post-mortem culture to encourage continuous learning and improvement.
Cost Optimization & Security
- Optimize cloud infrastructure costs while maintaining system reliability and scalability.
- Implement robust security practices in operations to ensure compliance with industry standards and regulations.
Qualifications & Attributes:
- 10+ years of experience in software engineering, with 5+ years in leadership roles.
- Proven track record of improving system reliability, availability, and performance for cloud-based applications.
- Extensive experience with CI/CD pipelines and automation tools.
- Demonstrated expertise in crisis management and incident response in high-pressure environments.
- Deep knowledge of cloud platforms (such as AWS) and containerization and orchestration tools (Docker, Kubernetes).
- Strong proficiency in monitoring and observability tools like Grafana.
- Excellent problem-solving and decision-making skills under pressure.
- Exceptional communication and collaboration skills, with the ability to influence stakeholders across engineering and business teams.
- Proven ability to lead and grow high-performing teams in a fast-paced environment.
- A strong focus on fostering a culture of accountability, learning, and operational excellence.
- Ability to influence partner engineering teams, such as platform and product engineering.
*You will be paid monthly in USD.*