Cake (a BAMTECH Media company) are looking for a Lead SRE (Site Reliability Engineer) to join the team at our Reddish Head Office, leading a growing team of 3 SREs to drive reliability and performance across the whole BAMTECH organisation.
We are 9 months into our journey with BAMTECH Media and Disney to build some of the most advanced and high-performing global streaming platforms, and the progress we have made has been significant. BAMTECH are already well-versed in live media streaming (just look at the platforms for MLB, NHL and WWE, to name but a few), and with the recent launch of EuroSport, the imminent launch of ESPN+, and the eventual launch of the Disney platform, optimisation and reliability will be an invaluable function within the business.
We launched our SRE function roughly 6 months ago, with 3 members of the Cake team moving internally from Developer and DevOps roles, and quickly identified that for the team to integrate across the BAMTECH org, we needed to take a consultative approach rather than operate as a support function. When you have 30 different application teams that you could be supporting, there’s no better way to work. Our SREs quickly establish themselves within an application team, educate and nurture that team, improve aspects of reliability within its applications, and set it on a positive path.
We love bleeding-edge technology; that’s why Cake became a Scala shop in 2011 and have been supporting the tech community through conferences, meetups and blogs ever since. Our current stack includes, but is not limited to, Linux, AWS, Scala, Go, Puppet, Docker, and Jenkins.
In this role, you can expect to:
- Continuously refine monitoring processes, thresholds, and configuration
- Work closely with product developers to ensure new features have proper operational support and maintainability, and provide deep technical guidance to development teams
- Help design, build and maintain the cloud-native platform needed to support our growth plans; we do that by managing infrastructure as code and automating as much as we can
- Mentor and support team members on production readiness and best practices
- Develop software for the purposes of automating, monitoring and maintaining deployed infrastructure and services
- Handle high-severity internal or customer incidents, ensuring we meet all SLAs
- Help teams create and maintain documentation and runbooks/playbooks
- Participate in Scrum processes and ceremonies
- Respond to issues and escalations
- Participate in on-call rotation
For this role, we would like you to have:
- Track record of leading a team of Software or Systems Engineers
- Track record of working as a Site Reliability Engineer, DevOps Engineer, or a Software Engineer
- Ability to code and to pick up new programming languages
- Experience in at least one scripting language: Python, Ruby, Bash, Perl
- Experience in working with infrastructure as code tools such as Puppet, Chef, SaltStack, Ansible, CloudFormation, Terraform etc.
- Track record of working with Linux systems in production
- Experience in working with container technologies such as Docker
- Experience in working with cloud platforms such as AWS
It would be nice (but not essential) to have:
- Experience using Agile practices
- Experience with modern open source infrastructure services and concepts such as Redis, ElasticSearch, Kafka, and Docker
- Experience in software development in any language. Our focus languages are Go and Scala.
- Experience in working with any functional programming language such as Clojure, Haskell, or OCaml.
This is an opportunity to take over the newest department within the Cake/BAMTECH org, and it will give you scope to influence some of the largest streaming platforms in the market. On offer is a competitive salary (to be discussed at interview), with a new benefits package created by Disney, which will be rolled out by the end of March 2018.
via developer jobs - Stack Overflow