Cookpad is looking for engineers to join our Site Reliability Engineering team.
At Cookpad, Site Reliability Engineers (SREs) are a hybrid of system engineers and software engineers who take ownership of reliability, automation, and scalability.
As software engineers, we mainly use Ruby to automate routine work and build applications that improve scalability and availability. As system engineers, we manage our servers on AWS to provide service to 68 countries. Configurations and conditions are managed in Ruby.
We readily adopt new technology stacks to evolve our infrastructure. For example, we currently use AWS ECS as our container orchestrator, but we are also evaluating Kubernetes as its successor and plan to deploy machine learning products on Kubernetes. We are introducing Prometheus in place of Zabbix to achieve more precise metric collection and better scalability.
We also work closely with engineers to advocate sensible, scalable systems design and share responsibility with them in diagnosing, resolving, and preventing production issues. When incidents occur, you will triage, mitigate, and resolve them together with product team engineers.
The challenges and technology stacks for SRE change as needed to fulfill business goals. As engineers, we believe our work not only gives us great technical challenges but also delivers real value to the world.
Responsibilities:
- Build highly available, performant and scalable service infrastructure with AWS and software
- Design, develop and implement software that improves the stability, scalability, availability and latency of Cookpad
- Solve problems occurring with our highly available production systems and build solutions and automation to prevent them from happening again
- Participate in the operations on-call rotation, triaging and addressing production issues as they arise
- Contribute to internal tools that help us improve our operations processes, manage our infrastructure, and scale our systems
- Undertake measured, methodical troubleshooting of complicated systems under pressure
Essential skills & experience
- 3+ years of SRE/DevOps experience in a Linux-based AWS environment
- 2+ years of experience working professionally with Ruby on Rails
- Strong written communication skills in English and the ability to develop working relationships with coworkers in locations around the globe
- Understanding of the fundamentals of the TCP/IP (OSI) model and network architectures
- Strong coding skills in at least one programming language. Cookpad server-side engineers work primarily in Ruby, with smatterings of shell script, Go, and Python
- Familiarity with configuration management software such as Puppet and Chef
- A passion for solving problems using open source software
Preferred skills
- Solid foundation in deploying and managing Linux systems at large scale
- Understanding of large-scale complex systems from a reliability perspective
- Solid competency with SQL (ideally in a federated database environment; MySQL a plus)
- Contributions to open source
- Deep network analysis experience is a plus
- Strong Linux system-level analysis capabilities (Ubuntu a plus)
- Knowledge and experience of highly available, scalable architectures for services deployed across multiple regions is a big plus
via developer jobs - Stack Overflow