Our SRE team is responsible for the overall performance and reliability of Evernote's service and products. That means supporting over 200 million passionate and engaged users around the world, who store billions of notes and files. We are looking for a Site Reliability Engineer to help us in the ongoing mission of delivering an outstanding service to our users.
We participate in all aspects of running our platform at scale, focusing both on the service as it runs today and on ensuring we can deliver new and exciting features rapidly to users. We have a real passion for automation and we continually seek to improve. We work hand-in-hand with product teams to help them ship production-ready services and get new features into our users' hands. We define Service Level Objectives (SLOs) based on Key Performance Indicators (KPIs) for each of our services, and we use them to move quickly while maintaining the quality of service our users expect.
- Work closely with engineering teams to maintain and scale our existing production platform
- Help us evolve what it means to be an SRE at Evernote
- Evolve and implement production readiness standards for new services
- Champion our SLOs and look to continuously improve them
- Develop and maintain automation to reduce operational toil for the team
- Participate in an on-call rotation for our production services
- You possess a contagious sense of ownership and the tenacity to always find a way
- You focus on quality to build manageable, scalable, and maintainable systems
- You know that perfection is the enemy of done and when to make trade-offs
- You emphasize the importance of making decisions based on data
- You enjoy solving tough technical problems
- You exercise judgement in a way that reduces risk
- You share enthusiastically to reduce disconnects and communication breakdowns
- You always want to understand the why in order to better see patterns and improve quality
- You know Linux systems like the back of your hand
- You've managed production environments at scale in a public cloud (AWS or GCP)
- You have a strong familiarity with web application technologies, including MySQL, Java, and Apache
- You've attained a deep understanding of networking protocols (e.g., TCP/IP, HTTP, DNS)
- You've implemented and used third-party metrics and monitoring platforms such as DataDog and PagerDuty
- You possess the ability to wrangle problems quickly using the tools at your disposal
- You've used configuration management and orchestration tools and you understand why they're important
- You've built extensible and maintainable automation (Shell, Python, or Go preferred)
- You've run containerized microservices using Kubernetes
- Google Cloud Platform: GLB, PubSub, Spanner, GCS, App Engine, and GKE
- Monitoring: PagerDuty, DataDog, Splunk
- Tools: Ansible, Puppet, Helm, Jenkins, Cloud Deployment Manager, Terraform
- Infrastructure: HAProxy, Envoy, ElasticSearch, Consul
- Languages/Libraries: Go, Python, Java, Thrift, gRPC