Site Reliability Engineer (SRE) at Evernote (Austin, TX)

Our SRE team is responsible for the overall performance and reliability of the Evernote service and products. That means serving over 200 million passionate and engaged users around the world, with billions of notes and files. We are looking for a Site Reliability Engineer to help us in the ongoing mission of delivering an outstanding service to our users.

We participate in all aspects of running our platform at scale, focusing both on the service as it runs today and on ensuring we can deliver new and exciting features rapidly to users. We have a real passion for automation and we continually seek to improve. We work hand in hand with product teams to help them ship production-ready services and get new features into our users' hands. We define Service Level Objectives (SLOs) based on Key Performance Indicators (KPIs) for each of our services, and we use them to move quickly while maintaining the quality of service our users expect.

What you'll do

Work closely with engineering teams to maintain and scale our existing production platform

Help us evolve what it means to be an SRE at Evernote

Evolve and implement production readiness standards for new services

Champion our SLOs and look to continuously improve them

Develop and maintain automation to reduce operations toil for the team

Participate in an on-call rotation for our production services

What we're looking for

You possess a contagious sense of ownership and the tenacity to always find a way

You focus on quality to build manageable, scalable, and maintainable systems

You know that perfection is the enemy of done and when to make trade-offs

You emphasize the importance of making decisions based on data

You enjoy solving tough technical problems

You exercise judgement in a way that reduces risk

You share enthusiastically to reduce disconnects and communication breakdowns

You always want to understand the "why" in order to better see patterns and improve quality

What you've done

You know Linux systems like the back of your hand

You've managed production environments at scale in a public cloud (AWS or GCP)

You have a strong familiarity with web application technologies, including MySQL, Java, and Apache

You've attained a deep understanding of networking protocols (e.g. TCP/IP, HTTP, DNS)

You've implemented and used third-party metrics and monitoring platforms such as DataDog and PagerDuty

You can wrangle problems quickly using the tools at your disposal

You've used configuration management and orchestration tools and you understand why they're important

You've built extensible and maintainable automation (Shell, Python, or Go preferred)

You've run containerized microservices using Kubernetes

Skills that are particularly meaningful to us

Google Cloud Platform: GLB, PubSub, Spanner, GCS, App Engine, and GKE

Monitoring: PagerDuty, DataDog, Splunk

Tools: Ansible, Puppet, Helm, Jenkins, Cloud Deployment Manager, Terraform

Infrastructure: HAProxy, Envoy, ElasticSearch, Consul

Languages/Libraries: Go, Python, Java, Thrift, gRPC

via developer jobs - Stack Overflow
