We are looking for an experienced Site Reliability Engineer to join our Operations Team. We provide tools and services to all teams at OfferUp for managing an increasingly complex production infrastructure in AWS. Our success is measured by our ability to let everyone stand up and deploy services quickly with no downtime. In this role, you will be at the forefront of driving and developing the technology that automates everything.
Responsibilities
- Work with other SREs to build a comprehensive set of tools to automate and monitor our production infrastructure
- Work with Engineering to build resilient, operable, self-healing services
- Participate in reasonable on-call rotations with the rest of Engineering
Experience
- Managed groups of servers, preferably in AWS, at scale
- Reasonably deep knowledge of Linux and internet technologies
- Proficient in modern scripting languages like Python or Ruby
- Experience with configuration management tools like Ansible or Salt
- Used advanced metrics to solve hard problems
- Experience managing Big Data or high-throughput distributed systems like Hadoop and Kafka
Nice to have
- Experience with continuous integration
- Contribution to open source projects
- An active interest in containerization technologies such as Docker and/or Kubernetes
Our team:
- Acts like a team
- Avoids doing things twice
- Solves hard problems for tomorrow, not just for today
- Prefers fixing problems to complaining about them
- Investigates, considers and adopts new technology where it makes sense
- Doesn’t tolerate brilliant jerks
Come do work that matters. Join a team that believes when we all work together, we get more out of things.
via developer jobs - Stack Overflow