Staff Reliability Engineer
5+ years experience
3-6 month Contract to Hire
Austin, TX 78753
Business Objective:
You'll be part of an SRE team for our Fortune 25 client, supporting and monitoring the health of the Supply Chain organization. As the Staff Reliability Engineer, you will be responsible for scaling some of the largest software products in Retail by automating the application infrastructure, deployment, and monitoring of those products in production. You will also be part of a 24x7 on-call team that leads the triage of incidents for your products, using your expertise to mitigate problems as quickly as possible. Our client's "own what you build" mentality empowers you to make decisions quickly and deliver reliability improvements without the red tape that typically surrounds enterprise environments. Our client's Reliability Engineering motto is: Enable Speed with High Availability.
Daily responsibilities are:
70% - Delivery & Execution:
- Collaborates and pairs with other product team members (UX, engineering, and product management) to create secure, reliable, scalable software solutions
- Works with the Product Team to ensure user stories are developer-ready, easy to understand, and testable
- Writes custom code or scripts to automate infrastructure, monitoring services, and test cases
- Writes custom code or scripts to do "destructive testing" to ensure adequate resiliency in production
- Configures commercial off-the-shelf solutions to align with evolving business needs
- Creates meaningful dashboards, logging, alerting, and responses to ensure that issues are captured and addressed proactively
20% - Support & Enablement:
- Fields questions from other product teams or support teams
- Monitors tools and participates in conversations to encourage collaboration across product teams
- Provides application support for software running in production
- Proactively monitors production Service Level Objectives for products
- Proactively reviews the Performance and Capacity of all aspects of production: code, infrastructure, data, and message processing
10% - Learning:
- Participates in learning activities around modern software design and development core practices (communities of practice)
- Proactively views articles, tutorials, and videos to learn about new technologies and best practices being used within other technology organizations
Qualifications:
- Proficient in production monitoring concepts and implementation, including synthetic, real user, application performance, system, log, and time-series monitoring, and dashboarding, with tools such as AppDynamics, Dynatrace, New Relic, Splunk, Grafana, ELK, etc.
- Proficient in production systems design including High Availability, Disaster Recovery, Performance, Efficiency, and Security
- Proficient in a modern scripting language (preferably Python)
- Proficient in establishing service level objectives, implementing monitoring and alerting, and building application resilience
- Proficient in modern automation and deployment methods such as canary deployments and launch readiness preparation
- Proficient in a modern infrastructure automation toolkit such as Puppet or Chef
- Proficient in a Linux- or Unix-based environment
- Deep understanding of modern microservice-based architectures and operations
- Experience in destructive testing methodologies and tools such as Chaos Monkey
- Experience in CI/CD automation
- Experience in version control systems such as Git or SVN
- Experience in a cloud computing platform and the associated automation patterns it provides
- Experience in defensive coding practices and patterns for high-availability
via developer jobs - Stack Overflow