Our Global Technology Infrastructure (GTI) group is a team of innovators who love technology as much as you do. Together, you'll use a disciplined, innovative, and business-focused approach to develop a wide variety of high-quality products and solutions. You'll work in a stable, resilient, and secure operating environment where you, and the products you deliver, will thrive.
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. This Site Reliability Engineer position is responsible for developing and implementing the processes needed to improve application and system reliability, along with providing operational support. The role requires roughly equal focus on software development and infrastructure operations. You will also develop software to automate operational processes and write code for deliverables in the shared engineering backlog.
Responsibilities:
- Engage with the development team throughout the lifecycle to help build for reliability
- Develop software to automate manual operational work
- Run, maintain and improve the service against established Service Level Objectives by applying software engineering principles
- Own the availability, performance, change management, monitoring, and capacity management of assigned services
- Troubleshoot priority incidents, conduct post-mortems, and ensure incidents are permanently resolved
- Analyze patterns in production incidents, develop permanent remediation plans, and use software engineering to automate the prevention of recurring incidents
- Maximize delivery speed by objectively adhering to the service's error budgets
- Balance effort between manual operational work and engineering work
- Participate in a shift model covering 24x7x365 support
Qualifications:
- Bachelor's degree (or equivalent experience) in Computer Science/Engineering
- 6+ years of experience building enterprise software, with proficiency in multiple languages, preferably Java, Python, and shell scripting
- Experience working in an Agile Development environment
- Proven ability to understand and troubleshoot complex problems under pressure
- Good working knowledge of Cloud Engineering including an understanding of private cloud principles and exposure to public cloud offerings such as AWS, Azure, GCP or similar technology
- 3+ years of experience in performance engineering and monitoring using tools such as AppDynamics, Splunk, Apica, and JMeter
- Experience with configuration management tools such as Ansible, Puppet, Chef, or PowerShell
- 2+ years of incident resolution experience in a large-scale operations environment
- Experience administering application servers, web servers, and databases (Tomcat, WebSphere, NGINX, Microsoft IIS, Oracle, MySQL, etc.)
via developer jobs - Stack Overflow