HealthEdge is looking for a motivated Site Reliability Engineer who will be responsible for enhancing the stability, availability, and scalability of the environments that host HealthEdge's award-winning applications in our private cloud. As a technology leader within the organization, the Site Reliability Engineer will guide the engineering and operations teams to problem resolution, focusing on improving the software and supporting technology to drive availability, stability, scalability, and customer satisfaction.
The supported environments are highly sophisticated, and a successful candidate will have broad knowledge of technology and software, including but not limited to networking, compute, storage, Java applications, middleware, database technologies, and monitoring tools. A desire to drive to root cause and the initiative to enhance the environments to ensure continued stability and performance are essential for success in this highly skilled role within HealthEdge.
This role is new to HealthEdge, and you will help define the strategy and how you want to grow within the department.
HealthEdge will support relocation for this role.
Responsibilities:
- Develops strong knowledge and expertise in technologies used to run the HealthRules cloud.
- Leads cross-functional teams to drive problem resolution and identify root cause.
- Applies systems thinking to problem-solving situations, following a methodical incident resolution process.
- Makes logical and physical system architecture recommendations to avoid problems and increase system availability and stability.
- Maintains services in live customer environments including monitoring availability, performance, and system health.
- Scales processes and drives sustainability through automation.
- Hosts, documents, and encourages an environment of blameless post-mortems following system incident resolution.
Requirements/Qualifications:
- BS degree in Computer Science or a related technical field involving systems engineering, or related past experience.
- Expertise in designing, troubleshooting, and scaling complex distributed systems.
- Experience with Ansible or a related IT automation framework.
- Experience with computer networking and storage technologies, including direct experience with VMware-based environments.
- Experience with Linux and Windows operating system internals, administration, and networking.
- Experience designing and troubleshooting databases, Java applications, WebLogic services, and load balancers.
- Ability to clearly communicate technical information within the organization and with customers.
- Ability to provide occasional off-hours support and to participate in an on-call rotation (24/7 support).
via developer jobs - Stack Overflow