Placement papers | Freshers Walkin | Jobs daily: Site Reliability Engineer at SafetyCulture (Surry Hills, Australia)



Site Reliability Engineer at SafetyCulture (Surry Hills, Australia)

At SafetyCulture, we build awesome products that help our customers drive change and create a safe and efficient workplace. SafetyCulture's iAuditor is the most loved checklist inspection app in the world, with over 1 million inspections performed every month. As a Site Reliability Engineer at SafetyCulture, you will help to design, build and run resilient systems. You live and die by Murphy’s Law, knowing that anything that can go wrong will go wrong at the worst possible moment. You will help to foster a culture of designing for, and expecting, failure in production systems - a culture where learning and knowledge-sharing are expected. You love to solve sticky cross-service or cross-domain problems and have an innate ability to identify root causes in distributed systems. Most importantly, you are a team player, are excited about the prospect of working in a fast-paced, demanding environment, and get that learning happens at the edge of the comfort zone.


At SafetyCulture we work in cross-functional feature teams. In this role you will work across 2-3 teams, helping to solve challenges across a specific tribe.



You will thrive in this role if you have the following:



  • Fluency in at least one modern programming language

  • An ability to wrangle infrastructure tooling, and an understanding of why infrastructure-as-code is mandatory

  • Expert-level understanding of observability, alerting and alarming best practice

  • Excellent human-handling skills, with an ability to build and maintain healthy cross-team relationships

  • A love of systems engineering balanced with a product mindset, and empathy with your customers and your product-engineering colleagues



You’ll be responsible for:



  • Defining and driving a Chaos Engineering Strategy

  • Coaching and educating more junior team members on systems reliability and fault-tolerance best practice

  • Identifying gaps in existing systems and coming up with remediation plans

  • Improving metrics such as MTTR and MTTF

  • Promoting a culture of sustainable incident response and blameless post-mortems

    https://slides.com/terlyahunt/safetyculture?token=Td5II4KT#/


via developer jobs - Stack Overflow
 
