Comcast brings together the best in media and technology. We drive innovation to create the world's best entertainment and online experiences. As a Fortune 50 leader, we set the pace in a variety of innovative and fascinating businesses and create career opportunities across a wide range of locations and disciplines. We are at the forefront of change and move at an amazing pace, thanks to our remarkable people, who bring cutting-edge products and services to life for millions of customers every day. If you share in our passion for teamwork, our vision to revolutionize industries and our goal to lead the future in media and technology, we want you to fast-forward your career at Comcast.
Job Summary:
The Incident Manager is a member of the Residential Reliability Engineering Support Team and is responsible for developing and maintaining standard operating procedures (SOPs) specific to our Xfinity Home product. The Incident Manager will ensure that all incidents are identified, triaged and resolved within the Service Level Agreement. Additionally, this position will be responsible for ensuring that root cause analysis for all high-severity incidents is promptly and properly documented and delivered to the respective Product owners. This position will interface with Comcast Product, Change, Problem, Release, Engineering, Marketing and Operations Management teams.
Core Responsibilities:
- Lead technical investigation and triage of production issues; analyze logs and perform end-to-end investigation of issues including, but not limited to, network, software and infrastructure
- Lead technical outage bridges and engage appropriate resources to drive issues to closure
- Document triage and training procedures (including enhancing existing procedures) for complex application workflows (including APIs and endpoints)
- Draft engineering production support readiness documentation
- Actively manage relationships with key stakeholders, markets and resolver groups
- Respond to service-level issues and work to restore normal service operations as quickly as possible
- Develop procedures for incident triage and management, metric and measure creation, management and administration of monitoring tools
- Oversee the timely execution of scheduled and repeatable processes such as periodic system validations, daily triage, and system monitoring and event log management
- Work with architecture, development and engineering teams to identify root cause for incidents and create an action plan for resolution
- Monitor systems and services for most efficient operation, identifying fault conditions as well as opportunities for further optimization
- Analyze problems in design, configuration, data flow, and data state within a highly complex multi-product provisioning system
- Assist in training and developing junior engineers and offshore resources
- Identify and lead the implementation of creative process and technology solutions within the team
- Provide mentorship and team development opportunities
- Assist in representing Production Support to the organization, ensuring that high availability and the ability to identify customer-facing issues are included in the development or deployment of new products and services
- Identify and recommend opportunities for "clean-slate" process improvement with regard to incident management, fault monitoring, triage procedures and issue escalation
- Maintain escalation and contact lists for mission critical systems and services
- Consistent exercise of independent judgment and discretion in matters of significance
- Regular, consistent and punctual attendance. Must be able to work nights and weekends, variable schedule(s) as necessary
Job Specification:
- Bachelor's degree or equivalent work experience is required.
- Generally requires 3 to 7 years of experience
- Strong understanding of ITIL, with Incident and Problem Management experience
- Experience defining, implementing, and monitoring IT service level processes.
- Experience in application development and engineering a plus
- An understanding of Cloud infrastructure (Network and Server architecture)
- Experience with monitoring technologies such as OIV, Splunk, Op5 and the Haystack tools is a plus
- Must be able to work nights and weekends as part of an after-hours on-call support schedule
Comcast is an EOE/Veterans/Disabled/LGBT employer
via developer jobs - Stack Overflow