Our client is recognized as a global leader in interactive and digital entertainment, with a commitment to delivering superior gaming experiences. Their business division has locations in San Diego, San Francisco, London, and Tokyo. Everyone is committed to delivering an industry-leading, enhanced gaming experience built on imagination, creativity, and the team's profound passion for gaming. Be a part of a company that thrives on the cutting edge of technology and join them in shaping the future of interactive entertainment.
As a Senior Site Reliability Engineer and member of the Commerce Platform Operations team, you will closely support engineering teams in the provisioning, integration, configuration, deployment, monitoring, and incident response of the services at the core of the PlayStation Network, which handle millions of users, devices, and financial transactions. The Commerce Platform Operations team handles application deployments, configuration, performance tuning, monitoring, capacity management, and production support for services that enable customers to access and enjoy a wide range of digital entertainment content seamlessly across various devices and user interfaces. The Senior Site Reliability Engineer will support the team and drive improvements in the processes and technology of cloud services to improve continuous delivery, incident response, application availability, system resiliency, and service monitoring.
Responsibilities:
The Senior Site Reliability Engineer will provide technical leadership to the Commerce Platform Operations team as we configure, integrate, deploy, validate, monitor, and support services and applications on the PlayStation Network. Responsibilities include:
- Hands-on application management and support for AWS cloud environments, including full-stack diagnosis, fault resolution, and root cause analysis.
- Proactively monitor production systems and identify issues before they impact service.
- Drive and implement monitoring tools, metrics, and reports for tracking application and service performance.
- Collaborate with engineering and system teams to drive changes and ensure optimal application performance and resiliency.
- Lead service and system performance analysis, service capacity planning, and service continuity validation for multiple applications.
- Identify areas for process automation and develop automated scripts/tools for regular operational activities.
- Review and influence design, architecture, standards, and methods for deploying, monitoring and operating services and applications.
- Actively participate in and commit to the execution of tasks required to meet milestones and deliverables set by the Scrum team throughout the release cycle.
- Provide rotational on-call support.
Qualifications:
- BS degree in Computer Science, Engineering, or related technical discipline.
- 5 years of hands-on Linux experience.
- 3 years of relevant work experience in a high-volume and/or critical production environment.
- 2 years of hands-on AWS experience deploying, supporting, and managing applications.
- Proficient in using the typical Linux toolbox of open source software and management tools.
- Experience with log management tools, e.g., Splunk.
- Exceptional scripting skills (Python, shell, Go).
- Hands-on experience in troubleshooting and performance tuning of Java applications.
- Solid understanding of networking systems and protocols (HTTP, TCP/IP, SSL, DNS).
- Experience with automation/configuration management tools (e.g., Jenkins).
- Experience with Agile Scrum development methodologies, Continuous Integration, and Continuous Delivery (CI/CD).
- Experience in quality control and validating services in a production environment.
Staff Smart, Inc. is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, pregnancy, sexual orientation, gender identity, national origin, age, protected veteran status, or disability status.
via developer jobs - Stack Overflow