mGage is looking for a highly skilled technical director to provide leadership to the Company's 7x24 Technology Operations Team, which includes Site Reliability Engineering, Database Administration, and Network Engineering. The Technology Operations Team is responsible for full-scope system and network availability and proactive issue resolution for the Company's platform sites around the world. This position will be integral to the remote monitoring & management of global client infrastructure while providing timely response, within designated SLA times, to service-affecting faults and performance issues. The director will work closely with the Company's Network Operations Center to diagnose & characterize issues and with the Company's Engineering Teams to provide architecture review and to develop infrastructure best practices. The Director of Technical Operations will be driven to build highly scalable, fault-tolerant, and easy-to-administer infrastructure in AWS and on premises. You must be proactive and organized, diligent about documentation, and passionate about monitoring and automating.
Responsibilities:
- Direct a team of systems engineers in the support and maintenance of our 24x7 SaaS customer solution.
- Monitor systems, databases, and networks for proper operation and performance.
- Provide 7x24 on-call support for the operations infrastructure.
- Establish configurations for the application operating environment, including the computer hardware, storage, software and configuration necessary to properly host our applications.
- Ensure our technical staff is current and trained on the latest technologies as well as new releases of third-party software required for our products.
- Establish standard processes for diagnosing issues, tracking status and escalating issues within and outside the group.
- Function comfortably as a hands-on contributor as needed while effectively delegating to and managing a team of senior engineers.
- Improve processes to reduce support effort and increase product availability and scalability.
- Establish operational objectives with proper controls for management visibility of performance.
- Establish and ensure adherence to budgets, schedules, work plans and performance requirements.
- Implement and cultivate a team environment that focuses on availability, service levels, customer satisfaction, and productivity.
- Mentor staff and assist team members in meeting their individual goals.
- Lead technical projects and troubleshoot complex systems.
- Hold overall responsibility for Tier 2 and Tier 3 7x24x365 escalation, with schedule flexibility to meet the needs and demands of the business.
Qualifications:
- Experience with capacity planning, performance tuning, and complex infrastructure architectures. Experience scaling web, application, and data systems horizontally and vertically. Understanding of network planning including subnets, DHCP, DNS and Active Directory. Knowledge of systems hardware planning.
- Experience with VMware and virtual infrastructure platforms is required.
- Knowledge of Cassandra and RabbitMQ concepts, installation, tuning and monitoring.
- Experience with J2EE platforms, specifically JBoss. Knowledge of JBoss installation, JVM tuning and troubleshooting.
- Firm understanding of configuration management and automation.
- Understanding of configuration and maintenance of switched networks including VLANs, Layer 3 switch routing, link aggregation and general troubleshooting. Good understanding of network security including installation, general configuration, site-to-site IPsec VPN, NAT/PAT, access lists and failover.
- Experience with load balancer concepts including HA, VIPs, and SNAT. Fundamental knowledge of core Enterprise Linux (Red Hat/CentOS) with a focus on building, maintaining, securing and performance tuning systems.
- Five years of supervisory or managerial experience with proven ability to engage, motivate, evaluate and mentor team members.
Required Knowledge, Skills and Abilities:
- Demonstrated leadership when leading projects and motivating others.
- Very strong technical troubleshooting and analytical skills with the ability to resolve infrastructure (cloud) and application issues in a production environment.
- Manage multiple simultaneous and diverse technology issues to resolution, with minimal supervision.
- Strong understanding of application protocols and standards (HTTP, SMTP, SSL, DNS, DHCP).
- Strong understanding of TCP/IP and networking concepts.
- Strong understanding of Unix/Linux concepts, scripts and commands.
- Strong analytical and problem-solving skills.
- Experience in the development of operational procedures, processes and scripts.
- Unix/Linux scripting experience writing functions and scripts.
- Ability to plan, install, build, manage, support, configure and test infrastructure components and connectivity.
- Previous experience with Ansible, VMware, AWS, MySQL, Oracle, Cisco UCS, Elasticsearch, Graylog, Nimble storage, and F5 LTM is a plus.
- Experience working with SMS and MMS communication protocol technology.
Education & Experience:
- Bachelor's degree in Information Technology, Computer Science, or a related field.
- Experience in operating 7x24 high-availability infrastructure in a highly transactional environment.
- Operations experience in a managed service or telecom environment.
- 2-3 years troubleshooting network elements, protocols, services, and transport layer problems.
- 3-5 years in a Tier II or higher role.