Site Reliability Engineer (SRE), Linux Generalist at Stack Overflow (Denver, CO)

Come join the SRE team at Stack Overflow! As one of the top 50 websites by traffic volume worldwide, we hit some unique challenges. Recently we’ve launched Stack Overflow for Enterprise and Stack Overflow for Teams, allowing organizations to have a private experience on the platform they already know and love. The success of these new products requires us to rethink our infrastructure strategy for supporting on-prem, cloud, and remote deployments.

We’re looking for someone with 3+ years of Linux administration experience; experience with containerization and managing cloud resources is a plus. You’ll join our team of SREs and devs to keep driving our systems automation efforts and to manage Linux and container-based services. We don’t expect you to know everything about all of the technologies we use, so you’ll work with other members of the team to learn and develop your skills.

As an SRE, you’ll bring a developer mindset to system administration, always looking for ways to automate manual work and create repeatable, scalable systems and processes. We are wiki-centric and prefer to document and automate in small increments as we work.

While we are a remote-first team with members all over the world, this position involves occasional datacenter work, so you need to live near our Denver, CO datacenter. You’ll work primarily from home, going into the datacenter only a few times per month.

What you’ll do:

Maintain the services and infrastructure platform used by the Stack Overflow websites.
Help us scale traffic from 6,000 hits/sec to twice that next year.
Be part of our on-call rotation (approximately 1 week out of 5); we get paged rarely.
Be responsible for the maintenance and upkeep of our Denver datacenter infrastructure -- typically this means coordinating vendors and remote hands, but it sometimes requires physical presence for larger-scale projects.
Act as a subject matter expert on our Linux infrastructure and automation.
Work iteratively to scope and deliver large projects.

Technologies you’ll work with:

Linux CentOS 7 and Alpine
Kubernetes (cluster administration and containerizing applications)
Go / Bash
Some Windows Server 2012 R2 and 2016, PowerShell and C#
GitHub Enterprise, TeamCity (CI)
Puppet, some Ansible
HAProxy, Redis, Elasticsearch
Dell Servers and EqualLogic storage
Fortinet and Cisco Routers, ASAs, and Switches, HSRP / Keepalived / BGP
IIS, DFS, Multi-site AD, SQL Server 2017

Some projects that we've recently completed or are working on:

Improving infrastructure automation around our Windows and Linux servers
Creating a secure replica of our infrastructure for storing private Q&A data
Reinventing how DNS is managed
Implementing autonomous OS upgrades for both Windows and Linux servers
Upgrading hardware with zero downtime across a variety of services
Improving how we monitor service internals
Migrating to a new CDN

Skills & Requirements

We’re looking for:

In-depth experience in Linux (and comfortable working with Windows)
Basic understanding of networking: the HTTP protocol, how load balancers work, IP addressing (we use HAProxy, Fastly/Varnish, IIS)
Experience working hands-on with computer hardware
Experience with a configuration management system or Infrastructure as Code (we use Puppet and Ansible)
A track record of taking on challenges and delivering thorough, stable, and maintainable systems
Strong written communication skills and a strong inclination to “document as you go”

Not required, but please let us know if you have experience with:

Dell OME (or another firmware management system)
Network device administration
TeamCity, Jenkins, OctoDeploy, or other CI systems
HBase system administration
Security, or work in a SOC or PCI environment
Azure or other cloud environments
Some of the other technologies we use: Elasticsearch, Redis, HAProxy, Puppet, VMware, TeamCity, DSC, IIS, and SSL cert management
Involvement with open source projects

When you apply, please include an up-to-date resume. We also strongly encourage you to include a cover letter explaining why you’re interested in working at Stack Overflow.

What you’ll get in return:

Flexible hours
20 days paid vacation + holidays
Completely free health insurance - no copay, no premiums
Generous parental leave (10-16 weeks at 100% pay), family care leave, and unlimited sick days
Employees will never be poked with a sharp stick

When you work remotely (within 1 hour travel time to Denver, CO)… We’ll help you set up a great home office, with an ergonomic chair, standing desk, and any other equipment you need to do your job.

via developer jobs - Stack Overflow