Site Reliability Engineer (SRE), Windows Generalist at Stack Overflow (New York, NY)

Come join the SRE team at Stack Overflow! As one of the top 50 websites by traffic volume worldwide, we hit some unique challenges. Recently we’ve launched Stack Overflow for Enterprise and Stack Overflow for Teams, allowing organizations to have a private experience on the platform they already know and love. The success of these new products requires us to rethink our infrastructure strategy for supporting on-prem, cloud, and remote deployments.

We’re looking for someone with 3+ years of Windows Server experience; experience managing cloud resources is a plus. You’ll join our team of SREs and devs to keep driving and improving our systems automation efforts and to manage Windows-based services. We don’t expect you to know everything about all of the technologies we use, so you’ll work with other members of the team to learn and develop your skills.

As an SRE, you’ll bring a developer mindset to system administration, always looking for ways to automate manual work and create repeatable, scalable systems and processes. We are wiki-centric and prefer to document and automate in small increments as we work.

While we are a remote-first team with members all over the world, this position involves occasional datacenter work, so you must be within 1 hour’s travel time of our Jersey City, NJ datacenter.

What you’ll do:

Maintain the services and infrastructure platform used by the Stack Overflow websites.
Help us scale traffic from 6,000 hits/sec to twice that next year.
Be part of our on-call rotation (approximately 1 week out of 5); we get paged rarely.
Be responsible for the maintenance and upkeep of our Jersey City datacenter infrastructure. Typically this means coordinating vendors and remote hands, but it sometimes requires physical presence for larger-scale projects.
Act as a subject matter expert around our Windows infrastructure and automation
Work iteratively to scope and deliver large projects

Technologies you’ll work with:

Windows Server 2012 R2 and 2016; Linux CentOS 7 and Alpine
PowerShell / Go / Bash / Some C#
GitHub Enterprise, TeamCity (CI)
Puppet, some Ansible
HAProxy, Redis, Elasticsearch
Dell Servers and EqualLogic storage
Fortinet and Cisco Routers, ASAs, and Switches, HSRP / Keepalived / BGP
IIS, DFS, Multi-site AD, SQL Server 2017
Future: Containers and Kubernetes for both on-prem and cloud infrastructure

Some projects that we've recently completed or are working on:

Improving infrastructure automation around our Windows and Linux servers
Creating a secure replica of our infrastructure for storing private Q&A data
Reinventing how DNS is managed
Implementing autonomous OS upgrades for both Windows and Linux servers
Upgrading hardware with zero downtime across a variety of services
Improving how we monitor service internals
Migrating to a new CDN

Skills & Requirements

We’re looking for:

In-depth experience with Windows and comfort working in Linux
Basic understanding of networking: the HTTP protocol, how load balancers work, IP addressing (we use HAProxy, Fastly/Varnish, IIS)
Experience working hands-on with computer hardware
Experience with a configuration management system or Infrastructure as Code (we use Puppet and Ansible)
A track record of taking on challenges and delivering thorough, stable, and maintainable systems
Strong written communication skills and a strong inclination to “document as you go”
Experience with Microsoft SQL Server administration and query tuning

Not required, but please let us know if you have experience with:

Dell OME (or another firmware management system)
Network device administration
TeamCity, Jenkins, OctoDeploy, or other CI systems
HBase system administration
Security, or work in a SOC or PCI environment
Azure or other cloud environments
Some of the other technologies we use: Elasticsearch, Redis, HAProxy, Puppet, VMware, TeamCity, DSC, IIS, and SSL cert management
Open source projects

When you apply... Please include an up-to-date resume. We also strongly encourage you to include a cover letter explaining why you’re interested in working at Stack Overflow.

What you’ll get in return:

Flexible hours
20 days paid vacation + holidays
Completely free health insurance - no copay, no premiums
Generous parental leave (10-16 weeks at 100% pay), family care leave, and unlimited sick days
Employees will never be poked with a sharp stick

If you want to work in our office… You’ll get your own private office in our headquarters in New York City, and enjoy additional benefits like free lunch every day prepared by our own in-house chefs, transportation reimbursement, and all the espresso you can drink.

If you want to work remotely (within 1 hour travel time to Jersey City)… We’ll help you set up a great home office, with an ergonomic chair, standing desk, and any other equipment you need to do your job.

Via developer jobs - Stack Overflow