A data architect builds the foundation of Wayfair's data ecosystem, partnering across engineering and business to ensure the timeliness, cleanliness, and availability of the data behind Wayfair's reporting and data science. They are the experts on a given area of data, covering everything from how it is generated in an application to how it is moved, transformed, and turned into key business insights. They are comfortable working with application engineers to structure data in a way that facilitates reporting, as well as with determining where data cleaning, transformation, and integrity checks belong. These data experts pair deep data modeling expertise with business understanding and project management to make answering key business questions as streamlined as possible.
What You'll Do
- Own an entire key area of data within Wayfair from generation to consumption, such as a trillion-row clickstream dataset or product information from hundreds of suppliers
- Build, schedule, and manage data movement from application origin through batch and streaming systems, making the data available for key business decisions
- Develop a robust, sustainable plan for the data area going forward, including projecting space requirements, procuring technology, and partnering with engineering on improvements to the data
- Ensure data products are aligned with the rapidly evolving needs of a multi-billion dollar business
- Provide consulting to engineering and data engineering organizations on best practices for designing applications to enable easy analytics; be an expert on large-scale data engineering
What You Are
- A true expert on big data, comfortable working with datasets of varying latencies and sizes across disparate platforms
- Excited about unlocking the valuable data hidden in inaccessible raw tables and logs
- Attentive to detail, with a relentless focus on accuracy
- Excited to collaborate with partners in business reporting and engineering to determine the source of truth of key business metrics
- Familiar with distributed data storage systems and the tradeoffs inherent in each
What You Have
- Hands-on experience with advanced SQL, including writing complex, multi-stage transformations, user-defined functions, and stored procedures, and tuning query performance. Experience with Hive or a distributed database system.
- Experience scheduling, structuring, and owning data transformation jobs that span multiple systems and carry strict requirements on data volume, duration, or timing
- Prior projects optimizing storage and access of high-volume, heterogeneous data on distributed systems such as Hadoop, including familiarity with various data storage media and the tradeoffs of each
- Experience with agile development, source control systems, continuous integration, and other standards of modern development
- Bachelor's or Master's degree in Computer Science, Computer Engineering, Analytics, Mathematics, Statistics, Information Systems, Economics, Management, or another quantitative discipline, with a strong academic record
- Ecommerce or retail analytics experience is a strong plus
What We'd Love to See (But Isn't Required)
- Previous work with a distributed database system like Vertica or Redshift
- Previous experience with Hadoop and associated query systems like Presto or Hive