Our millions of rides create an incredibly rich dataset that needs to be transformed, exposed, and analyzed in order to improve our products. By joining the Data Engineering team, you will be part of an early-stage team that builds data transport, collection, and storage at Heetch. The team is quite new, and you will have the opportunity to shape its direction while having a large impact. You will own Heetch's data platform by architecting, building, and launching highly scalable and reliable data pipelines that support our growing data processing and analytics needs. Your work will make incredibly rich insights accessible to Data Analysts, Data Scientists, Operations Managers, Product Managers, and many others.
OUR ENGINEERING VALUES
• Move smart: we are data-driven and employ tools and best practices to ship code quickly and safely (continuous integration, code review, automated testing, etc.).
• Distribute knowledge: we want to scale our engineering team to the point where our contributions do not stop at the company codebase. We believe in Open Source culture and in communicating with the outside world.
• Leave code better than you found it: because we constantly raise the bar.
• Unity makes strength: moving people from A to B is not as easy as it sounds, but we always keep calm and support each other.
• Always improve: we value personal progress and want you to look back proudly on what you’ve done.
WHAT YOU WILL DO
You will:
• Build large-scale batch data pipelines.
• Build large-scale real-time data pipelines.
• Be responsible for scaling data processing flows to meet rapid data growth at Heetch.
• Continuously improve and evolve our data models and schemas based on business and engineering needs.
• Implement systems that track data quality and consistency.
• Develop tools that support self-service data pipeline management (ETL).
• Tune jobs to improve data processing performance.
• Implement data and machine learning algorithms (A/B testing, sessionization, etc.); a minimal sessionization sketch follows this list.
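To give a concrete flavor of that last point, here is a minimal sessionization sketch in PySpark: it splits each rider's event stream into sessions wherever there is more than 30 minutes of inactivity. The table and column names (rider_events, rider_id, event_ts) are hypothetical placeholders, not Heetch's actual schema.

    from pyspark.sql import SparkSession, Window, functions as F

    spark = SparkSession.builder.appName("sessionization-sketch").getOrCreate()
    events = spark.table("rider_events")  # hypothetical table: rider_id, event_ts

    w = Window.partitionBy("rider_id").orderBy("event_ts")
    # Seconds since the rider's previous event (null for the first event).
    gap = F.col("event_ts").cast("long") - F.lag("event_ts").over(w).cast("long")

    sessions = (
        events
        # Flag rows that start a new session: first event, or >30 min idle.
        .withColumn("is_new_session", (gap.isNull() | (gap > 1800)).cast("int"))
        # A running sum of the flag assigns a per-rider session number.
        .withColumn("session_id", F.sum("is_new_session").over(w))
    )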
REQUIREMENTS
• 4+ years of experience in Software Engineering.
• Extensive experience with Hadoop.
• Proficiency with Spark or another cluster-computing framework.
• Advanced SQL query competencies (queries, SQL Engine, advanced performance tuning).
• Strong skills in at least one programming language (Python, Go, Java, Scala, etc.).
• Familiarity with NoSQL technologies such as Cassandra.
• Experience with workflow management tools (Airflow, Oozie, Azkaban, Luigi); a minimal DAG sketch follows this list.
• Comfortable working directly with data analysts to bridge business requirements and data engineering.
• Strong mathematical background.
• Inventive and self-starting.
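As a flavor of the workflow tooling mentioned above, here is a minimal Airflow DAG sketch for a daily batch pipeline. The DAG id, task names, and the extract/load callables are hypothetical placeholders, not an actual Heetch pipeline.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_rides(**context):
        ...  # hypothetical: pull the previous day's ride events

    def load_warehouse(**context):
        ...  # hypothetical: load the transformed data into the warehouse

    with DAG(
        dag_id="daily_rides_pipeline",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        extract = PythonOperator(task_id="extract_rides", python_callable=extract_rides)
        load = PythonOperator(task_id="load_warehouse", python_callable=load_warehouse)
        extract >> load  # extract must finish before load runs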
BONUS POINTS
• Experience with Kafka (see the consumer sketch after this list).
• MPP database experience (Redshift, Vertica…).
• Experience building data models for normalizing/standardizing varied datasets for machine learning/deep learning.
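For the Kafka bonus point, here is a tiny consumer sketch using the kafka-python client; the topic name and broker address are hypothetical placeholders.

    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "ride-events",                       # hypothetical topic
        bootstrap_servers="localhost:9092",  # hypothetical broker
        group_id="data-eng-sketch",
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )

    for message in consumer:
        print(message.value)  # each value is one decoded ride event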
PERKS
• Stocks.
• Paid conference attendance/travel.
• Heetch credits.
• A Spotify subscription.
• Code retreats and company retreats.
• Travel budget (visit your remote co-workers and our offices).