Data Engineering Lead at Migo (Daan District, Taiwan)

Responsibilities



  • Design, develop, document, and test advanced data systems that bring together data from disparate sources, making it available to data scientists, analysts, and other users using scripting and/or programming languages (Python, Java, etc.).

  • Design, develop, implement and scale data processing pipelines and data schemas according to business needs.

  • Write and refine code to ensure the performance and reliability of extracting data from multiple sources, integrating disparate data into a common data model, and loading data into a target database, application, or file using efficient programming processes; debug data pipelines and ensure the timely delivery of applications.

  • Manage deployment and data migration of the platform on public and private clouds using CI/CD tools, such as Jenkins, and workflow management tools, such as Oozie, Luigi, or Airflow (a minimal orchestration sketch follows this list).

  • Evaluate structured and unstructured datasets utilizing statistics, data mining, and predictive analytics to gain additional business insights.

  • Independently initiate and drive projects; communicate data warehouse plans to internal stakeholders.

  • Recommend process improvements to increase efficiency and reliability in ETL development.

  • Constantly update knowledge by tracking and understanding emerging data pipeline practices and solutions.

  • Provide guidance and mentoring to less experienced team members.
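
As a hedged illustration of the workflow-management side of these responsibilities, the sketch below shows a minimal daily ETL DAG in Apache Airflow (2.x style), one of the orchestration tools named above. The DAG id, schedule, retries, and task bodies are hypothetical placeholders, not details from the posting.

    # Minimal Airflow 2.x DAG sketch: extract -> transform -> load, run daily.
    # The DAG id and task callables are hypothetical examples.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        """Pull raw records from a source system (placeholder)."""
        ...

    def transform(**context):
        """Map raw records onto a common data model (placeholder)."""
        ...

    def load(**context):
        """Write transformed records to the target store (placeholder)."""
        ...

    with DAG(
        dag_id="daily_etl_example",          # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)

        extract_task >> transform_task >> load_task  # linear dependency chain

In practice, a lead in this role would also decide how such DAGs are deployed through CI/CD (for example, a Jenkins pipeline promoting DAG code across environments), which is the other tooling family the responsibility above mentions.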


Qualifications



  • 8 years of hands-on experience in the data warehouse space, including custom ETL design, implementation, and maintenance.

  • 3-5 years of experience building and operating large-scale distributed systems or applications.

  • Deep domain knowledge of data pipelines for both real-time and batch-processing data applications on big data and machine learning platforms. Familiarity with big data and distributed computing technologies (e.g. Hive, Spark, Presto, Parquet, Cassandra, …) and production-level data lake solutions (e.g. Hadoop clusters or AWS S3); see the batch-processing sketch after this list.

  • Experience designing and implementing real-time monitoring/alerting systems with Splunk or the ELK stack.

  • Experience with batch-processing and streaming data pipeline/architecture design patterns, such as the lambda or kappa architecture.

  • Experience in SQL or similar languages and development experience in at least one scripting language (Python preferred).

  • Strong data architecture, data modeling, schema design and effective project management skills.

  • Experience with large data sets and data profiling techniques.

  • Excellent communication skills and proven experience leading data-driven projects from definition through interpretation and execution.

  • Experience with CI/CD tools, such as Jenkins.

  • Experience with workflow management tools, such as Oozie, Luigi, or Airflow.

  • Bachelor's degree in Computer Science, Information Management or related field.
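
As a hedged illustration of the batch side of these qualifications (Spark, Parquet, an S3-backed data lake), the sketch below shows a small PySpark job that reads raw events from a data lake and writes a curated daily rollup. Bucket names, paths, and column names are hypothetical, not details from the posting.

    # Minimal PySpark batch sketch: raw Parquet events -> curated daily rollup.
    # All paths and column names below are hypothetical examples.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("daily_events_rollup")   # hypothetical job name
        .getOrCreate()
    )

    # Read raw events from a Parquet-based data lake (S3 path is made up).
    events = spark.read.parquet("s3a://example-data-lake/raw/events/")

    # Aggregate into a per-user, per-day rollup as a query-friendly model.
    daily_rollup = (
        events
        .withColumn("event_date", F.to_date("event_time"))
        .groupBy("event_date", "user_id")
        .agg(F.count("*").alias("event_count"))
    )

    # Write to the curated zone, partitioned by date for downstream consumers.
    (
        daily_rollup.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3a://example-data-lake/curated/daily_events/")
    )

    spark.stop()

The same source data could also feed a streaming path (for example, Spark Structured Streaming) alongside this batch path, which is the split that the lambda architecture mentioned above formalizes.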


via developer jobs - Stack Overflow
 
