The Role
- Be an integral member of the team responsible for designing, implementing, and maintaining a distributed big data system built on high-quality components (Kafka, EMR + Spark, Akka, etc.).
- Embrace the challenge of working with big data on a daily basis (Kafka, RDS, Redshift, S3, Athena, Hadoop/HBase), perform data ETL, and build tools for reliable data ingestion from multiple data sources.
- Collaborate closely with data infrastructure engineers and data analysts across teams to identify bottlenecks and solve problems.
- Design, implement, and maintain a heterogeneous data processing platform that automates the execution and management of data-related jobs and pipelines.
- Implement automated data workflows in collaboration with data analysts, and continue to maintain and improve the system as it grows.
- Collaborate with software engineers on application events, ensuring the right data can be extracted.
- Contribute to resource management for compute and capacity planning.
- Dive deep into code and constantly innovate.
Requirements
- Experience with AWS data technologies (EC2, EMR, S3, Redshift, ECS, Data Pipeline, etc.) and infrastructure.
- Working knowledge of big data frameworks such as Apache Spark, Kafka, ZooKeeper, Hadoop, Flink, Storm, etc.
- Solid experience with Linux and database systems.
- Experience with relational and NoSQL databases, query optimization, and data modeling.
- Familiarity with one or more of the following: Scala/Java, SQL, Python, Shell, Golang, R, etc.
- Experience with container technologies (Docker, Kubernetes), Agile development, DevOps, and CI tools.
- Excellent problem-solving skills
- Excellent verbal and written communication skills