Data Engineer

Job description

About the Role

In this role you will contribute to a variety of exciting projects, ranging from designing robust, automated data pipelines and storage processes to building tools that improve company-wide productivity with data. The core of the work is designing, implementing, and operating stable, scalable, and efficient solutions that move data from different sources into a data lake and other databases. You will work as a stakeholder to bring data into a standard, queryable format so the company can act on it. Our team enables nearly all of Snapp Cab/Box to make data-driven decisions and have an impact throughout the company.


  • Develop and automate large-scale, high-performance, scalable data pipelines (batch and streaming) to drive faster analytics
  • Design new data architectures with excellent run-time characteristics such as low latency, fault tolerance, and high availability
  • Maintain and monitor real-time analytics and big data systems to ensure their reliability and resolve issues
  • Collaborate with the Business Intelligence team, the Data Science team, Ventures, and other teams to build data insights and help them achieve their business goals
  • Design, implement and maintain scalable real-time and batch data pipelines handling billions of records
  • Maintain real-time analytics and big data systems and ensure their reliability and upkeep
  • Set up real-time analytics solutions as required by individual services
  • Propose new data architecture for new requirements and fine-tune the existing ones
  • Monitor data services and resolve issues when incidents occur


Basic Requirements

  • BS/MS or higher in computer engineering or computer science, or equivalent experience
  • At least 2 years of programming experience in Python, Java, Scala, or Go
  • SQL knowledge and experience with database systems (ClickHouse, MySQL, Postgres, and other DBs)
  • Specialization in the Hadoop ecosystem (HDFS, YARN, Hive, Spark)
  • Hands-on experience with Kafka, ZooKeeper, Logstash


Preferred Requirements

  • Experience working with one or more of these: Airflow, Debezium, Confluent Schema Registry
  • Familiarity with monitoring tools such as Grafana, Prometheus, and exporters
  • Experience with streaming technologies such as Spark, Apache Flink, and NiFi
  • Hands-on experience with Linux, Docker, Kubernetes, and virtualization
  • Experience with data exploration and visualization tools such as Hue and Superset
  • Experience working on Agile, Scrum, and DevOps projects
  • Good communication and teamwork skills