Site Reliability Engineer

Job description

About Snapp

Snapp is the pioneering provider of ride-hailing mobile solutions in Iran, connecting smartphone owners in need of a ride with Snapp drivers who use their private cars to offer transportation services. We are ambitious, passionate, engaged, and excited about pushing the boundaries of the transportation industry to new frontiers and becoming the first choice of every user in Iran.


About the Position
You’ll be responsible for monitoring the uptime and availability of all applications owned by Snapp. The SRE role works cross-functionally with the DevOps and Security teams. This role also includes night shifts.
Shift: Day and Night Shift Rotation


Responsibilities

  • Monitoring services
  • Incident Management
  • Extending and improving current monitoring systems
  • Automating the current monitoring process
  • Deploying services to the production environment
  • Communicating with other teams to resolve issues
  • Troubleshooting system problems
  • Making changes to the production environment

Requirements

  • Excellent problem-solving skills and the ability to work under pressure in a fast-paced environment
  • Ability to work collaboratively with other SRE team members, developers, and stakeholders to address complex technical challenges and drive continuous improvement in system reliability and performance
  • Strong communication skills, both verbal and written
  • Familiar with Linux OS
  • Familiar with Grafana, ELK, and Prometheus
  • Familiar with Docker and Kubernetes
  • Familiar with CI/CD