Site Reliability Engineer (SRE)

Job description

We are looking for a results-driven, passionate problem-solver to join the infrastructure team. The ideal candidate is a self-starter with excellent communication skills. Our collaborative environment relies heavily on innovation, technical savvy, and problem-solving skills. This is a full-time remote position within Tehran. As the newest SRE, you’ll be a major contributor to the company’s success and will work alongside our wonderful SRE team that supports the Snapp platform. You will embrace the SRE model and work with other senior leaders on the team to modernize the tech stack.


Responsibilities

  • Monitoring services
  • Incident management
  • Extending and improving current monitoring systems
  • Automating the current monitoring process
  • Deploying services to the production environment
  • Communicating with other teams to resolve issues
  • Troubleshooting system problems

Requirements

  • Ability to work on a 24/7 shift schedule (including holidays)
  • Solid experience with Linux
  • Solid experience with Grafana and Prometheus
  • Familiarity with log shipping solutions (ELK, Loki, ...)
  • Problem-solving skills

Familiarity with the following is a plus:

  • Docker and Kubernetes
  • CI/CD
  • REST API
  • Python
  • Microservice architecture