Site Reliability Engineer (SRE)

Job description

We are looking for experienced, results-driven, and passionate engineers to join the infrastructure team. The ideal candidate is a self-starter with excellent communication skills. Our collaborative environment relies heavily on innovation, technical savvy, and problem-solving skills. This is a full-time remote position within Tehran. As our newest SRE, you’ll be a major contributor to the company’s success, and you’ll have the opportunity to work alongside our wonderful SRE team that supports the Snapp platform. You will embrace the SRE model and work with other senior leaders on the team to modernize the tech stack.


Responsibilities

  • Monitoring services
  • Managing incidents
  • Extending and improving current monitoring systems
  • Automating the current monitoring process
  • Deploying services to the production environment
  • Communicating with other teams to resolve issues
  • Troubleshooting system problems

Requirements

  • Ability to work on a 24/7 shift schedule
  • Solid experience with Linux (LPIC-2)
  • Solid experience with Grafana and Prometheus
  • Familiar with Python
  • Familiar with Kubernetes
  • Familiar with log shipping solutions (ELK, Loki, ...)
  • Familiar with CI/CD
  • Familiar with microservice architecture
  • Familiar with REST APIs