
Senior Site Reliability Engineer (SRE)

Salla

Anywhere | First seen: 06 Feb 2026 21:05 | Jobs By Workable

Posted: 18 days ago | Full-time
Job Description
As a Senior SRE at Salla, you will lead reliability initiatives, handle complex incidents, improve platform performance, and guide engineering teams toward building resilient systems. You will also participate in the on-call rotation as part of our commitment to platform reliability.

Reliability & Incident Management
• Lead high-severity incident response and drive post-incident reviews.
• Troubleshoot complex issues across applications, infrastructure, and networks.
• Improve MTTR through better monitoring, alerts, and diagnostic tooling.
• Participate in the on-call rotation supporting production systems.

Performance & Scalability
• Identify and resolve performance bottlenecks and scaling challenges.
• Conduct load testing and capacity planning for high-traffic scenarios.

Infrastructure & Operations
• Enhance cloud-native infrastructure, deployment processes, and automation.
• Improve resilience, fault tolerance, and recovery mechanisms across systems.

Observability
• Build and refine dashboards, alerts, metrics, logs, and traces.
• Define SLIs/SLOs and improve visibility into system behavior.

Tooling & Automation
• Develop tools that reduce operational toil and increase reliability.
• Contribute to infrastructure-as-code, CI/CD pipelines, and GitOps workflows.

Collaboration
• Work closely with engineering teams to ensure services are robust and production-ready.
• Mentor engineers on reliability, debugging, and operational best practices.

Bonus Skills
• Background in large-scale, high-traffic systems.
• Experience with fault-tolerant design, DR, and HA patterns.
• Familiarity with SLOs, SLIs, and error budgets.

Location Preference
• Candidates located within GMT 0 to +6 time zones are preferred to align with team collaboration and on-call coverage.

Requirements
• Strong experience with Kubernetes, service mesh technologies, and cloud platforms (AWS, GCP, or Azure).
• Deep understanding of Linux, networking, distributed systems, and load balancing.
• Hands-on experience with Terraform or similar Infrastructure-as-Code tools.
• Experience with observability platforms such as Prometheus, Grafana, Loki, Mimir, Elastic, or equivalent.
• Proficiency in scripting or programming languages such as Bash, Python, or Go.
• Experience with CI/CD pipelines and GitOps practices.
• Strong debugging, incident response, and performance analysis skills.
Metadata

Source: google_jobs

Via: Jobs By Workable

Search Query: Senior SRE

First Seen: 06 Feb 2026 21:05 UTC

Last Seen: 06 Feb 2026 23:00 UTC

Source Job ID: eyJqb2JfdGl0bGUiOiJTZW5pb3IgU2l0ZSBSZWxpYWJpbGl0eSBFbmdpbmVlciAoU1JFKSIsImNvbXBhbnlfbmFtZSI6IlNhbGxhIiwiYWRkcmVzc19jaXR5IjoiTWFra2FoIFNhdWRpIEFyYWJpYSIsImh0aWRvY2lkIjoiXzdmTm1qUTNMcEdieVBnWEFBQUFBQT09IiwidXVsZSI6IncrQ0FJUUlDSU1VMkYxWkdrZ1FYSmhZbWxoIiwiaGwiOiJlbiJ9