
Sr. Site Reliability Engineer (SRE)

SiFi

Saudi Arabia | First seen: 06 Feb 2026 21:05 | Foundit

Full-time
Apply Links


Foundit LinkedIn Jobrapido.com
Job Description
This is a remote position.

About SiFi
SiFi is a rapidly growing B2B Fin-Tech company transforming expense management for businesses in Saudi Arabia. As an EMI licensed by the Saudi Central Bank, we empower companies with innovative tools to simplify finance management.

Position Overview
We are looking for a Senior Site Reliability Engineer (SRE) who will take ownership of the reliability, performance, and scalability of our production systems. You will design, automate, and operate mission-critical environments that include Kubernetes clusters, database disaster recovery, workflow orchestration, and multi-region networking. This role suits engineers who think deeply about systems, combining infrastructure, automation, and diagnostic reasoning to drive operational excellence.

Primary Responsibilities

Reliability, Availability & Infrastructure
• Maintain and evolve multi-region cloud infrastructure using Terraform-based Infrastructure as Code (IaC).
• Operate and optimize Kubernetes (OKE) clusters running microservices, data pipelines, and workflow orchestration.
• Manage SQL Server backup/restore pipelines, DR testing, and performance optimization.
• Ensure high availability for .NET and Python applications hosted behind load balancers and WAF.
• Design and maintain cross-network connectivity (DRGs, LPGs, VCNs, subnets, and NSGs).

Observability & Automation
• Build and maintain a centralized orchestration platform integrated with alerting and notification systems.
• Develop self-healing, monitoring, and auto-remediation scripts for infrastructure and databases.
• Implement logging, metrics, and tracing pipelines.
• Automate recurring operational tasks using Python, Bash, and PowerShell to reduce manual effort and improve reliability.

DevOps, CI/CD & Security
• Manage GitHub Actions and Octopus Deploy pipelines for backend and data services.
• Apply strong security principles: least privilege, network segmentation, secure credentials, and encrypted communications.
• Promote GitOps and Infrastructure-as-Code practices to ensure repeatable and traceable deployments.
• Collaborate with developers to embed reliability and resilience into every release.

Collaboration & Incident Management
• Lead incident response, run blameless post-mortems, and turn findings into lasting improvements.
• Partner closely with engineering teams to drive design and code-level reliability improvements.
• Conduct capacity planning, cost optimization, and system tuning for performance and scalability.
• Mentor engineers in automation, observability, and root-cause analysis best practices.

Troubleshooting Mindset & Diagnostic Thinking
We value engineers who:
• Approach issues systematically and validate assumptions with data.
• Treat incidents as opportunities to improve design and automation.
• Rely on metrics, logs, and tracing rather than guesswork.
• Communicate findings clearly and document learnings for future reference.
• Continuously refine how problems are detected, escalated, and resolved.
Metadata

Source: google_jobs

Via: Foundit

Search Query: Senior SRE

First Seen: 06 Feb 2026 21:05 UTC

Last Seen: 07 Feb 2026 00:30 UTC

Source Job ID: eyJqb2JfdGl0bGUiOiJTci4gU2l0ZSBSZWxpYWJpbGl0eSBFbmdpbmVlciAoU1JFKSIsImNvbXBhbnlfbmFtZSI6IlNpRmkiLCJhZGRyZXNzX2NpdHkiOiJTYXVkaSBBcmFiaWEiLCJodGlkb2NpZCI6InV0UDJyeHgtcjI3RFprbndBQUFBQUE9PSIsInV1bGUiOiJ3K0NBSVFJQ0lNVTJGMVpHa2dRWEpoWW1saCIsImhsIjoiZW4ifQ==