Why this role
Looking to work at the intersection of DevOps, backend engineering, and real-time problem-solving? Here's your chance to make a real impact in a high-scale cloud environment, keeping production systems fast, reliable, and resilient for thousands of users.
You'll join a collaborative, tech-savvy team dedicated to making things work better. From improving observability across microservices to responding to high-priority incidents, you'll help shape how scalable applications are delivered and supported.
What you'll be doing
• Fix and improve: Hunt down bugs in live microservices and make production more stable every day.
• Pair up with engineers: Collaborate with dev teams to sharpen code quality, boost resilience, and embed observability from the start.
• Own the cloud: Configure and manage AWS infrastructure, keeping everything humming at scale.
• Watch the signals: Build better monitoring and alerting systems to catch issues before they escalate.
• Troubleshoot deeply: Solve complex technical puzzles and help guide others through them.
• Automate everything: Write and maintain standard operating procedures (SOPs) and automation scripts to reduce manual toil.
• Be the calm in the storm: Participate in the on-call rota and take ownership of live issues when they arise.
What we're looking for
• Solid experience debugging live applications and resolving production issues fast.
• Background in building and supporting microservice-based applications.
• Confidence working with MongoDB, AWS services, and container tooling such as Docker and Amazon ECS.
• Familiarity with infrastructure-as-code and CI/CD pipelines (CloudFormation, CodeBuild, etc.).
• Comfort using monitoring and observability tools such as Prometheus, New Relic, Grafana, or Datadog.
• Good grasp of scripting (Python or JavaScript) for automation and tooling.
• Clear thinking in the face of incidents, plus the drive to learn from them.
Bonus points for
• Knowledge of REST, GraphQL, and async messaging systems.
• Experience with Git workflows and CI/CD pipelines.
• Understanding of SRE principles (SLIs, SLOs, error budgets, etc.).
• Awareness of security and compliance (GDPR, privacy, risk management).
• Clear communication skills and a team-first attitude.
Why you'll love it here
• You'll work with brilliant engineers who care about quality, automation, and clean code.
• You'll have the freedom to shape infrastructure as we scale and evolve.
• You'll gain deep exposure to modern DevOps tooling, incident response strategy, and production engineering.
• Your voice will matter, from tech choices to process improvements.