Site Reliability Engineering Lead
We are seeking an experienced Site Reliability Engineering leader to join a high-growth SaaS organisation in a hybrid role that combines technical leadership with hands-on engineering. This is a key position for someone passionate about reliability, resilience, and running production systems at scale.

The successful candidate will lead and mentor the SRE team, set the technical direction for reliability engineering, and take end-to-end ownership of production systems. They will be accountable for availability, performance, and incident response, while working closely with Product and Engineering to define SLIs and establish meaningful SLOs that balance stability with delivery pace. They will champion a blameless culture, embedding robust incident management processes and driving continuous, systemic improvement.

Key skills and experience:
Proven experience as a Lead or Senior SRE with a strong software engineering background
Strong programming ability in PHP and Java or .NET
Experience defining SLIs, setting SLOs, and using error budgets to guide decision-making
Demonstrated ownership of production systems with full accountability for uptime and resilience
Hands-on experience building and running incident management processes, including blameless postmortems
Strong knowledge of observability and monitoring tools (eg Prometheus, Grafana, Datadog)
Solid Linux systems expertise and experience with MySQL and PostgreSQL
Experience with cloud platforms (Azure preferred), Kubernetes, and Inf