Site Reliability Engineer
Site Reliability Engineer (Cloud and Automation) - London - 2 days on site per week.

A leading global financial services organisation is seeking a Site Reliability Engineer (SRE) to drive reliability, automation, and performance across its cloud-hosted platforms.

The Opportunity

This role sits within a high-performing Platform Operations function, acting as a central point of expertise for SRE methodologies and automation. You will play a key role in improving system resilience, scalability, and operational excellence across a complex, regulated environment.

Key Responsibilities

- Lead the implementation of SRE best practices across cloud infrastructure
- Drive improvements in observability, alerting, and capacity planning (SLA / SLO / SLI)
- Identify and reduce operational toil through automation and remediation frameworks
- Build and enhance GitOps and Infrastructure-as-Code capabilities (e.g. Terraform, Ansible)
- Develop and review production-grade code to support automation initiatives
- Support incident management and on-call processes, ensuring production stability
- Contribute to post-incident reviews, embedding SRE principles to reduce risk

Requirements

- Demonstrable experience in SRE or infrastructure operations within cloud environments (AWS / GCP)
- Strong scripting skills (Python, Ansible, or PowerShell)
- Experience with Infrastructure as Code and GitOps methodologies
- Hands-on knowledge of observability / APM tools (e.g. Grafana, Datadog, Dynatrace)
- Proven experience managing incidents, root .....