Site Reliability Engineer
Job Description
This Site Reliability Engineer role focuses on designing, building and maintaining cloud-based, high-volume, high-speed systems that provide critical data services to the insurance industry. You will work primarily in AWS, using Linux, containers and modern automation and CI/CD tooling to improve reliability, performance and security. The position combines hands-on engineering, incident response and continuous improvement of the platform and its supporting infrastructure.

Responsibilities
- Design, implement and support scalable, resilient cloud-based solutions in AWS for high-volume, high-speed data systems.
- Apply structured problem-solving skills to investigate and resolve technical issues across production and non-production environments.
- Own and deliver regular maintenance activities such as system patching, upgrades and general platform housekeeping.
- Diagnose and address system performance issues, identifying bottlenecks and implementing improvements.
- Develop and maintain automation using scripting languages such as Python and tools like Ansible and Terraform to manage infrastructure and deployments.
- Build, support and test infrastructure components as part of a collaborative engineering team.
- Contribute to the design and implementation of observability and resilience practices to improve system reliability.
- Participate in incident response, troubleshooting and root cause analysis to enhance system stability and prevent recurrence.