Senior Site Reliability Engineer

Overview
Board Intelligence is a technology and advisory firm that supercharges boards with the science of board effectiveness. We build better businesses and benefit society.
Through a suite of AI-powered software tools, evaluation frameworks, and advisory services that distil twenty years of boardroom experience, we improve the efficiency of board processes and the effectiveness of boards. We work with over 70,000 leaders and 3,000 organisations across the world. In 2024 we received backing from K1 Investment Management. We are at the beginning of significant growth, and we’re looking for superb talent to join us on this journey.
The Opportunity
As a Senior Site Reliability Engineer (SRE), you’ll join a team whose mission is to ensure the availability, performance, security and reliability of our platform and core services, so that they meet the needs of our internal and external users. You will take the lead on projects across the entire breadth of our tech stack, from planning to delivery and maintenance, and mentor others.
You will be responsible for visibility and monitoring, for building tooling and automation to reduce toil, and for responding to incidents as part of our 24/7 on-call team.
Responsibilities
Hands-on work on technical projects, taking direction from team Principals
Implement and maintain monitoring solutions, metric-driven alerting, logging and tracing
Troubleshoot in complex environments
Establish and measure SLIs and SLOs with engineering teams and continuously improve collaboration
Participate in periodic 24/7 paid on-call duties
Hold, or be eligible to obtain, UK Security Clearance (SC)
Build and manage systems using infrastructure as code and automation (Terraform, Ansible, Kubernetes, Helm, Go)
Pair programming, knowledge sharing and training
Write well-defined tickets and documentation
Requirements
UK Security Clearance (SC)
Strong background in SRE/DevOps or Linux system administration
Experience with automation using configuration management (Ansible, Chef, Puppet)
Understanding of containerisation and orchestration (Kubernetes)
Experience with automation via APIs
Experience with automation testing in Agile environments
Familiarity with network management, Postgres, and security frameworks CIS/NIST/OWASP
Public cloud experience (AWS, GCP, Azure)
Willingness to learn and work with hybrid on-prem and cloud
Understanding of CI/CD
Understanding of software engineering trade-offs for scalable applications
Experience with technical writing or reviewing technical designs
Agile methodology experience (Scrum, Kanban)
Proficiency in Ruby, Java, Go, or Bash/Shell
Experience with Jira and issue tracking
Benefits
Competitive salary and pension
Personal performance bonus
26 days holiday
Bupa health and dental cover
Group life insurance
EAP and health services
Training and development programs
Cycle to work scheme
Parental policies
Gym membership discounts
Monthly company ...