London Area, United Kingdom
Summary: The Site Reliability Engineer (SRE) role involves joining a dynamic infrastructure team to support enterprise-scale platforms by focusing on building scalable and automated infrastructure systems. The ideal candidate will possess strong automation skills and experience in production environments, particularly with container orchestration and CI/CD pipelines. This position requires collaboration with development teams to enhance deployment processes and improve system reliability. The role also includes responsibilities for system monitoring and incident response.
Key Responsibilities:
- Design, implement, and maintain scalable, highly available production systems
- Automate operational tasks using Shell scripting (Bash/Zsh)
- Contribute to and support Python-based application components
- Manage and optimise Kubernetes clusters and containerised deployments
- Build and maintain CI/CD pipelines using Spinnaker and GitHub Actions
- Implement Infrastructure as Code (IaC) using Pulumi
- Perform system monitoring, troubleshooting, and root cause analysis
- Participate in on-call rotation and incident response
- Improve system reliability, performance, and observability
- Collaborate with development teams to enhance deployment and release processes
Key Skills:
- Strong experience with Shell scripting (Bash/Zsh)
- Solid Python programming experience
- Automation mindset with experience eliminating manual processes
- Strong hands-on experience with Kubernetes (K8s)
- Docker containerisation expertise
- Experience managing production-grade clusters
- Experience with Spinnaker
- Hands-on experience with GitHub Actions
- Strong understanding of modern DevOps practices
- Infrastructure as Code using Pulumi
- Strong understanding of cloud-native architecture principles
- Experience managing scalable distributed systems
- Git
- GitHub workflows and branching strategies
- Experience working in large-scale enterprise or high-availability environments
- Strong troubleshooting and production support experience
- Familiarity with monitoring and observability tooling
- Experience in high-traffic, performance-sensitive systems
Salary (Rate): undetermined
City: London Area
Country: United Kingdom
Industry: IT
Role Overview
We are seeking highly skilled Site Reliability Engineers (SREs) to join a fast-paced infrastructure team supporting enterprise-scale platforms. This role sits at the intersection of Development and Operations, focusing on building scalable, resilient, and automated infrastructure systems. The ideal candidate will be automation-first, comfortable working in production environments, and experienced in container orchestration, CI/CD pipelines, and Infrastructure as Code.
Required Skills & Experience
- Programming & Scripting
  - Strong experience with Shell scripting (Bash/Zsh)
  - Solid Python programming experience
  - Automation mindset with experience eliminating manual processes
- Containerisation & Orchestration
  - Strong hands-on experience with Kubernetes (K8s)
  - Docker containerisation expertise
  - Experience managing production-grade clusters
- CI/CD & Deployment
  - Experience with Spinnaker
  - Hands-on experience with GitHub Actions
  - Strong understanding of modern DevOps practices
- Infrastructure & Cloud
  - Infrastructure as Code using Pulumi
  - Strong understanding of cloud-native architecture principles
  - Experience managing scalable distributed systems
- Version Control
  - Git
  - GitHub workflows and branching strategies
Preferred Experience
- Experience working in large-scale enterprise or high-availability environments
- Strong troubleshooting and production support experience
- Familiarity with monitoring and observability tooling
- Experience in high-traffic, performance-sensitive systems