Site Reliability Engineer

Posted 1 day ago by ALOIS UK

Negotiable
London Area, United Kingdom

Summary: The Site Reliability Engineer (SRE) joins a fast-paced infrastructure team supporting enterprise-scale platforms, with a focus on building scalable, resilient, and automated infrastructure. The ideal candidate is automation-first, comfortable working in production environments, and experienced in container orchestration, CI/CD pipelines, and Infrastructure as Code. The role involves collaborating with development teams to improve deployment and release processes, and includes system monitoring, on-call rotation, and incident response.

Key Responsibilities:

  • Design, implement, and maintain scalable, highly available production systems
  • Automate operational tasks using Shell scripting (Bash/Zsh)
  • Contribute to and support Python-based application components
  • Manage and optimise Kubernetes clusters and containerised deployments
  • Build and maintain CI/CD pipelines using Spinnaker and GitHub Actions
  • Implement Infrastructure as Code (IaC) using Pulumi
  • Perform system monitoring, troubleshooting, and root cause analysis
  • Participate in on-call rotation and incident response
  • Improve system reliability, performance, and observability
  • Collaborate with development teams to enhance deployment and release processes

Key Skills:

  • Strong experience with Shell scripting (Bash/Zsh)
  • Solid Python programming experience
  • Automation mindset with experience eliminating manual processes
  • Strong hands-on experience with Kubernetes (K8s)
  • Docker containerisation expertise
  • Experience managing production-grade clusters
  • Experience with Spinnaker
  • Hands-on experience with GitHub Actions
  • Strong understanding of modern DevOps practices
  • Infrastructure as Code using Pulumi
  • Strong understanding of cloud-native architecture principles
  • Experience managing scalable distributed systems
  • Git, GitHub workflows, and branching strategies
  • Experience working in large-scale enterprise or high-availability environments
  • Strong troubleshooting and production support experience
  • Familiarity with monitoring and observability tooling
  • Experience in high-traffic, performance-sensitive systems

Salary (Rate): undetermined

City: London Area

Country: United Kingdom

Working Arrangements: undetermined

IR35 Status: undetermined

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

Role Overview

We are seeking highly skilled Site Reliability Engineers (SREs) to join a fast-paced infrastructure team supporting enterprise-scale platforms. This role sits at the intersection of Development and Operations, focusing on building scalable, resilient, and automated infrastructure systems. The ideal candidate will be automation-first, comfortable working in production environments, and experienced in container orchestration, CI/CD pipelines, and Infrastructure as Code.

Required Skills & Experience

  • Programming & Scripting
      • Strong experience with Shell scripting (Bash/Zsh)
      • Solid Python programming experience
      • Automation mindset with experience eliminating manual processes
  • Containerisation & Orchestration
      • Strong hands-on experience with Kubernetes (K8s)
      • Docker containerisation expertise
      • Experience managing production-grade clusters
  • CI/CD & Deployment
      • Experience with Spinnaker
      • Hands-on experience with GitHub Actions
      • Strong understanding of modern DevOps practices
  • Infrastructure & Cloud
      • Infrastructure as Code using Pulumi
      • Strong understanding of cloud-native architecture principles
      • Experience managing scalable distributed systems
  • Version Control
      • Git
      • GitHub workflows and branching strategies

Preferred Experience

  • Experience working in large-scale enterprise or high-availability environments
  • Strong troubleshooting and production support experience
  • Familiarity with monitoring and observability tooling
  • Experience in high-traffic, performance-sensitive systems