Senior SRE Engineer

£80,000

Remote

Permanent

We’re looking for a Senior Engineer who brings a strong balance of production reliability and engineering-driven automation — someone who can uphold the stability of our systems while proactively reducing manual toil through tooling and process improvements. A commitment to on-call ownership is fundamental, alongside a passion for building resilient, observable, and self-healing infrastructure.

Key Responsibilities:

Design, implement, and maintain scalable infrastructure for a high-performance, low-latency crypto trading platform.
Operate and enhance Kubernetes and Nomad-based environments to ensure system stability, scalability, and security.
Build infrastructure automation and deployment pipelines using Terraform, Ansible, ArgoCD, and GitHub Actions.
Collaborate closely with engineering teams to streamline service onboarding, automate routine tasks, and improve deployment velocity.
Improve observability and reliability through better logging, metrics, tracing, and alerting with the Grafana ecosystem.
Conduct root cause analysis and postmortems for production incidents; continuously improve system resilience and incident response.
Partner with the SRE security stream and the regulatory and compliance teams to ensure infrastructure meets regulatory and organizational requirements.
Support multi-environment deployments (dev, staging, testnet, mainnet) with a focus on safe rollouts, rollbacks, and configuration management.
Contribute to capacity planning, cost optimization, and infrastructure scaling strategies as the platform grows.

Experience & Skills Requirements:

Experienced in participating in an on-call rotation. Candidates must demonstrate ownership of incident response and bring a mindset focused on long-term system stability and continuous improvement.
Strong experience operating and maintaining low-latency, distributed systems in production environments.
Proficient in cloud-native platforms and container orchestration using AWS, GCP, Kubernetes, and Nomad.
Solid understanding of Linux/Unix internals and the TCP/IP networking stack.
Comfortable with one or more of: Bash, Golang, or Python.
Skilled in root cause analysis, performance tuning, and system-level debugging across complex service architectures.
Experience building and managing end-to-end infrastructure, including infrastructure as code, CI/CD pipelines, and monitoring systems.
Familiar with modern GitOps workflows and tools such as GitHub Actions, ArgoCD, Argo Workflows, and Argo Events.
Comfortable owning production systems end-to-end — from infrastructure as code to automated monitoring and deployment workflows.
Pragmatic in approach — depth and ownership matter more than broad familiarity. We value engineers who bring expertise, curiosity, and a bias for action.
Bonus: Experience with the Aeron messaging system is a strong plus.

Technology Stack:

Cloud Providers: AWS, GCP
Orchestration: Kubernetes, Nomad
Infrastructure as Code: Terraform, Ansible
CI/CD: GitHub Actions, Argo Ecosystem
Observability: Loki, Grafana, Tempo, Mimir (LGTM stack) or corresponding open-source alternatives
Low-latency Messaging & Transport (a big plus, but not required): Adaptive Aeron
Architecture: Microservices
Languages: Golang (preferred, but not required)

