Senior SRE Engineer

£80,000
Remote
Permanent

We’re looking for a Senior Engineer who brings a strong balance of production reliability and engineering-driven automation — someone who can uphold the stability of our systems while proactively reducing manual toil through tooling and process improvements. A commitment to on-call ownership is fundamental, alongside a passion for building resilient, observable, and self-healing infrastructure.

Key Responsibilities:

  • Design, implement, and maintain scalable infrastructure for a high-performance, low-latency crypto trading platform.
  • Operate and enhance Kubernetes and Nomad-based environments to ensure system stability, scalability, and security.
  • Build infrastructure automation and deployment pipelines using Terraform, Ansible, ArgoCD, and GitHub Actions.
  • Collaborate closely with engineering teams to streamline service onboarding, automate routine tasks, and improve deployment velocity.
  • Improve observability and reliability through better logging, metrics, tracing, and alerting with the Grafana ecosystem.
  • Conduct root cause analysis and postmortems for production incidents; continuously improve system resilience and incident response.
  • Partner with the SRE security stream and the regulatory and compliance teams to ensure infrastructure meets regulatory and organizational requirements.
  • Support multi-environment deployments (dev, staging, testnet, mainnet) with a focus on safe rollouts, rollbacks, and configuration management.
  • Contribute to capacity planning, cost optimization, and infrastructure scaling strategies as the platform grows.

Experience & Skills Requirements:

  • Experience participating in an on-call rotation. Candidates must demonstrate ownership in incident response and bring a mindset focused on long-term system stability and continuous improvement.
  • Strong experience operating and maintaining low-latency, distributed systems in production environments.
  • Proficient in cloud-native platforms and container orchestration using AWS, GCP, Kubernetes, and Nomad.
  • Solid understanding of Linux/Unix internals and the TCP/IP networking stack.
  • Comfortable with one or more of: Bash, Golang, or Python.
  • Skilled in root cause analysis, performance tuning, and system-level debugging across complex service architectures.
  • Experience building and managing end-to-end infrastructure, including infrastructure as code, CI/CD pipelines, and monitoring systems.
  • Familiar with modern GitOps workflows and tools such as GitHub Actions, ArgoCD, Argo Workflows, and Argo Events.
  • Comfortable owning production systems end-to-end — from infrastructure as code to automated monitoring and deployment workflows.
  • Pragmatic in approach — depth and ownership matter more than broad familiarity. We value engineers who bring expertise, curiosity, and a bias for action.
  • Bonus: Experience with the Aeron messaging system.

Technology Stack:

  • Cloud Providers: AWS, GCP
  • Orchestration: Kubernetes, Nomad
  • Infrastructure as Code: Terraform, Ansible
  • CI/CD: GitHub Actions, Argo Ecosystem
  • Observability: Loki, Grafana, Tempo, Mimir (LGTM stack) or corresponding open-source alternatives
  • Low-Latency Messaging & Transport (a big plus, but not required): Adaptive Aeron
  • Architecture: Microservices
  • Languages: Golang (preferred, but not required)
