Senior Site Reliability Engineer

Location: London
Discipline: Software Design and Application Development
Job type: Permanent
Salary: £85K - £91K

This global financial services firm contributes to the stability of the financial markets. They help clients cut through complexity and mitigate the risks of financial transactions. Their ambition is to use this key role to facilitate and accelerate a sustainable global financial system.

Role Purpose

As a Senior Site Reliability Engineer, you’ll work closely with the DevEx and Cloud Engineering teams. They’re a group of engineers who are passionate about learning new technologies and fostering a collaborative and inclusive environment.

Primary Responsibilities

  • Gather and analyse metrics from servers and services to assist in performance tuning and fault finding

  • Partner with development teams to improve services through rigorous testing and release procedures

  • Participate in system design consulting, platform management, and capacity planning

  • Create sustainable systems and services through automation and uplifts

  • Balance feature development speed and reliability with well-defined service-level objectives

  • Proactively manage TLS/SSL certificates for server technology

  • Manage client integration accreditation testing and sign-off

  • Apply failure engineering experience (chaos, failure, resilience and recovery)

Qualifications

  • Knowledge of and experience with building, running and supporting Kubernetes clusters in a highly available, high-traffic production environment

  • Experience working in cloud-based infrastructure (AWS)

  • Familiarity with one or more coding languages, preferably Go, Python, Ruby or Node.js

  • Troubleshooting experience in complex environments using monitoring and logging tools (we use Grafana, Loki, Tempo, Prometheus and Graylog, to name a few)

  • Knowledge of and experience with Terraform