Salary: Up to ~£200k TC
Exciting opportunity for a hands-on software and reliability engineer in the London office of this global information business.
The core infrastructure system processes tens of billions of data points from thousands of sources every day, enriches the data and makes it accessible to downstream apps and users at the stroke of a key. Within this area, the SRE team is responsible for ensuring the optimal availability, latency, scalability and efficiency of the infrastructure for more than ten thousand client-facing applications.
Your work will be varied: building services and UIs to manage application configuration for thousands of machines, assessing the current system’s capacity, and making appropriate scaling recommendations. You will also develop and maintain automation tools and help create the dashboards, monitoring and alerts that track the health of the live system.
- Strong hands-on software development experience in C/C++, Python or any other programming language
- Outstanding communication skills
- Proven success with, and understanding of, large-scale distributed systems, including troubleshooting and solving live production problems
- Experience with monitoring software, e.g. Splunk, Humio or Grafana
- Practical knowledge of networking protocols such as TCP, UDP and IP
- Good understanding of Linux and/or CI/CD tooling, e.g. Jenkins
- Truly collaborative culture across all engineering teams
- Strong base salaries + bonus
- Close-knit team where you can have a huge impact
- Flexible work environment