Reliability Engineer at a market-leading quantitative trading firm in London. The team sits between traditional systems administration and development, and seeks to combine the capabilities of both disciplines. The role involves supporting large, business-critical distributed systems and monitoring software reliability in a high-performance environment.
- Acting as a conduit between the infrastructure and development teams
- Providing primary support for multiple large distributed software applications
- Improving all aspects of software reliability, including better monitoring, alerting and documentation
- Engaging with software engineering teams on support issues and improvements to tools, processes, and software
- Gathering and analyzing metrics from both operating systems and applications to assist in performance tuning and fault finding
Technical Skills Required:
- Experience in at least one of the following, and a desire to learn more: host-based networking, Linux/Unix administration, systems programming, distributed systems, or databases.
- The ability to use off-the-shelf and open-source systems and utilities to rapidly provision production systems in a variety of domains, especially for multi-tenant use.
- A proven track record of automation and an algorithmic approach to solving problems.
- A proactive approach to spotting problems, areas for improvement, performance bottlenecks, etc.
- An understanding of the operational concerns in a demanding environment, ideally but not necessarily in finance.