Lead Engineer - Software & HPC Engineering

Job Description

Lead / Senior HPC Engineer
Location: On-site (due to secure, air-gapped systems)
Full-time, 40 hours per week

Are you ready to play a key role in one of the most ambitious technological challenges of our time?

We are a pioneering UK-based deep-tech company developing next-generation solutions at the cutting edge of advanced physics, simulation, and machine learning. Our work is focused on unlocking scalable, clean energy through breakthrough approaches, supported by world-class computational capabilities and innovative engineering.

Alongside our core mission, we collaborate with leading organisations across advanced industries, applying our proprietary simulation tools and technologies to solve complex, high-impact challenges.

This is a rare opportunity to join a highly skilled, mission-driven team working at the forefront of science and engineering innovation.



The Role

We're seeking a Lead HPC Engineer, or an experienced Senior HPC Engineer ready to step up, to take ownership of a large-scale, high-performance computing environment.

You'll support and evolve an HPC cluster of over 10,000 cores, ensuring reliability, performance, and scalability for workloads ranging from single high-precision runs to thousands of parallel simulations.

Working within the Software & HPC Engineering team, you'll collaborate closely with computational scientists, data engineers, and IT specialists to deliver a robust platform that underpins cutting-edge research and development.



Key Responsibilities

  • Maintain and optimise HPC hardware, working with external vendors where required
  • Manage core system software and ensure platform stability
  • Monitor performance, troubleshoot issues, and drive continuous improvements
  • Oversee backups of critical data and system configurations
  • Schedule and perform maintenance aligned with user activity
  • Profile workloads and enhance system efficiency
  • Communicate system status, updates, and major issues to stakeholders
  • Capture user requirements and contribute to upgrade and capacity planning
  • Support procurement processes and vendor negotiations
  • Produce clear documentation for both technical teams and end users
  • Collaborate across engineering and IT teams on shared infrastructure


Current Environment

You'll be working with a modern HPC stack, including:

  • Large-scale multi-vendor server infrastructure (AMD EPYC, Intel Xeon)
  • High-speed networking (100Gb LAN) and high-performance storage systems
  • Linux-based environments (AlmaLinux, Ubuntu)
  • Distributed file systems (Lustre, GlusterFS, NFS)
  • HPC tooling including Slurm, Ansible, and monitoring frameworks
  • Development ecosystems supporting C++, Fortran, MPI, and Python


About You

Essential:

  • Degree in Computer Science (or equivalent experience)
  • Strong expertise in Linux, HPC systems, storage, and networking
  • Experience with MPI and scientific computing environments (C++, Fortran)
  • Familiarity with job schedulers and workload management systems
  • Scripting skills (Shell, Python) and version control (Git)
  • Ability to design, implement, and support complex HPC systems
  • Strong analytical thinking and problem-solving skills
  • Excellent communication and collaboration abilities

Desirable:

  • Deep expertise in HPC optimisation and performance profiling
  • Experience with configuration management tools (e.g. Ansible)
  • Knowledge of containerisation (e.g. Singularity, Apptainer)
  • Experience working with secure or air-gapped environments
  • Familiarity with HPC accounting systems and SQL databases
  • Experience supporting and training end users

Rullion celebrates and supports diversity and is committed to ensuring equal opportunities for both employees and applicants.

Job Ref: 3100135020