Permanent

GPU Cluster Architect

Amsterdam
£90,000 - £115,000/annum
Posted Yesterday

GPU Architect

Remote (Amsterdam, Netherlands)

Why This Role

Join a fast-growing team that is redefining cloud infrastructure for the AI era. The focus is on building platforms that enable customers to tackle complex real-world problems and scale innovation, without the burden of massive infrastructure investments or large internal AI/ML teams. You'll be working at the forefront of GPU and AI infrastructure alongside highly skilled engineers and industry leaders.

About the Position

We are looking for a GPU Cluster Architect to lead the design of large-scale, next-generation GPU clusters that underpin advanced AI workloads. This is a hands-on, senior role with responsibility for shaping architecture across compute, networking, and storage, ensuring systems deliver the scale, reliability, and performance demanded by today's AI and ML applications.

You'll be responsible for defining how very large GPU deployments are networked, powered, cooled, and optimised across multiple data centre environments.

Core Responsibilities:

  • Cluster Architecture: Design and define scalable topologies, spanning compute, interconnects (InfiniBand, Ethernet), storage, and orchestration layers.
  • Workload Analysis: Model and assess AI/ML workloads (such as LLM training and inference) to guide design choices on latency, bandwidth, and GPU density.
  • Networking: Collaborate with network specialists to implement and validate ultra-low latency, high-throughput solutions (InfiniBand HDR/NDR, RoCEv2) at rack, POD, and DC scale.
  • Data & Storage: Partner with storage teams to optimise training data access, checkpointing, and high-performance throughput.
  • Reliability & Observability: Translate signals from monitoring and telemetry systems into design improvements and reliability gains.
  • Cross-Functional Collaboration: Work closely with reliability, networking, storage, and data centre engineering teams to deliver designs that scale seamlessly.

What You'll Bring:

  • 5+ years of experience architecting or designing large-scale compute clusters
  • Strong knowledge of modern GPU platforms (e.g. NVIDIA, AMD)
  • Hands-on expertise with HPC interconnect technologies (InfiniBand, RoCE)
  • Background in systems architecture, hardware reliability, and networking fundamentals
  • Experience building automation and telemetry pipelines with scripting languages (Python, Go, etc.)

What's on Offer:

  • A competitive salary and full benefits package
  • Clear opportunities for professional development and growth
  • Flexibility with hybrid/remote work arrangements
  • A dynamic environment that rewards initiative, creativity, and innovation

If you're excited by the idea of shaping the backbone of large-scale AI infrastructure, we'd love to hear from you.
