Overview
Our client in the life sciences industry is a startup in stealth mode backed by strong funding. They are seeking a Principal Data Engineer to lead the data and infrastructure systems powering the foundation model transforming drug development.
Responsibilities
Lead data and infrastructure systems powering foundation model initiatives in drug development.
Own data workflows end-to-end, from extraction and transformation to clean Parquet outputs for machine learning teams.
Collaborate closely with wet lab teams and develop a practical understanding of assays and protocol development.
Set up cloud data infrastructure from scratch, including compute, storage, networking, and access controls.
Build reliable, repeatable pipelines with testing, version control, and clear documentation.
Maintain data quality, lineage, and monitoring; implement sound data modeling practices.
Qualifications (Requirements)
Principal-level data engineering experience in life sciences is essential.
End-to-end ownership of data workflows from extraction to machine learning-ready outputs (Parquet).
Hands-on familiarity with genomics data, including raw FASTQ files and Illumina sequencer outputs.
Experience with metabolomics data, particularly untargeted mass spectrometry.
Strong collaboration with wet lab teams and practical understanding of assays and protocol development.
Experience building cloud data infrastructure from scratch (compute, storage, networking, access controls).
Strong Python and SQL skills; proficiency in data modeling, data quality, lineage, and monitoring.
Ability to design and maintain reliable pipelines with testing and documentation.
Preferences
Experience building data lakes or lakehouses and automating batch workflows (e.g., Airflow).
Familiarity with NGS pipelines (quality control, alignment/assembly, variant calling) and mass spectrometry data analysis.
Use of Infrastructure as Code (Terraform), containerization (Docker), and CI/CD for deploying data systems.
Prior 0-to-1 startup experience and close collaboration with ML and biology teams.
Why Join
Design and build cloud infrastructure and data pipelines powering distributed ML training and scalable biological data workflows, without legacy constraints.
Work with first-of-their-kind, multi-modal datasets to support foundation model training at AlphaFold scale; this is a builder role with deep technical ownership.
Join as a founding member of the engineering team with significant equity and end-to-end system ownership.
See your work directly enable drug discoveries that will impact millions, collaborating with world-leading scientists in microbiome research and machine learning.
Location:
London - 3 days onsite
Salary:
£80,000 - £120,000 plus .....