Pipelines · 28 Mar 2026 · 6 min read

Reproducible HPC pipelines: why your bench needs Nextflow

Bash scripts and ad-hoc SLURM jobs do not survive the journey from PI's laptop to peer review. Here is the case for treating pipelines as first-class deliverables.

Michal Kováč
Cancer genomics

If you cannot re-run an analysis on a clean machine in one command, you do not have a pipeline — you have a story about a pipeline.

What "reproducible" actually means

A reproducible pipeline has four properties:

  • Versioned code. The exact pipeline definition is in git, tagged for the run.
  • Containerised tools. Every binary lives in a container with a fixed digest.
  • Declared inputs. Sample sheets describe the data, not file paths on someone's laptop.
  • Declarative resources. CPU, memory, and time per process are explicit, so the same workflow runs on a laptop, a cluster, or a cloud.
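The last three properties above map directly onto a Nextflow configuration file. A minimal sketch follows; the process name, resource values, and image reference are illustrative assumptions, not taken from any particular pipeline:

```groovy
// nextflow.config — a minimal sketch; names and values are illustrative

// Containerised tools: pin the image by digest, not by a mutable tag
process {
    withName: 'ALIGN' {
        container = 'quay.io/biocontainers/bwa@sha256:<digest>'  // fixed digest (placeholder)
        cpus   = 8
        memory = '32 GB'
        time   = '4h'
    }
}

// Declarative resources: the same workflow runs anywhere by switching profile
profiles {
    laptop  { process.executor = 'local' }
    cluster { process.executor = 'slurm'; process.queue = 'standard' }
}

// Declared inputs: a sample sheet describes the data, not hard-coded paths
params.input = 'samplesheet.csv'
```

Committing this file to the tagged repository covers the first property too: the run is fully described by one git revision.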

Why Nextflow

Nextflow is not the only option, but it hits a sweet spot for biomedical work:

  • First-class executor support for SLURM, AWS Batch, GCP Batch, and Kubernetes.
  • The nf-core community provides peer-reviewed pipelines for the most common assays.
  • DSL2 modules let you compose institutional-grade workflows without rewriting the basics.
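To make the DSL2 point concrete, here is a sketch of how modules compose into a workflow; the module paths, process names, and sample-sheet columns are assumptions for illustration:

```groovy
// main.nf — DSL2 composition sketch; module paths and names are assumptions
nextflow.enable.dsl = 2

include { FASTQC } from './modules/fastqc/main.nf'
include { ALIGN  } from './modules/align/main.nf'

workflow {
    // Declared inputs: parse the sample sheet rather than globbing file paths
    samples = Channel
        .fromPath(params.input)
        .splitCsv(header: true)
        .map { row -> tuple(row.sample, file(row.fastq_1), file(row.fastq_2)) }

    FASTQC(samples)
    ALIGN(samples)
}
```

Each module carries its own container and resource directives, so swapping an aligner means swapping one `include` line, not rewriting the workflow.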

What we ship

When we deliver a Nextflow pipeline as part of an engagement, you get:

  1. The workflow repository, with versioned tags.
  2. A test profile that runs end to end on a tiny dataset in under 10 minutes.
  3. Documented resource profiles for your HPC.
  4. A short handover session so your team can run and modify it without us.
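Item 2 on the list above usually amounts to a dedicated profile in the repository. A sketch, assuming a small dataset bundled under `assets/` (the paths and caps are illustrative):

```groovy
// conf/test.config — end-to-end test profile sketch; paths and values are illustrative
params {
    input  = "${projectDir}/assets/test_samplesheet.csv"  // tiny bundled dataset
    outdir = 'test_results'
}

// Cap resources so the smoke test also completes on a laptop
process {
    cpus   = 2
    memory = '4 GB'
    time   = '10m'
}
```

With that in place, the one-command re-run from the opening line is literal: `nextflow run <repo> -r <tag> -profile test` on a clean machine (the repository URL and tag name here are placeholders).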

That is what reproducibility looks like in practice — not a paragraph in the methods section.

#Nextflow #HPC #reproducibility #DevOps

Want this expertise on your project?

Arrange a FREE Scoping Session