If you cannot re-run an analysis on a clean machine in one command, you do not have a pipeline — you have a story about a pipeline.
What "reproducible" actually means
A reproducible pipeline has four properties:
- Versioned code. The exact pipeline definition is in git, tagged for the run.
- Containerised tools. Every binary lives in a container with a fixed digest.
- Declared inputs. Sample sheets describe the data, not file paths on someone's laptop.
- Declarative resources. CPU, memory, and time per process are explicit, so the same workflow runs on a laptop, a cluster, or a cloud.
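Concretely, the last three properties can live in a single `nextflow.config`. The sketch below is illustrative only — the process name, image digest, and resource figures are placeholders, not recommendations:

```groovy
// nextflow.config — minimal sketch of containerised tools plus declarative resources.
// Profiles let the same workflow target a laptop or an HPC scheduler.
profiles {
    standard { process.executor = 'local' }
    cluster  { process.executor = 'slurm' }
}

process {
    // Hypothetical process name; pin the container by digest, not a mutable tag.
    withName: 'ALIGN' {
        container = 'quay.io/biocontainers/bwa@sha256:<digest>'
        cpus      = 8
        memory    = '16 GB'
        time      = '4h'
    }
}
```

Because the executor is selected by profile (`-profile cluster`), nothing in the workflow logic itself mentions the infrastructure it runs on.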
Why Nextflow
Nextflow is not the only option, but it hits a sweet spot for biomedical work:
- First-class executor support for SLURM, AWS Batch, GCP Batch, and Kubernetes.
- The nf-core community provides peer-reviewed pipelines for the most common assays.
- DSL2 modules let you compose workflows from reusable, tested components instead of rewriting alignment and QC steps from scratch.
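To make the DSL2 point concrete, here is a minimal module and the workflow that composes it. All names (`FASTQC`, the samplesheet columns, the module path) are illustrative assumptions, not a fixed convention:

```groovy
// modules/fastqc/main.nf — a minimal DSL2 module sketch
process FASTQC {
    container 'quay.io/biocontainers/fastqc@sha256:<digest>'  // pinned by digest

    input:
    tuple val(sample_id), path(reads)

    output:
    path "*_fastqc.zip"

    script:
    """
    fastqc ${reads}
    """
}

// main.nf — inputs are declared in a sample sheet, not hard-coded paths
include { FASTQC } from './modules/fastqc/main'

workflow {
    ch_reads = Channel.fromPath(params.samplesheet)
        .splitCsv(header: true)
        .map { row -> tuple(row.sample, file(row.fastq)) }
    FASTQC(ch_reads)
}
```

The module carries its own container and I/O contract, so the top-level workflow is just wiring: swap the aligner or QC tool by swapping the `include`, not by editing pipeline internals.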
What we ship
When we deliver a Nextflow pipeline as part of an engagement, you get:
- The workflow repository, with versioned tags.
- A test profile that runs end to end on a tiny dataset in under 10 minutes.
- Documented resource profiles for your HPC.
- A short handover session so your team can run and modify it without us.
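The test profile from the list above is typically just a small config overlay. A sketch, with the dataset path and resource caps as illustrative placeholders:

```groovy
// conf/test.config — hypothetical test profile for a fast end-to-end run
params {
    samplesheet = "${projectDir}/assets/test_samplesheet.csv"  // tiny bundled dataset
    max_cpus    = 2
    max_memory  = '6 GB'
}
```

With that in place, the one-command check from the opening line is simply `nextflow run <org>/<pipeline> -r <tag> -profile test` on any machine with Nextflow and a container runtime.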
That is what reproducibility looks like in practice — not a paragraph in the methods section.

