Single-cell RNA-seq: from raw counts to defensible biology

Single-cell RNA-seq has become a default tool, but the gap between "we ran the pipeline" and "we have a defensible biological conclusion" is wider than most projects assume.

Where projects usually go wrong

Three failure modes account for most of the unusable single-cell datasets we are asked to rescue:

QC was tuned to keep cells, not to keep signal. Loose thresholds on mitochondrial content and gene counts produce dense UMAPs that are largely artefact.
Batch effects were "removed" without being measured. Aggressive integration can erase the very biology you came to find.
Annotation was done by eye. Marker-gene cherry-picking does not survive a careful reviewer.
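
To make the first failure mode concrete, here is a minimal sketch of per-sample, data-driven QC flagging. It uses median absolute deviation (MAD) bounds, which is one common approach and not necessarily the right one for every dataset. The field names (`sample`, `n_genes`, `pct_mito`), the function names and the default of 3 MADs are illustrative assumptions, not part of any particular pipeline.

```python
from statistics import median


def mad_bounds(values, n_mads=3.0):
    """Return (lower, upper) bounds at median -/+ n_mads * MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med - n_mads * mad, med + n_mads * mad


def flag_low_quality(cells, n_mads=3.0):
    """Flag cells with too few genes or too much mitochondrial content.

    `cells` is a list of dicts with the hypothetical keys 'sample',
    'n_genes' and 'pct_mito'. Bounds are computed separately for each
    sample, so one sample's distribution does not set another's cutoff.
    Returns a list of booleans in input order (True = flagged).
    """
    # Group cell indices by sample.
    by_sample = {}
    for i, cell in enumerate(cells):
        by_sample.setdefault(cell["sample"], []).append(i)

    flagged = [False] * len(cells)
    for indices in by_sample.values():
        # Only the lower bound matters for gene counts here,
        # and only the upper bound for mitochondrial percentage.
        low_genes, _ = mad_bounds([cells[i]["n_genes"] for i in indices], n_mads)
        _, high_mito = mad_bounds([cells[i]["pct_mito"] for i in indices], n_mads)
        for i in indices:
            flagged[i] = (
                cells[i]["n_genes"] < low_genes
                or cells[i]["pct_mito"] > high_mito
            )
    return flagged
```

The point of computing bounds per sample is that the number of flagged cells becomes something you report and inspect, rather than a side effect of one global cutoff chosen to keep the UMAP dense.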

A short checklist

Report per-sample QC distributions, not just final cell counts.
Quantify batch effects before and after integration (e.g. kBET, LISI, silhouette width).
Annotate with at least one reference-based method (e.g. SingleR, CellTypist) alongside marker genes.
Keep the raw counts, and pin the exact pipeline version so the analysis is reproducible end to end.
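
As an illustration of the batch-effect item, the sketch below computes a mean silhouette width over batch labels in a low-dimensional embedding. It is a plain-Python version for small examples, assuming Euclidean distance and a hypothetical function name (`mean_silhouette`); on real data you would use an established implementation and subsample cells, since this one is quadratic in the number of cells.

```python
from math import dist


def mean_silhouette(points, labels):
    """Mean silhouette width of `points` grouped by `labels`.

    `points` is a list of coordinate tuples (e.g. PCA or UMAP
    coordinates), `labels` a parallel list of group labels.
    Uses Euclidean distance. Singleton groups score 0 by convention.
    """
    # Group point indices by label.
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(label, []).append(i)
    if len(groups) < 2:
        raise ValueError("need at least two label groups")

    scores = []
    for i, point in enumerate(points):
        own = groups[labels[i]]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same group.
        a = sum(dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other group.
        b = min(
            sum(dist(point, points[j]) for j in members) / len(members)
            for label, members in groups.items()
            if label != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Run on batch labels, a score near 1 means cells still separate by batch, while a score near or below 0 means batches are mixed. Running the same function on cell-type labels before and after integration gives a rough check that integration has not also erased the biological structure.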

Why this matters for your timeline

Most "we need to redo this analysis" requests we get could have been avoided with a one-week scoping pilot. Problems with QC, integration and annotation are cheap to fix at that stage and very expensive to fix at submission.
