Single-cell RNA-seq has become a default tool, but the gap between "we ran the pipeline" and "we have a defensible biological conclusion" is wider than most projects assume.
Where projects usually go wrong
Three failure modes account for most of the unusable single-cell datasets we are asked to rescue:
- QC was tuned to keep cells, not to keep signal. Loose thresholds on mitochondrial content and gene counts produce dense UMAPs that are largely artefact.
- Batch effects were "removed" without being measured. Aggressive integration can erase the very biology you came to find.
- Annotation was done by eye. Marker-gene cherry-picking does not survive a careful reviewer.
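To make the first failure mode concrete, here is a minimal sketch of a data-driven QC cutoff: cells are flagged when their mitochondrial percentage exceeds the per-sample median plus three median absolute deviations (MADs). The field names `sample` and `pct_mito`, the three-MAD multiplier, and the toy values are illustrative assumptions, not a recommended setting for any particular tissue or protocol.

```python
from statistics import median

def mad_upper_bound(values, n_mads=3.0):
    """Return median + n_mads * MAD (median absolute deviation)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med + n_mads * mad

def flag_high_mito(cells, n_mads=3.0):
    """Flag cells whose mitochondrial percentage is an outlier within
    their own sample. `cells` is a list of dicts with the hypothetical
    keys 'sample' and 'pct_mito'. Returns (flags, per-sample bounds)."""
    by_sample = {}
    for cell in cells:
        by_sample.setdefault(cell["sample"], []).append(cell["pct_mito"])
    bounds = {s: mad_upper_bound(v, n_mads) for s, v in by_sample.items()}
    flags = [cell["pct_mito"] > bounds[cell["sample"]] for cell in cells]
    return flags, bounds

# Toy data (made up): sample B has a higher mitochondrial baseline than A.
cells = (
    [{"sample": "A", "pct_mito": p} for p in (2, 3, 4, 3, 2, 30)]
    + [{"sample": "B", "pct_mito": p} for p in (10, 11, 12, 11, 10)]
)
flags, bounds = flag_high_mito(cells)
```

In this toy case only the 30% cell in sample A is flagged, whereas a single global cutoff of 5% would have discarded every cell in sample B. Whether a high baseline like B's reflects biology or a damaged sample is exactly the question that per-sample distributions let you ask.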
A short checklist
- Report per-sample QC distributions, not just final cell counts.
- Quantify batch effect before and after integration (kBET, LISI, silhouette).
- Annotate with at least one reference-based method (e.g. SingleR, CellTypist) alongside marker genes.
- Keep the raw counts and pin the exact pipeline version, so the analysis is reproducible end to end.
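As an illustration of the second checklist item, the sketch below computes a mean silhouette width using batch labels on a low-dimensional embedding, before and after a hypothetical integration step. Scores near 1 mean cells sit with their own batch; scores near or below 0 mean batches are mixed. The coordinates and labels are made up, and this pure-Python version is only meant to show the calculation. In practice a library routine such as scikit-learn's `silhouette_score` would normally be used.

```python
from math import dist

def mean_silhouette(points, labels):
    """Mean silhouette width of `points` grouped by `labels`,
    using Euclidean distance. O(n^2), intended for small examples."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    if len(groups) < 2:
        raise ValueError("need at least two distinct labels")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in groups[lab] if j != i]
        if not own:
            scores.append(0.0)  # convention: singleton groups score 0
            continue
        # a: mean distance to the other members of the same group
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other group
        b = min(
            sum(dist(points[i], points[j]) for j in idx) / len(idx)
            for other, idx in groups.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy embedding (made up): two tight pairs of cells, far apart.
embedding = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
batch_before = ["b1", "b1", "b2", "b2"]  # each pair is a single batch
batch_after = ["b1", "b2", "b1", "b2"]   # each pair contains both batches

score_before = mean_silhouette(embedding, batch_before)
score_after = mean_silhouette(embedding, batch_after)
```

Here `score_before` is close to 1 and `score_after` is negative, which is the direction a successful integration should move the batch score. A low batch score alone does not show that biology was preserved; running the same function with cell-type labels instead of batch labels gives a rough check on that side.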
Why this matters for your timeline
Most of the "we need to redo this analysis" requests we receive could have been avoided with a one-week scoping pilot. Decisions about QC, batch handling, and annotation are cheap to fix early and very expensive to fix at submission.

