Back to Search
Start Over
Significance Analysis for Clustering with Single-Cell RNA-Sequencing Data
- Publication Year :
- 2022
- Publisher :
- Cold Spring Harbor Laboratory, 2022.
-
Abstract
- Unsupervised clustering of single-cell RNA-sequencing data enables the identification and discovery of distinct cell populations. However, the most widely used clustering algorithms are heuristic and do not formally account for statistical uncertainty. Many popular pipelines use clustering stability methods to assess the algorithms’ output and decide on the number of clusters. However, we find that by not addressing known sources of variability in a statistically rigorous manner, these analyses lead to overconfidence in the discovery of novel cell-types. We extend a previous method for Gaussian data, Significance of Hierarchical Clustering (SHC), to propose a model-based hypothesis testing approach that incorporates significance analysis into the clustering algorithm and permits statistical evaluation of clusters as distinct cell populations. We also adapt this approach to permit statistical assessment on the clusters reported by any algorithm. We benchmarked our approach on real-world datasets against popular clustering workflows, demonstrating improved performance. To show its practical utility, we applied it to the Human Lung Cell Atlas and an atlas of the mouse cerebellar cortex. We identified several cases of over-clustering, leading to false discoveries, as well as under-clustering, resulting in the failure to identify new subpopulations that our method was able to detect.
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi...........79f82e1e0f7f6e4754d8eb0c4075d782
- Full Text :
- https://doi.org/10.1101/2022.08.01.502383