Significance Analysis for Clustering with Single-Cell RNA-Sequencing Data

Authors:
Kelly Street
Rafael Irizarry
Isabella Grabski
Publication Year:
2022
Publisher:
Cold Spring Harbor Laboratory, 2022.

Abstract

Unsupervised clustering of single-cell RNA-sequencing data enables the identification and discovery of distinct cell populations. However, the most widely used clustering algorithms are heuristic and do not formally account for statistical uncertainty. Many popular pipelines use clustering stability methods to assess the algorithms’ output and decide on the number of clusters. We find, however, that because these analyses do not address known sources of variability in a statistically rigorous manner, they lead to overconfidence in the discovery of novel cell types. We extend a previous method for Gaussian data, Significance of Hierarchical Clustering (SHC), to propose a model-based hypothesis testing approach that incorporates significance analysis into the clustering algorithm and permits statistical evaluation of clusters as distinct cell populations. We also adapt this approach to permit statistical assessment of the clusters reported by any algorithm. We benchmarked our approach against popular clustering workflows on real-world datasets, demonstrating improved performance. To show its practical utility, we applied it to the Human Lung Cell Atlas and an atlas of the mouse cerebellar cortex. We identified several cases of over-clustering, which led to false discoveries, and of under-clustering, which resulted in the failure to identify new subpopulations that our method was able to detect.
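The idea of testing a cluster split against a one-cluster null model can be illustrated with the Monte Carlo logic behind the original Gaussian SHC test. You fit a single-population model to the data and simulate datasets from it. Each simulated dataset is split with the same clustering procedure, and the test asks how often a null split looks at least as strong as the observed one. The Python sketch below is a simplified illustration under stated assumptions, not the authors' implementation. It uses a 2-means cluster index (within-cluster over total sum of squares) as the test statistic. The null is a single multivariate Gaussian with the sample mean and covariance, and plain 2-means stands in for hierarchical clustering. The function names `cluster_index`, `two_means`, and `split_test` are hypothetical. The sketch does not model sequencing counts, which is what the single-cell extension described above is designed to handle.

```python
import numpy as np


def cluster_index(X, labels):
    """Two-cluster index: within-cluster sum of squares / total sum of squares.
    Values near 0 indicate a well-separated split."""
    total = np.sum((X - X.mean(axis=0)) ** 2)
    within = sum(
        np.sum((X[labels == k] - X[labels == k].mean(axis=0)) ** 2) for k in (0, 1)
    )
    return within / total


def two_means(X, rng, n_init=5, n_iter=100):
    """Lloyd's algorithm with k = 2, restarted n_init times from random data
    points (assumes no duplicate rows); keeps the split with the lowest index."""
    best_labels, best_ci = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=2, replace=False)]
        for _ in range(n_iter):
            # Squared distance from every point to each of the two centers.
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            new_centers = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        ci = cluster_index(X, labels)
        if ci < best_ci:
            best_labels, best_ci = labels, ci
    return best_labels


def split_test(X, n_sim=99, seed=0):
    """Monte Carlo p-value for 'X is one cluster' against a two-way split.

    Hypothetical null model for this sketch: one multivariate Gaussian with
    the sample mean and covariance of X. Each null dataset is split by the
    same 2-means procedure; the p-value is the share of null indices that are
    at least as small (i.e. at least as well separated) as the observed one.
    """
    rng = np.random.default_rng(seed)
    labels = two_means(X, rng)
    observed = cluster_index(X, labels)
    mean = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    null = np.empty(n_sim)
    for i in range(n_sim):
        X_null = rng.multivariate_normal(mean, cov, size=len(X))
        null[i] = cluster_index(X_null, two_means(X_null, rng))
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (1 + np.sum(null <= observed)) / (1 + n_sim)
    return p_value, labels


# Usage: two well-separated Gaussian blobs in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal([-5.0, 0.0], 1.0, size=(100, 2)),
    rng.normal([5.0, 0.0], 1.0, size=(100, 2)),
])
p, labels = split_test(X)
print(p)  # 0.01, the floor 1 / (n_sim + 1)
```

For these blobs the observed index is far below every null index, so the p-value sits at its floor. Fitting the null to the data under test lets the test ask whether a split is stronger than a single population with the same overall spread would produce. To assess clusters reported by another algorithm, one could pass those labels to `cluster_index` instead of the 2-means labels. In that case the null datasets should be clustered by that same algorithm so that observed and null statistics stay comparable.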

Details

Database:
OpenAIRE
Accession number:
edsair.doi...........79f82e1e0f7f6e4754d8eb0c4075d782
Full Text:
https://doi.org/10.1101/2022.08.01.502383