Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data [version 2; peer review: 1 approved, 2 approved with reservations]
- Source :
- F1000Research. 8:ISCB Comm J-296
- Publication Year :
- 2019
- Publisher :
- London, UK: F1000 Research Limited, 2019.
Abstract
- Background: Identification of cell type subpopulations from complex cell mixtures using single-cell RNA-sequencing (scRNA-seq) data includes automated steps from normalization to cell clustering. However, assigning cell type labels to cell clusters is often conducted manually, resulting in limited documentation, low reproducibility and uncontrolled vocabularies. This is partially due to the scarcity of reference cell type signatures and because some methods support limited cell type signatures.
  Methods: In this study, we benchmarked five methods representing first-generation enrichment analysis (ORA), second-generation approaches (GSEA and GSVA), machine learning tools (CIBERSORT) and network-based neighbor voting (METANEIGHBOR), for the task of assigning cell type labels to cell clusters from scRNA-seq data. We used five scRNA-seq datasets: human liver, 11 Tabula Muris mouse tissues, two human peripheral blood mononuclear cell datasets, and mouse retinal neurons, for which reference cell type signatures were available. The datasets span Drop-seq, 10X Chromium and Seq-Well technologies and range in size from ~3,700 to ~68,000 cells.
  Results: Our results show that, in general, all five methods perform well in the task as evaluated by receiver operating characteristic curve analysis (average area under the curve (AUC) = 0.91, sd = 0.06), whereas precision-recall analyses show a wide variation depending on the method and dataset (average AUC = 0.53, sd = 0.24). We observed an influence of the number of genes in cell type signatures on performance, with smaller signatures leading more frequently to incorrect results.
  Conclusions: GSVA was the overall top performer and was more robust in cell type signature subsampling simulations, although different methods performed well using different datasets. METANEIGHBOR and GSVA were the fastest methods. CIBERSORT and METANEIGHBOR were more influenced than the other methods by analyses including only expected cell types. We provide an extensible framework that can be used to evaluate other methods and datasets at https://github.com/jdime/scRNAseq_cell_cluster_labeling.
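The contrast the abstract reports between ROC and precision-recall results is typical of ranking tasks in which correct answers are rare among the candidates, as when only a few (cluster, cell type) pairs are true matches. The following is a minimal sketch of that effect, not the authors' evaluation code: the function names `roc_auc` and `average_precision`, the labels and the scores are all made up for illustration, and average precision is only one common estimate of the area under the PR curve, so the study may have computed it differently.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores
    a random negative (ties count as half a win)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values measured at the
    rank of each true positive, with candidates sorted by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical toy data: 20 candidate (cluster, cell type) pairs, of which
# only 2 are correct. The correct pairs are ranked 2nd and 4th by score.
labels = [0, 1, 0, 1] + [0] * 16
scores = [(20 - i) / 20 for i in range(20)]

print("ROC AUC:", round(roc_auc(labels, scores), 3))
print("Average precision:", round(average_precision(labels, scores), 3))
```

In this toy ranking the two correct pairs sit near the top, so the ROC AUC is high, yet two incorrect pairs outrank them, which is enough to halve the precision. With few positives, a handful of high-scoring false positives barely moves the ROC curve but strongly lowers the PR curve.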
Details
- ISSN :
- 20461402
- Volume :
- 8
- Database :
- F1000Research
- Journal :
- F1000Research
- Notes :
- Revised. Amendments from Version 1:
  - We incorporated a new method (MetaNeighbor) into our evaluation.
  - We incorporated two new scRNA-seq datasets (Tabula Muris and PBMCs measured using Seq-Well).
  - All figures have changed: a) We clarified the approach we used to transform each method’s predictions into ranks for the ROC and PR curve analyses. This includes the main text, the updated Figure 1G, and the response to reviewers. b) In our previous version we analyzed four methods and three datasets. In our new version we evaluated five methods and eight dataset variants, and we modified the presentation of the results. Figures 2 to 5 now each show all ROC and PR results per dataset, instead of, as in the previous version, one figure showing ROC results for all datasets and another showing PR results for all datasets.
  - We added Figure 6, which has a summary of results and new results on the influence of the number of genes in cell type signatures on the performance of methods.
  - We added Supplementary Table 1 with the actual values of Figure 6A-D and Supplementary Table 2 with a comparison of an alternative signature dataset for the PBMC datasets.
  - We modified our software code to take prediction outputs in a simpler format than in our previous version. The GitHub and Zenodo links were updated accordingly.
  - The main text has been clarified in several places.
- Publication Type :
- Academic Journal
- Accession number :
- edsfor.10.12688.f1000research.18490.2
- Document Type :
- research-article
- Full Text :
- https://doi.org/10.12688/f1000research.18490.2