
Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data [version 2; peer review: 1 approved, 2 approved with reservations]

Authors :
J. Javier Diaz-Mejia
Elaine C. Meng
Alexander R. Pico
Sonya A. MacParland
Troy Ketela
Trevor J. Pugh
Gary D. Bader
John H. Morris
Author Affiliations :
1. Princess Margaret Cancer Centre, University Health Network, Toronto, ON, M5G 2M9, Canada
2. The Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
3. Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, 94143, USA
4. Gladstone Institutes, San Francisco, CA, 94158, USA
5. Multi-Organ Transplant Program, Toronto General Hospital Research Institute, Toronto, ON, M5G 2C4, Canada
6. Department of Immunology, University of Toronto, Toronto, ON, M5S 1A8, Canada
7. Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, M5G 1L7, Canada
8. Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada
9. Ontario Institute for Cancer Research, Toronto, ON, M5G 0A3, Canada
10. Department of Molecular Genetics, University of Toronto, Toronto, ON, M5G 1A8, Canada
Source :
F1000Research. 8:ISCB Comm J-296
Publication Year :
2019
Publisher :
London, UK: F1000 Research Limited, 2019.

Abstract

Background: Identification of cell type subpopulations from complex cell mixtures using single-cell RNA-sequencing (scRNA-seq) data includes automated steps from normalization to cell clustering. However, assigning cell type labels to cell clusters is often conducted manually, resulting in limited documentation, low reproducibility and uncontrolled vocabularies. This is partially due to the scarcity of reference cell type signatures and because some methods support limited cell type signatures.
Methods: In this study, we benchmarked five methods representing first-generation enrichment analysis (ORA), second-generation approaches (GSEA and GSVA), machine learning tools (CIBERSORT) and network-based neighbor voting (METANEIGHBOR), for the task of assigning cell type labels to cell clusters from scRNA-seq data. We used five scRNA-seq datasets: human liver, 11 Tabula Muris mouse tissues, two human peripheral blood mononuclear cell datasets, and mouse retinal neurons, for which reference cell type signatures were available. The datasets span Drop-seq, 10X Chromium and Seq-Well technologies and range in size from ~3,700 to ~68,000 cells.
Results: Our results show that, in general, all five methods perform well in the task as evaluated by receiver operating characteristic curve analysis (average area under the curve (AUC) = 0.91, sd = 0.06), whereas precision-recall analyses show a wide variation depending on the method and dataset (average AUC = 0.53, sd = 0.24). We observed an influence of the number of genes in cell type signatures on performance, with smaller signatures leading more frequently to incorrect results.
Conclusions: GSVA was the overall top performer and was more robust in cell type signature subsampling simulations, although different methods performed well using different datasets. METANEIGHBOR and GSVA were the fastest methods. CIBERSORT and METANEIGHBOR were more influenced than the other methods by analyses including only expected cell types.
We provide an extensible framework that can be used to evaluate other methods and datasets at https://github.com/jdime/scRNAseq_cell_cluster_labeling .
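Illustrative note: the abstract reports both ROC and precision-recall AUC values for ranked predictions. The following is a minimal Python sketch, not the authors' code, of how the two summaries can be computed for a ranked list of cluster-to-cell-type predictions. The scores and labels are made-up toy values, the function names are hypothetical, and average precision is used here as one common estimate of the area under the precision-recall curve; the study's own implementation may differ.

```python
# Toy example (hypothetical data): each entry is one candidate
# cluster/cell-type pair with a prediction score and a 0/1 label
# saying whether the pairing is correct.

def roc_auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs in which
    the positive receives the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Average precision: mean of the precision values measured at the
    rank of each true positive, scanning from highest to lowest score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Two correct pairings among ten candidates, ranked 2nd and 4th.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]

auc_roc = roc_auc(scores, labels)
auc_pr = average_precision(scores, labels)
print(auc_roc, auc_pr)
```

In this toy ranking the ROC AUC comes out higher than the average precision. When correct pairings are rare, a few high-scoring incorrect pairings lower precision noticeably while barely changing the ROC curve, which is one reason the two summaries can diverge.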

Details

ISSN :
20461402
Volume :
8
Database :
F1000Research
Journal :
F1000Research
Notes :
Revised Amendments from Version 1
- We incorporated a new method (MetaNeighbor) into our evaluation.
- We incorporated two new scRNA-seq datasets (Tabula Muris and PBMCs measured using Seq-Well).
- All figures have changed:
  a) We clarified the approach we used to transform each method's predictions into ranks for the ROC and PR curve analyses. This includes the main text, the updated Figure 1G, and the response to reviewers.
  b) In our previous version we analyzed four methods and three datasets. In our new version we evaluated five methods and eight dataset variants, and we modified the presentation of the results. Figures 2 to 5 now each show all ROC and PR results for each dataset, whereas in the previous version one figure showed ROC results for all datasets and another showed PR results for all datasets.
- We added Figure 6, which summarizes the results and presents new results on the influence of the number of genes in cell type signatures on the performance of methods.
- We added Supplementary Table 1 with the actual values of Figure 6A-D and Supplementary Table 2 with a comparison of an alternative signature dataset for the PBMC datasets.
- We modified our software code to take prediction outputs in a simpler format than our previous version. The GitHub and Zenodo links were updated accordingly.
- The main text has been clarified in several places.
Publication Type :
Academic Journal
Accession number :
edsfor.10.12688.f1000research.18490.2
Document Type :
research-article
Full Text :
https://doi.org/10.12688/f1000research.18490.2