STREAK: A supervised cell surface receptor abundance estimation strategy for single cell RNA-sequencing data using feature selection and thresholded gene set scoring.

Authors :
Javaid, Azka
Frost, Hildreth Robert
Source :
PLoS Computational Biology; 8/21/2023, Vol. 19 Issue 8, p1-24, 24p, 2 Charts, 13 Graphs
Publication Year :
2023

Abstract

The accurate estimation of cell surface receptor abundance for single cell transcriptomics data is important for the tasks of cell type and phenotype categorization and cell-cell interaction quantification. We previously developed an unsupervised receptor abundance estimation technique named SPECK (Surface Protein abundance Estimation using CKmeans-based clustered thresholding) to address the challenges associated with accurate abundance estimation. In that paper, we concluded that SPECK results in improved concordance with Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) data relative to comparative unsupervised abundance estimation techniques using only single-cell RNA-sequencing (scRNA-seq) data. In this paper, we outline a new supervised receptor abundance estimation method called STREAK (gene Set Testing-based Receptor abundance Estimation using Adjusted distances and cKmeans thresholding) that leverages associations learned from joint scRNA-seq/CITE-seq training data and a thresholded gene set scoring mechanism to estimate receptor abundance for scRNA-seq target data. We evaluate STREAK relative to both unsupervised and supervised receptor abundance estimation techniques using two evaluation approaches on six joint scRNA-seq/CITE-seq datasets that represent four human and mouse tissue types. We conclude that STREAK outperforms other abundance estimation strategies and provides a more biologically interpretable and transparent statistical model.

Author summary: Herein, we present an overview of our recently developed supervised receptor abundance estimation technique, STREAK (gene Set Testing-based Receptor abundance Estimation using Adjusted distances and cKmeans thresholding), which leverages co-expression associations learned from joint scRNA-seq/CITE-seq data to perform approximate abundance estimation. More specifically, STREAK uses these expression associations to build weighted membership gene sets, which are scored with a gene set scoring procedure and then thresholded. The thresholded scores serve as the estimated abundance profiles. We validate STREAK relative to both unsupervised and supervised estimation approaches using two evaluation approaches, a cross-validation strategy and a cross-training strategy, on four tissue types: peripheral blood mononuclear cells, mesothelial cells, monocytes and lymphoid tissue. We conclude that STREAK outperforms comparative receptor abundance estimation approaches via a relatively more biologically interpretable and transparent statistical model, facilitated by the customizable gene set scoring procedure of VAM (the Variance-adjusted Mahalanobis distance measure). [ABSTRACT FROM AUTHOR]
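
The three steps named in the author summary (learn associations from joint training data, score a weighted gene set on target data, threshold the scores) can be illustrated with a minimal Python sketch. This is not the authors' implementation. All function names, the gene set size, and the synthetic data are hypothetical. Two substitutions are made for brevity: a weighted mean of z-scored expression stands in for VAM scoring, and a fixed two-cluster 1D k-means stands in for Ckmeans-based thresholding, with the lowest cluster assumed to be set to zero.

```python
import numpy as np

def rank(a):
    """Column-wise ranks (ties broken arbitrarily; adequate for a sketch)."""
    return np.argsort(np.argsort(a, axis=0), axis=0).astype(float)

def build_gene_set(train_rna, train_adt, size=10):
    """Select the genes most rank-correlated with measured receptor abundance.

    train_rna: cells x genes expression; train_adt: per-cell receptor abundance.
    Returns gene indices and nonnegative weights (the correlations).
    """
    r = rank(train_rna)
    a = rank(train_adt[:, None])[:, 0]
    r -= r.mean(axis=0)
    a -= a.mean()
    denom = np.sqrt((r ** 2).sum(axis=0) * (a ** 2).sum())
    corr = (r * a[:, None]).sum(axis=0) / np.where(denom == 0, 1.0, denom)
    idx = np.argsort(corr)[::-1][:size]
    return idx, np.clip(corr[idx], 0.0, None)

def score_cells(target_rna, idx, weights):
    """Stand-in for VAM: weighted mean of z-scored expression, shifted to >= 0."""
    x = target_rna[:, idx]
    sd = x.std(axis=0)
    z = (x - x.mean(axis=0)) / np.where(sd == 0, 1.0, sd)
    total = weights.sum()
    w = weights / total if total > 0 else np.full(len(idx), 1.0 / len(idx))
    s = z @ w
    return s - s.min()

def threshold_lowest_cluster(scores, iters=50):
    """Stand-in for Ckmeans thresholding: 2-means in 1D, lowest cluster -> 0."""
    centers = np.array([scores.min(), scores.max()], dtype=float)
    for _ in range(iters):
        labels = np.abs(scores[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = scores[labels == k].mean()
    out = scores.copy()
    out[labels == centers.argmin()] = 0.0
    return out

# Synthetic data (purely illustrative): genes 0-4 track the receptor.
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 50

def simulate():
    on = rng.random(n_cells) < 0.5          # cells carrying the receptor
    rna = rng.poisson(1.0, (n_cells, n_genes)).astype(float)
    rna[:, :5] += 5 * on[:, None]           # co-expressed genes
    adt = 10 * on + rng.normal(0, 1, n_cells)
    return rna, adt, on

train_rna, train_adt, _ = simulate()        # joint scRNA-seq/CITE-seq analogue
target_rna, _, target_on = simulate()       # scRNA-seq only analogue

idx, weights = build_gene_set(train_rna, train_adt, size=5)
estimate = threshold_lowest_cluster(score_cells(target_rna, idx, weights))
```

In this sketch the estimate for a target cell depends only on the genes selected from the training data, which reflects the interpretability argument made above: the gene set and its weights can be inspected directly.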

Details

Language :
English
ISSN :
1553734X
Volume :
19
Issue :
8
Database :
Complementary Index
Journal :
PLoS Computational Biology
Publication Type :
Academic Journal
Accession number :
170046966
Full Text :
https://doi.org/10.1371/journal.pcbi.1011413