1. BASiCS: Bayesian Analysis of Single-Cell Sequencing Data
- Author
John C. Marioni, Catalina A. Vallejos, and Sylvia Richardson
- Subjects
Bioinformatics; QH301-705.5; Population; Posterior probability; RNA-Seq; Computational biology; Biology; Mice; Cellular and Molecular Neuroscience; Bayes' theorem; Genetics; Animals; Bayesian hierarchical modeling; RNA, Messenger; Biology (General); education; Molecular Biology; 01 Mathematical Sciences; Embryonic Stem Cells; Ecology, Evolution, Behavior and Systematics; Oligonucleotide Array Sequence Analysis; 08 Information And Computing Sciences; education.field_of_study; Ecology; Gene Expression Profiling; Computational Biology; Reproducibility of Results; Bayes Theorem; 06 Biological Sciences; Gene expression profiling; MRNA Sequencing; Computational Theory and Mathematics; Single cell sequencing; Modeling and Simulation; Single-Cell Analysis; Research Article
- Abstract
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell’s lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated with high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable support the efficacy of our approach.
- Author Summary
Gene expression signatures have historically been used to generate molecular fingerprints that characterise distinct tissues. Moreover, by interrogating these molecular signatures it has been possible to understand how a tissue’s function is regulated at the molecular level. However, even between cells from a seemingly homogeneous tissue sample, there exists substantial heterogeneity in gene expression levels. These differences might correspond to novel subtypes or to transient states linked, for example, to the cell cycle.
Single-cell RNA-sequencing, where the transcriptomes of individual cells are profiled using next generation sequencing, provides a method for identifying genes that show more variation across cells than expected by chance, which might be characteristic of such populations. However, single-cell RNA-sequencing is subject to a high degree of technical noise, which must be accounted for to identify such genes robustly. To this end, we use a fully Bayesian approach that jointly models extrinsic spike-in molecules with genes from the cells of interest, allowing better identification of such genes than previously described computational strategies. We validate our approach using data from mouse Embryonic Stem Cells.
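The detection criterion mentioned in the abstract (tail posterior probabilities of high biological variance contributions) can be illustrated with a minimal sketch. This is not the BASiCS implementation: it assumes that posterior draws of each gene's biological share of total variance are already available, and the function names, the synthetic draws, and the two threshold values (`variance_threshold`, `evidence_threshold`) are hypothetical choices made only for illustration.

```python
import random


def tail_posterior_probability(draws, threshold):
    """Monte Carlo estimate of P(x > threshold | data) from posterior draws."""
    return sum(d > threshold for d in draws) / len(draws)


def flag_highly_variable(posterior_draws, variance_threshold=0.8,
                         evidence_threshold=0.9):
    """Map each gene to (tail probability, flagged).

    posterior_draws: dict of gene name -> list of posterior draws of the
    biological proportion of total variance (assumed input, values in [0, 1]).
    A gene is flagged when its tail probability exceeds evidence_threshold.
    Both thresholds are illustrative, not values taken from the paper.
    """
    calls = {}
    for gene, draws in posterior_draws.items():
        prob = tail_posterior_probability(draws, variance_threshold)
        calls[gene] = (prob, prob > evidence_threshold)
    return calls


# Synthetic stand-ins for MCMC output (hypothetical, for illustration only).
rng = random.Random(0)
posterior_draws = {
    # Most posterior mass close to 1: variance is mostly biological.
    "gene_A": [rng.betavariate(40, 2) for _ in range(2000)],
    # Most posterior mass close to 0: variance is mostly technical.
    "gene_B": [rng.betavariate(2, 18) for _ in range(2000)],
}

calls = flag_highly_variable(posterior_draws)
for gene, (prob, flagged) in calls.items():
    print(f"{gene}: tail probability = {prob:.3f}, highly variable = {flagged}")
```

A criterion for lowly variable genes would follow the same pattern with the inequality on the variance contribution reversed.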
- Published
- 2015