
BASiCS workflow: a step-by-step analysis of expression variability using single cell RNA sequencing data [version 2; peer review: 3 approved with reservations]

Authors :
Alan O'Callaghan
Nils Eling
John C. Marioni
Catalina A. Vallejos
Author Affiliations :
1. MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
2. Institute for Molecular Health Sciences, ETH Zürich, Zürich, 8093, Switzerland
3. Department of Quantitative Biomedicine, University of Zurich, Zürich, CH-8057, Switzerland
4. Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, UK
5. European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, CB10 1SD, UK
6. The Alan Turing Institute, London, NW1 2DB, UK
Source :
F1000Research. 11:59
Publication Year :
2024
Publisher :
London, UK: F1000 Research Limited, 2024.

Abstract

Cell-to-cell gene expression variability is an inherent feature of complex biological systems, such as immunity and development. Single-cell RNA sequencing is a powerful tool to quantify this heterogeneity, but it is prone to strong technical noise. In this article, we describe a step-by-step computational workflow that uses the BASiCS Bioconductor package to robustly quantify expression variability within and between known groups of cells (such as experimental conditions or cell types). BASiCS uses an integrated framework for data normalisation, technical noise quantification and downstream analyses, propagating statistical uncertainty across these steps. Within a single seemingly homogeneous cell population, BASiCS can identify highly variable genes that exhibit strong heterogeneity as well as lowly variable genes with stable expression. BASiCS also uses a probabilistic decision rule to identify changes in expression variability between cell populations, whilst avoiding confounding effects related to differences in technical noise or in overall abundance. Using a publicly available dataset, we guide users through a complete pipeline that includes preliminary steps for quality control, as well as data exploration using the scater and scran Bioconductor packages. The workflow is accompanied by a Docker image that ensures the reproducibility of our results.
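The probabilistic decision rule mentioned in the abstract can be illustrated with a minimal sketch. It assumes that posterior draws of a per-gene log fold change in variability are already available, computes the posterior probability that each change exceeds a tolerance, and then picks a probability threshold that keeps the expected false discovery rate below a target. The function names, the tolerance `tau`, the threshold grid and the toy draws are all hypothetical; this is plain Python for illustration, not the BASiCS API (BASiCS is an R package), and the calibration shown is a simplification rather than the package's own procedure.

```python
def tail_probabilities(draws_by_gene, tau):
    """Posterior probability that |log fold change| exceeds tau, per gene."""
    return {
        gene: sum(abs(d) > tau for d in draws) / len(draws)
        for gene, draws in draws_by_gene.items()
    }


def expected_fdr(probs, threshold):
    """Expected false discovery rate among genes flagged at this threshold."""
    flagged = [p for p in probs.values() if p > threshold]
    if not flagged:
        return 0.0
    # Each flagged gene contributes its posterior probability of being a false call.
    return sum(1.0 - p for p in flagged) / len(flagged)


def calibrate_threshold(probs, target_efdr=0.10):
    """Smallest grid threshold whose expected FDR does not exceed the target."""
    grid = [0.5 + i / 200 for i in range(100)]  # 0.500, 0.505, ..., 0.995
    for threshold in grid:
        if expected_fdr(probs, threshold) <= target_efdr:
            return threshold
    return None


# Toy posterior draws (made up for illustration; real chains are far longer).
draws = {
    "geneA": [1.2, 0.9, 1.1, 1.4],
    "geneB": [0.1, -0.2, 0.05, 0.6],
    "geneC": [-0.8, -1.0, -0.3, -0.9],
}
probs = tail_probabilities(draws, tau=0.5)
threshold = calibrate_threshold(probs, target_efdr=0.15)
flagged = sorted(gene for gene, p in probs.items() if p > threshold)
print(probs, threshold, flagged)
```

Because the rule works on posterior probabilities rather than point estimates, uncertainty from the earlier modelling steps carries through to the final gene calls, which is the property the abstract emphasises.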

Details

ISSN :
20461402
Volume :
11
Database :
F1000Research
Journal :
F1000Research
Notes :
Revised. Amendments from Version 1: As a result of helpful and insightful comments by the reviewers, the workflow has been substantially simplified and streamlined, and its focus has switched to more modern, droplet-based scRNA-seq data. The code has been simplified wherever possible by removing content covered in greater detail in other documents, while still demonstrating all of the relevant information, tools and techniques for a robust analysis of scRNA-seq data using BASiCS.
Figure 1 is a schematic of BASiCS; Figure 2 is the workflow overview shown in the previous Figure 1. Figures 2-5 have been removed. Figures 3-5 are equivalent to Figures 6-7 in the previous version. Figure 6 shows the criteria used to select a posterior probability threshold for identifying highly and lowly variable genes using BASiCS. Figure 7 shows BASiCS' posterior estimates of variability against posterior estimates of mean expression. Figure 8 shows example expression profiles for highly and lowly variable genes with similar overall levels of expression. Figure 9 shows mean-difference and volcano plots of a differential mean expression analysis of somitic vs pre-somitic mesoderm cells. Figures 10-12 show normalised expression values for somitic and pre-somitic cells, for genes up-regulated in pre-somitic cells (Fig 10), genes up-regulated in somitic cells (Fig 11), and genes neither up- nor down-regulated in either population of cells (Fig 12). Figure 13 shows mean-difference and volcano plots of a differential expression variability analysis of somitic vs pre-somitic mesoderm cells. Figure 14 shows differences in residual over-dispersion plotted against changes in mean expression. Figure 15 plots the difference in mean expression for genes with different levels of residual over-dispersion. Figure 16 shows violin plots of denoised counts for two classes of genes in each population: those with higher residual over-dispersion and similar levels of detection, and those with higher residual over-dispersion and different levels of detection.
Publication Type :
Academic Journal
Accession number :
edsfor.10.12688.f1000research.74416.2
Document Type :
other
Full Text :
https://doi.org/10.12688/f1000research.74416.2