Statistical Methods for the Analysis of Large-Scale and Single-Cell RNA-Sequencing Data

Authors :
Hu, Chengcheng
Billheimer, Dean
Slack, Donald
Ran, Di
Publication Year :
2018

Abstract

RNA-sequencing (RNA-seq), based on next-generation sequencing (NGS) technologies, has become the preferred tool for transcriptome analysis in the past two decades. The maturation and falling cost of high-throughput sequencing technologies have led to new types of data, such as large-scale RNA-seq and single-cell RNA-seq (scRNA-seq) data. The analysis of these new types of RNA-seq data presents both new opportunities and new challenges. In this dissertation, I present three novel statistical methods for these types of RNA-seq data, each motivated by a different research interest. The first project, MDSeq, introduces the first gene expression variability analysis for large-scale RNA-seq count data. MDSeq uses a novel reparametrization of the negative binomial distribution to provide flexible generalized linear models (GLMs) on both the mean and the dispersion. It also addresses the challenges of analyzing large-scale RNA-seq data by modeling technical excess zeros, identifying outliers efficiently, and evaluating differential expression at biologically interesting levels. The last two projects, scDoc and scGSA, are analysis tools that approach the recently emerging scRNA-seq data from different perspectives. scDoc is a statistical tool that accurately and robustly imputes drop-out events in scRNA-seq data. It is the first drop-out imputation method that incorporates drop-out information when estimating cell-to-cell similarity, a step that is crucial in scRNA-seq drop-out imputation but has not been appropriately examined. scGSA is a novel gene set analysis tool for scRNA-seq data. Without any prior knowledge of class labels (e.g., cell-type labels), which all existing gene set analysis approaches require, scGSA can detect significant gene sets related to biologically meaningful heterogeneity among cells. Through various comprehensive simulation studies, all three proposed methods have demonstrated the highest power compared with other existing methods wh

Details

Database :
OAIster
Publication Type :
Electronic Resource
Accession number :
edsoai.on1118684786
Document Type :
Electronic Resource