
Comprehensive evaluation of methods for differential expression analysis of metatranscriptomics data

Authors :
Hunyong Cho
Yixiang Qu
Chuwen Liu
Boyang Tang
Ruiqi Lyu
Bridget M. Lin
Jeffrey Roach
M. Andrea Azcarate-Peril
Apoena de Aguiar Ribeiro
Michael I. Love
Kimon Divaris
Di Wu
Publication Year :
2021
Publisher :
Cold Spring Harbor Laboratory, 2021.

Abstract

Background: Measuring and understanding the function of the human microbiome is key for several aspects of health; however, the development of statistical methods specifically for the analysis of microbial gene expression (i.e., metatranscriptomics) is in its infancy. Many currently employed differential expression analysis methods were designed for other data types and have not been evaluated in metatranscriptomics settings. To address this knowledge gap, we undertook a comprehensive evaluation and benchmarking of eight differential analysis methods for metatranscriptomics data.

Results: We used a combination of real and simulated metatranscriptomics data to evaluate the performance (i.e., model fit, Type-I error, and statistical power) of eight methods: log-normal (LN), logistic-beta (LB), MAST, Kruskal-Wallis, two-part Kruskal-Wallis, DESeq2, ANCOM-BC, and metagenomeSeq. The simulation was informed by supragingival biofilm microbiome data from about 300 preschool-age children enrolled in a study of early childhood caries (ECC), whereas validations were sought in two additional datasets: one ECC dataset and one inflammatory bowel disease dataset. The LB test showed the highest power at both small and large sample sizes and reasonably controlled Type-I error. In contrast, MAST was hampered by inflated Type-I error. Using LN and LB tests, we found that genes C8PHV7 and C8PEV7, harbored by the lactate-producing Campylobacter gracilis, had the strongest association with ECC.

Conclusion: These comprehensive model evaluation findings offer practical guidance for the selection of appropriate methods for rigorous analyses of differential expression in metatranscriptomics data. Selection of an optimal method is likely to increase the possibility of detecting true signals while minimizing the chance of claiming false ones.
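The abstract evaluates methods by Type-I error and statistical power. The Python sketch below illustrates how these two quantities can be estimated by simulation for a single rank-based test; it is not the authors' pipeline. Everything in it is an assumption made for illustration: the zero-inflated log-normal generator, the 50% zero proportion, the shift of 2.0 on the log scale, the sample size of 50 per group, and the function names `simulate_group`, `rank_sum_pvalue`, and `rejection_rate`. The test used is the tie-corrected Wilcoxon rank-sum test with a normal approximation, which for two groups is equivalent to the Kruskal-Wallis test.

```python
import math
import random


def simulate_group(n, zero_prob, mu, sigma, rng):
    """Draw n zero-inflated log-normal values (illustrative generator, an assumption)."""
    return [
        0.0 if rng.random() < zero_prob else rng.lognormvariate(mu, sigma)
        for _ in range(n)
    ]


def rank_sum_pvalue(x, y):
    """Two-sided rank-sum p-value using a tie-corrected normal approximation."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the block of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i
        mid_rank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # all values tied: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejection_rate(n_per_group, mu_shift, n_sim=2000, alpha=0.05, seed=1):
    """Fraction of simulated datasets in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x = simulate_group(n_per_group, 0.5, 0.0, 1.0, rng)
        y = simulate_group(n_per_group, 0.5, mu_shift, 1.0, rng)
        if rank_sum_pvalue(x, y) < alpha:
            rejections += 1
    return rejections / n_sim


# With no true difference, the rejection rate estimates the Type-I error.
type1_error = rejection_rate(n_per_group=50, mu_shift=0.0)
# With a true shift, the rejection rate estimates the power.
power = rejection_rate(n_per_group=50, mu_shift=2.0)
print(f"estimated Type-I error: {type1_error:.3f}")
print(f"estimated power:        {power:.3f}")
```

A well-calibrated test should give a rejection rate near `alpha` when `mu_shift` is 0; a rate well above `alpha` in that setting would correspond to the inflated Type-I error the abstract reports for MAST.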

Details

Database :
OpenAIRE
Accession number :
edsair.doi...........68a0bc758de8dec1cebdf6fa33449d1c
Full Text :
https://doi.org/10.1101/2021.07.14.452374