Addressing confounding artifacts in reconstruction of gene co-expression networks

Authors :
Andrew E. Jaffe
Princy Parsana
Claire Ruberman
Alexis Battle
Jeffrey T. Leek
Michael C. Schatz
Source :
Genome Biology, Vol 20, Iss 1, Pp 1-6 (2019)
Publication Year :
2017
Publisher :
Cold Spring Harbor Laboratory, 2017.

Abstract

Background: Gene co-expression networks capture diverse biological relationships between genes, and are important tools in predicting gene function and understanding disease mechanisms. Functional interactions between genes have not been fully characterized for most organisms, and therefore reconstruction of gene co-expression networks has been of common interest in a variety of settings. However, methods routinely used for reconstruction of gene co-expression networks do not account for confounding artifacts known to affect high-dimensional gene expression measurements.

Results: In this study, we show that artifacts such as batch effects in gene expression data confound commonly used network reconstruction algorithms. Both theoretically and empirically, we demonstrate that removing the effects of top principal components from gene expression measurements prior to network inference can reduce false discoveries, especially when well-annotated technical covariates are not available. Using expression data from the GTEx project in multiple tissues and hundreds of individuals, we show that this latent factor residualization approach often reduces false discoveries in the reconstructed networks.

Conclusion: Network reconstruction is susceptible to confounders that affect measurements of gene expression. Even controlling for major individual known technical covariates fails to fully eliminate confounding variation from the data. In studies where a wide range of annotated technical factors are measured and available, correcting gene expression data with multiple covariates can also improve network reconstruction, but such extensive annotations are not always available. Our study shows that principal component correction, which does not depend on study design or annotation of all relevant confounders, removes patterns of artifactual variation and improves network reconstruction in both simulated data and gene expression data from the GTEx project. We have implemented our PC correction approach in the Bioconductor package sva, which can be used prior to network reconstruction with a range of methods.
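
The principal component correction described in the abstract can be illustrated with a minimal sketch. This is not the code from the sva package (which is written in R); it is a NumPy approximation under stated assumptions: the expression matrix is laid out as samples by genes, the number of components to remove (`n_pcs`) is chosen by the user, and the function names `remove_top_pcs` and `mean_abs_offdiag`, as well as the simulated batch effect, are hypothetical and exist only for illustration.

```python
import numpy as np


def remove_top_pcs(expr, n_pcs):
    """Residualize the top n_pcs principal components from expr.

    expr is assumed to be a (samples x genes) matrix. Each gene is
    centered, then projected off the leading sample-level PC scores.
    """
    centered = expr - expr.mean(axis=0, keepdims=True)
    # Left singular vectors give orthonormal sample-level PC scores.
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    top = u[:, :n_pcs]
    # Subtract the part of every gene explained by the top PCs.
    return centered - top @ (top.T @ centered)


def mean_abs_offdiag(corr):
    """Average absolute gene-gene correlation, ignoring the diagonal."""
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return np.abs(corr[mask]).mean()


# Simulated data (an assumption for illustration): independent genes
# plus one latent "batch" factor that affects every gene.
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 50
batch = rng.normal(size=(n_samples, 1))
loadings = 3.0 * rng.normal(size=(1, n_genes))
expr = rng.normal(size=(n_samples, n_genes)) + batch @ loadings

# Gene-gene correlation before and after removing the first PC.
before = np.corrcoef(expr, rowvar=False)
after = np.corrcoef(remove_top_pcs(expr, 1), rowvar=False)

print("mean |correlation| before:", mean_abs_offdiag(before))
print("mean |correlation| after: ", mean_abs_offdiag(after))
```

In this toy setting the genes are independent apart from the shared batch factor, so the strong correlations seen before correction are artifactual, and they should shrink once the first principal component is removed. The corrected matrix would then be passed to whichever network reconstruction method is in use.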

Details

Language :
English
Database :
OpenAIRE
Journal :
Genome Biology, Vol 20, Iss 1, Pp 1-6 (2019)
Accession number :
edsair.doi.dedup.....7f138e3db25f5ba201f22cc936aab33f
Full Text :
https://doi.org/10.1101/202903