A Bayesian network approach incorporating imputation of missing data enables exploratory analysis of complex causal biological relationships
- Source :
- PLoS Genetics, Vol 17, Iss 9, p e1009811 (2021)
- Publication Year :
- 2021
- Publisher :
- Public Library of Science, 2021.
Abstract
- Bayesian networks can be used to identify possible causal relationships between variables based on their conditional dependencies and independencies, which can be particularly useful in complex biological scenarios with many measured variables. Here we propose two improvements to an existing method for Bayesian network analysis, designed to increase the power to detect potential causal relationships between variables (including potentially a mixture of both discrete and continuous variables). Our first improvement relates to the treatment of missing data. When there is missing data, the standard approach is to remove every individual with any missing data before performing analysis. This can be wasteful and undesirable when there are many individuals with missing data, perhaps with only one or a few variables missing. This motivates the use of imputation. We present a new imputation method that uses a version of nearest neighbour imputation, whereby missing data from one individual is replaced with data from another individual, their nearest neighbour. For each individual with missing data, the subsets of variables to be used to select the nearest neighbour are chosen by sampling without replacement the complete data and estimating a best fit Bayesian network. We show that this approach leads to marked improvements in the recall and precision of directed edges in the final network identified, and we illustrate the approach through application to data from a recent study investigating the causal relationship between methylation and gene expression in early inflammatory arthritis patients. We also describe a second improvement in the form of a pseudo-Bayesian approach for upweighting certain network edges, which can be useful when there is prior evidence concerning their directions.
Author summary
- Data analysis using Bayesian networks can help identify possible causal relationships between measured biological variables. Here we propose two improvements to an existing method for Bayesian network analysis. Our first improvement relates to the treatment of missing data. When there is missing data, the standard approach is to remove every individual with any missing data before performing analysis, even if only one or a few variables are missing. This is undesirable as it can reduce the ability of the approach to infer correct relationships. We propose a new method to instead fill in (impute) the missing data prior to analysis. We show through computer simulations that our method improves the reliability of the results obtained, and we illustrate the proposed approach by applying it to data from a recent study in early inflammatory arthritis. We also describe a second improvement involving the upweighting of certain network edges, which can be useful when there is prior evidence concerning their directions.
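The general idea of nearest neighbour imputation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the step in which a fitted Bayesian network selects the variables used for matching is omitted, and the sketch simply matches on every variable the individual has observed. The function name `nn_impute`, the standard-deviation scaling, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def nn_impute(data):
    """Fill NaNs in each row using its nearest complete row (the donor).

    Simplified illustration only: matching uses all observed variables,
    not a subset chosen from a fitted Bayesian network.
    """
    data = np.asarray(data, dtype=float)
    complete = data[~np.isnan(data).any(axis=1)]
    if complete.shape[0] == 0:
        raise ValueError("need at least one complete row to act as donor")

    # Scale each variable by its spread among complete rows so that
    # no single variable dominates the distance (an assumed choice).
    scale = complete.std(axis=0)
    scale[scale == 0] = 1.0

    imputed = data.copy()
    for i, row in enumerate(data):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        # Squared Euclidean distance over the observed variables only.
        diffs = (complete[:, observed] - row[observed]) / scale[observed]
        donor = complete[np.argmin((diffs ** 2).sum(axis=1))]
        # Copy the donor's values into the missing positions.
        imputed[i, missing] = donor[missing]
    return imputed

# Toy data: the last individual is missing the second variable.
data = np.array([
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
    [2.9, np.nan, 310.0],
])
filled = nn_impute(data)
```

In the toy data the last individual is closest to the third, so the missing value is taken from that donor. Unlike removing incomplete individuals, all four rows remain available for analysis.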
- Subjects :
- Cancer Research
B Cells
Gene Identification and Analysis
Gene Expression
02 engineering and technology
Genetic Networks
QH426-470
Biochemistry
White Blood Cells
Animal Cells
0202 electrical engineering, electronic engineering, information engineering
Medicine and Health Sciences
Imputation (statistics)
Genetics (clinical)
0303 health sciences
DNA methylation
T Cells
Applied Mathematics
Simulation and Modeling
Chemical Reactions
Simple random sample
Chromatin
Nucleic acids
Chemistry
Data Interpretation, Statistical
Physical Sciences
020201 artificial intelligence & image processing
Epigenetics
Data mining
Cellular Types
DNA modification
Algorithms
Chromatin modification
Network Analysis
Research Article
Chromosome biology
Cell biology
Computer and Information Sciences
Immune Cells
Immunology
Biology
Research and Analysis Methods
Methylation
03 medical and health sciences
Genetics
Humans
Antibody-Producing Cells
Molecular Biology
Ecology, Evolution, Behavior and Systematics
030304 developmental biology
Blood Cells
Bayesian network
Biology and Life Sciences
Bayes Theorem
Exploratory analysis
DNA
Missing data
Precision and recall
Mathematics
Data reduction
Details
- Language :
- English
- ISSN :
- 1553-7404 and 1553-7390
- Volume :
- 17
- Issue :
- 9
- Database :
- OpenAIRE
- Journal :
- PLoS Genetics
- Accession number :
- edsair.doi.dedup.....57a76d392716744d0c1dfe503daf6206