eXplainable Artificial Intelligence (XAI) for the identification of biologically relevant gene expression patterns in longitudinal human studies, insights from obesity research

Authors :
Jesús Alcalá-Fdez
Augusto Anguita-Ruiz
Concepción M. Aguilera
Rafael Alcalá
Alberto Segura-Delgado
[Anguita-Ruiz, A; Aguilera, CM] Department of Biochemistry and Molecular Biology II, Institute of Nutrition and Food Technology 'José Mataix', Center of Biomedical Research, University of Granada, Granada, Spain.
[Anguita-Ruiz, A; Aguilera, CM] Instituto de Investigación Biosanitaria ibs.GRANADA, Granada, Spain.
[Anguita-Ruiz, A; Aguilera, CM] CIBEROBN (Physiopathology of Obesity and Nutrition), Instituto de Salud Carlos III (ISCIII), Madrid, Spain.
[Segura-Delgado, A; Alcalá, R; Alcalá-Fdez, J] Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain.
This work was supported by the Mapfre Foundation ('Research grants by Ignacio H. de Larramendi 2017') and by the Regional Government of Andalusia ('Plan Andaluz de investigación, desarrollo e innovación (2018), P18-RT-2248'). The authors also acknowledge the Institute of Health Carlos III for personal funding: Contratos i-PFIS: doctorados IIS-empresa en ciencias y tecnologías de la salud de la convocatoria 2017 de la Acción Estratégica en Salud 2013–2016, Project number: IFI17/00048.
Source :
PLoS Computational Biology, Vol 16, Iss 4, p e1007792 (2020); Digibug: Repositorio Institucional de la Universidad de Granada, Universidad de Granada (UGR)
Publication Year :
2020
Publisher :
Public Library of Science (PLoS), 2020.

Abstract

To date, several machine learning approaches have been proposed for the dynamic modeling of temporal omics data. Although they have yielded impressive results in terms of model accuracy and predictive ability, most of these applications are based on “black-box” algorithms, and the research community has called for more interpretable models. The recent eXplainable Artificial Intelligence (XAI) revolution offers a solution to this issue, where rule-based approaches are highly suitable for explanatory purposes. Further integrating the data mining process with functional-annotation and pathway analyses is an additional step towards more explanatory and biologically sound models. In this paper, we present a novel rule-based XAI strategy (including pre-processing, knowledge extraction and functional validation) for finding biologically relevant sequential patterns in longitudinal human gene expression data (GED). To illustrate the performance of our pipeline, we work on in vivo temporal GED collected over the course of a long-term dietary intervention in 57 subjects with obesity (GSE77962). As validation populations, we employ three independent datasets following the same experimental design. As a result, we validate the initially extracted gene patterns and demonstrate the suitability of our strategy for mining biologically relevant gene-gene temporal relations. Our whole pipeline has been made available as open-source software and could easily be extended to other human temporal GED applications.

Author summary

Biological processes in humans are not single-gene mechanisms but complex systems controlled by regulatory interactions between thousands of genes. Within these gene regulatory networks, time delay is a common phenomenon, and genes interact with each other within a four-dimensional space. Hence, to fully understand or control biological processes, we need to unravel the principles of gene-gene temporal interactions. To date, several approaches based on Artificial Intelligence methods have tried to address this issue. Nevertheless, the research community has called for more interpretable and biologically meaningful models. In particular, scientists have called for methods able to infer gene-gene temporal interactions that could later be validated with real-life experiments in the lab. The recent revolution known as “eXplainable Artificial Intelligence” offers a solution to this issue, as a range of highly interpretable and explainable models has become available. Many of these methods could be applied to temporal gene expression data in order to decipher these temporal gene-gene relationships in humans. Here, we propose and validate a new analysis pipeline that includes an eXplainable Artificial Intelligence method for the identification of comprehensible gene-gene temporal relationships from human intervention studies. Our method has been validated in six datasets from obesity research (consisting of low-calorie diet interventions), where it was able to extract meaningful gene-gene temporal interactions with relevance to the etiology of the disease. Applying our pipeline to other types of human temporal gene profiles would greatly expand our knowledge of complex biological processes, with special interest for clinical drug trials, in which the identified gene-gene regulatory interactions could reveal new therapeutic targets.

Subjects

Subjects :
0301 basic medicine
Information Science::Information Science::Medical Informatics::Medical Informatics Applications::Information Storage and Retrieval::Data Mining [Medical Subject Headings]
Computer science
Microarrays
Physiology
Gene regulatory network
Obesidad
Gene Expression
Information Science::Information Science::Medical Informatics::Medical Informatics Applications::Information Systems::Databases as Topic::Databases, Factual::Databases, Genetic [Medical Subject Headings]
Organisms::Eukaryota::Animals::Chordata::Vertebrates::Mammals::Primates::Haplorhini::Catarrhini::Hominidae::Humans [Medical Subject Headings]
Machine Learning
Database and Informatics Methods
0302 clinical medicine
Software
Information Science::Information Science::Computing Methodologies::Artificial Intelligence [Medical Subject Headings]
Information Science::Information Science::Computing Methodologies::Software [Medical Subject Headings]
Databases, Genetic
Medicine and Health Sciences
Data Mining
Gene Regulatory Networks
Longitudinal Studies
Biology (General)
Soundness
Functional validation
Ecology
Human studies
Phenomena and Processes::Genetic Phenomena::Genetic Processes [Medical Subject Headings]
Applied Mathematics
Simulation and Modeling
Inteligencia artificial
Identification (information)
Knowledge
Phenomena and Processes::Genetic Phenomena::Genetic Structures::Transcriptome [Medical Subject Headings]
Bioassays and Physiological Analysis
Computational Theory and Mathematics
Disciplines and Occupations::Natural Science Disciplines::Biological Science Disciplines::Biology::Computational Biology [Medical Subject Headings]
Physiological Parameters
Modeling and Simulation
Physical Sciences
Information Technology
Sequence Analysis
Algorithms
Research Article
Computer and Information Sciences
Process (engineering)
Bioinformatics
QH301-705.5
Sequence Databases
Analytical, Diagnostic and Therapeutic Techniques and Equipment::Investigative Techniques::Genetic Techniques::Gene Expression Profiling [Medical Subject Headings]
Minería de datos
Research and Analysis Methods
Mining
03 medical and health sciences
Cellular and Molecular Neuroscience
Conocimiento
Artificial Intelligence
Analytical, Diagnostic and Therapeutic Techniques and Equipment::Investigative Techniques::Epidemiologic Methods::Epidemiologic Study Characteristics as Topic::Epidemiologic Studies::Cohort Studies::Longitudinal Studies [Medical Subject Headings]
Genetics
Humans
Gene Regulation
Obesity
Molecular Biology
Ecology, Evolution, Behavior and Systematics
Gene Expression Profiling
Body Weight
Biology and Life Sciences
Computational Biology
Pipeline (software)
Diseases::Nutritional and Metabolic Diseases::Nutrition Disorders::Overnutrition::Obesity [Medical Subject Headings]
Transcriptoma
030104 developmental biology
Biological Databases
Artificial intelligence
Gene expression
Information Science::Information Science::Computing Methodologies::Algorithms [Medical Subject Headings]
Transcriptome
030217 neurology & neurosurgery
Mathematics
Expresión génica

Details

Language :
English
ISSN :
15537358
Volume :
16
Issue :
4
Database :
OpenAIRE
Journal :
PLoS Computational Biology
Accession number :
edsair.doi.dedup.....595429dd76a4c027343a86fe7e271ccc