Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network

Authors :: Bahrad A. Sokhansanj
Gail L. Rosen
Stephen Woloszynek
Zhengqiao Zhao
Felix Agbavor
Joshua Chang Mell
Source :: PLoS Computational Biology, Vol 17, Iss 9, p e1009345 (2021)
Publication Year :: 2021
Publisher :: Public Library of Science (PLoS), 2021.
Abstract: Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short- and long-term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models encodes sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences that are particularly informative for classification. As such, this novel approach can avoid the pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches.
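The attention step described in the abstract can be illustrated with a minimal sketch. This is not the Read2Pheno implementation: the function names, the dot-product scoring rule, and the toy feature values are assumptions chosen for illustration. It shows only the general idea that per-position features are scored, the scores are normalized with a softmax, and the resulting weights both build a read embedding and indicate which positions mattered most.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, v):
    """Pool per-position features into one embedding.

    features: list of L feature vectors (one per sequence position), each of dim d.
    v: hypothetical scoring vector of dim d (learned in a real model).
    Returns (embedding, weights); weights sum to 1 over positions.
    """
    # Score each position by a dot product with the scoring vector (an assumption).
    scores = [sum(f_i * v_i for f_i, v_i in zip(f, v)) for f in features]
    weights = softmax(scores)
    d = len(features[0])
    # The embedding is the attention-weighted sum of the position features.
    embedding = [sum(w * f[j] for w, f in zip(weights, features)) for j in range(d)]
    return embedding, weights


# Toy input: three positions with two features each (made-up numbers).
features = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
v = [1.0, 1.0]
embedding, weights = attention_pool(features, v)
most_informative = weights.index(max(weights))
```

In a trained network the position features would come from the convolutional and recurrent layers, and plotting `weights` along the read is one way to see which nucleotide regions the model attended to.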
An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a Python package) and https://github.com/EESI/seq2att (a command line tool).
Author summary: Microbiomes are communities of microscopic organisms found everywhere, including on and in the human body. For example, the gut microbiome plays an important role in digestion, and changes in composition are associated with changes in health or disease, e.g., inflammatory bowel disease (IBD). Today, microbiome composition is often obtained from high-throughput sequencing, which generates many short DNA reads from multiple organisms in a sample. In this paper, we present a novel deep learning framework, Read2Pheno, to predict phenotype from all the reads in a set of biological samples. An attention mechanism allows visualization of specific subregions (sets of bases) that are important in classifying the reads according to phenotype or taxon labels. We evaluate the framework on sequencing data for 16S rRNA genes, genetic markers used to identify microbial taxonomy. We show that Read2Pheno performs comparably to conventional methods on three distinct data sets from the American Gut Project, IBD patients and controls, and a comprehensive taxonomic database. Moreover, Read2Pheno results can be readily interpreted (e.g., to identify regions of the 16S rRNA gene to target for PCR diagnostics) without additional pre/post-processing steps that can introduce complexity and error.
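Predicting a sample-level label from read-level predictions, as the summary describes, can be sketched as follows. This is an illustrative assumption, not the aggregation the authors necessarily use: it averages hypothetical per-read class probabilities within each sample and takes the top label. The sample IDs, labels, and probabilities are invented for the example.

```python
from collections import defaultdict


def aggregate_by_sample(read_predictions):
    """Average per-read class probabilities within each sample.

    read_predictions: iterable of (sample_id, {label: probability}) pairs,
    one pair per read. Returns {sample_id: (top_label, mean_probabilities)}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for sample_id, probs in read_predictions:
        counts[sample_id] += 1
        for label, p in probs.items():
            totals[sample_id][label] += p

    result = {}
    for sample_id, label_sums in totals.items():
        n = counts[sample_id]
        mean_probs = {label: s / n for label, s in label_sums.items()}
        top_label = max(mean_probs, key=mean_probs.get)
        result[sample_id] = (top_label, mean_probs)
    return result


# Hypothetical read-level outputs for two samples.
reads = [
    ("s1", {"IBD": 0.9, "control": 0.1}),
    ("s1", {"IBD": 0.6, "control": 0.4}),
    ("s2", {"IBD": 0.2, "control": 0.8}),
]
sample_calls = aggregate_by_sample(reads)
```

Averaging probabilities is only one option; a majority vote over read-level labels, or a second classifier trained on pooled read embeddings, would follow the same per-sample grouping.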

Subjects :: Physiology
Computer science
Prevotella
Convolutional neural network
Biochemistry
Database and Informatics Methods
RNA, Ribosomal, 16S
Databases, Genetic
Biology (General)
Data Management
Network model
Network architecture
Ecology
Artificial neural network
Nucleotides
Microbiota
Genomics
Nucleic acids
Phenotype
Computational Theory and Mathematics
Ribosomal RNA
Physiological Parameters
Medical Microbiology
Modeling and Simulation
Sequence Analysis
Host (network)
Algorithms
Research Article
Cell biology
Computer and Information Sciences
Cellular structures and organelles
Bioinformatics
QH301-705.5
Nucleotide Sequencing
Context (language use)
Microbial Genomics
ENCODE
Research and Analysis Methods
Machine learning
Microbiology
Proof of Concept Study
Cellular and Molecular Neuroscience
Deep Learning
Genetics
Humans
Non-coding RNA
Molecular Biology Techniques
Sequencing Techniques
Molecular Biology
Ecology, Evolution, Behavior and Systematics
Taxonomy
Natural Language Processing
Biology and life sciences
Bacteria
Host Microbial Interactions
Deep learning
Body Weight
Organisms
Computational Biology
Python (programming language)
Inflammatory Bowel Diseases
Gastrointestinal Microbiome
Recurrent neural network
RNA
Microbiome
Neural Networks, Computer
Artificial intelligence
Ribosomes
Sequence Alignment

Details

Language :: English
ISSN :: 1553-7358
Volume :: 17
Issue :: 9
Database :: OpenAIRE
Journal :: PLoS Computational Biology
Accession number :: edsair.doi.dedup.....5b1ecf844a411c96345c7d784a8aafef