1. Deep Learning Encoding for Rapid Sequence Identification on Microbiome Data
- Author
Jacob Borgman, Karen Stark, Jeremy Carson, and Loren Hauser
- Subjects
deep learning, microbiome, convolutional neural networks, rapid sequence identification, encoding, embedding, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
We present a novel approach to rapid sequence identification that leverages the representational power of deep learning and apply it to the analysis of microbiome data. The method involves three steps: creating a latent sequence space, training a convolutional neural network to identify sequences rapidly by mapping them into that space, and using the encoded latent space for denoising to correct sequencing errors. Using mock bacterial communities of known composition, we show that this approach achieves single-nucleotide resolution, producing sequence identifications and abundance estimates that match the accuracy of the best available microbiome algorithms while processing the data far faster. We further show that the approach supports phenotypic prediction at the sample level on an experimental data set for which the ground truth for sequence identities and abundances is unknown but the expected phenotypes of the samples are known with certainty. Moreover, this approach offers a potential solution for analyzing data from other types of experiments that currently rely on computationally intensive sequence identification.
- Published
- 2022
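The pipeline outlined in the abstract (embed each read in a latent space, identify it by its position there, then tally abundances) can be illustrated with a toy sketch. This is not the authors' method: a normalized k-mer frequency vector stands in for the trained convolutional encoder, nearest-reference lookup stands in for identification and denoising, and every name, sequence, and parameter below (`embed`, `identify`, `abundances`, `K`, `taxon_A`, `taxon_B`) is hypothetical. A k-mer profile is also far coarser than the single-nucleotide resolution reported in the abstract.

```python
from collections import Counter
from itertools import product
import math

K = 3  # k-mer length; an illustrative choice, not taken from the paper
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def embed(seq):
    """Stand-in encoder: normalized k-mer frequency vector (NOT the paper's CNN)."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[k] for k in KMERS) or 1
    return [counts[k] / total for k in KMERS]


def identify(read, reference_embeddings):
    """Return the name of the reference whose embedding is nearest to the read's."""
    v = embed(read)
    return min(
        reference_embeddings,
        key=lambda name: math.dist(v, reference_embeddings[name]),
    )


def abundances(reads, reference_embeddings):
    """Count how many reads are assigned to each reference."""
    return dict(Counter(identify(r, reference_embeddings) for r in reads))


# Hypothetical reference sequences for a two-member mock community.
references = {
    "taxon_A": "ACGTACGTTAGCCGATAGCTAGCTAACG",
    "taxon_B": "TTGACCGGTATCGGATCCAATGCGTTAA",
}
reference_embeddings = {name: embed(seq) for name, seq in references.items()}

reads = [
    references["taxon_A"],
    "ACGTACGTTAGCCGATAGCTAGCTAACC",  # taxon_A with its final base substituted
    references["taxon_B"],
]
result = abundances(reads, reference_embeddings)
print(result)
```

Snapping the read with the substituted base to its nearest reference is the toy analogue of denoising: the erroneous read is still counted toward `taxon_A` rather than reported as a new sequence.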