DepoScope: Accurate phage depolymerase annotation and domain delineation using large language models.
- Source :
- PLoS computational biology [PLoS Comput Biol] 2024 Aug 05; Vol. 20 (8), pp. e1011831. Date of Electronic Publication: 2024 Aug 05 (Print Publication: 2024).
- Publication Year :
- 2024
Abstract
- Bacteriophages (phages) are viruses that infect bacteria. Many of them produce specific enzymes called depolymerases to break down external polysaccharide structures. Accurate annotation and domain identification of these depolymerases are challenging due to their inherent sequence diversity. Hence, we present DepoScope, a machine learning tool that combines a fine-tuned ESM-2 model with a convolutional neural network to identify depolymerase sequences and their enzymatic domains precisely. To accomplish this, we curated a dataset from the INPHARED phage genome database, created a polysaccharide-degrading domain database, and applied sequential filters to construct a high-quality dataset, which is subsequently used to train DepoScope. Our work is the first approach that combines sequence-level predictions with amino-acid-level predictions for accurate depolymerase detection and functional domain identification. In that way, we believe that DepoScope can greatly enhance our understanding of phage-host interactions at the level of depolymerases.
- Competing Interests: The authors have declared that no competing interests exist.
- Copyright: © 2024 Concha-Eloko et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- Subjects :
- Molecular Sequence Annotation
- Viral Proteins/genetics
- Viral Proteins/metabolism
- Viral Proteins/chemistry
- Neural Networks, Computer
- Machine Learning
- Software
- Protein Domains
- Genome, Viral/genetics
- Carboxylic Ester Hydrolases/genetics
- Carboxylic Ester Hydrolases/metabolism
- Carboxylic Ester Hydrolases/chemistry
- Bacteriophages/genetics
- Bacteriophages/enzymology
- Computational Biology/methods
Details
- Language :
- English
- ISSN :
- 1553-7358
- Volume :
- 20
- Issue :
- 8
- Database :
- MEDLINE
- Journal :
- PLoS computational biology
- Publication Type :
- Academic Journal
- Accession number :
- 39102416
- Full Text :
- https://doi.org/10.1371/journal.pcbi.1011831