SSG-LUGIA: Single Sequence based Genome Level Unsupervised Genomic Island Prediction Algorithm.

Authors :
Ibtehaz N
Ahmed I
Ahmed MS
Rahman MS
Azad RK
Bayzid MS
Source :
Briefings in bioinformatics [Brief Bioinform] 2021 Nov 05; Vol. 22 (6).
Publication Year :
2021

Abstract

Background: Genomic Islands (GIs) are clusters of genes that are mobilized through horizontal gene transfer. GIs play a pivotal role in bacterial evolution as a mechanism of diversification and adaptation to different niches. Therefore, identification and characterization of GIs in bacterial genomes is important for understanding bacterial evolution. However, quantifying GIs is inherently difficult, and the existing methods suffer from low prediction accuracy and a precision-recall trade-off. Moreover, several of them are supervised in nature, and thus their application to newly sequenced genomes is hampered by their dependency on the functional annotation of existing genomes.

Results: We present SSG-LUGIA, a completely automated and unsupervised approach for identifying GIs and horizontally transferred genes. SSG-LUGIA is a novel method based on an unsupervised anomaly detection technique, accompanied by further refinement using cues from the signal processing literature. SSG-LUGIA leverages the atypical compositional biases of the alien genes to localize GIs in prokaryotic genomes. SSG-LUGIA was assessed on a large benchmark dataset, 'IslandPick', and on a set of 15 well-studied genomes in the literature, followed by a thorough analysis on the well-understood Salmonella typhi CT18 genome. Furthermore, the efficacy of SSG-LUGIA in identifying horizontally transferred genes was evaluated on two additional bacterial genomes, namely, those of Corynebacterium diphtheriae NCTC13129 and Pseudomonas aeruginosa LESB58. SSG-LUGIA was examined on draft genomes and was demonstrated to be efficient as an ensemble method.

Conclusions: Our results indicate that SSG-LUGIA achieved superior performance in comparison to frequently used existing methods. Importantly, it yielded a better trade-off between precision and recall than the existing methods. Its independence from the functional annotation of genomes makes it suitable for analyzing newly sequenced, yet uncharacterized genomes. Thus, our study is a significant advance in the identification of GIs and horizontally transferred genes. SSG-LUGIA is available as open-source software at https://nibtehaz.github.io/SSG-LUGIA/.

(© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)
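
The abstract names the ingredients (compositional bias, unsupervised anomaly detection, signal-processing refinement) without giving the algorithm. The following is only a minimal sketch of that general idea, not SSG-LUGIA's implementation. Every function name, the dinucleotide profile, the window and step sizes, the median/MAD outlier rule, the majority-vote smoothing, and the synthetic test genome are assumptions made for illustration.

```python
import random
import statistics
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2):
    """Normalized k-mer frequencies of seq, in a fixed k-mer order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def window_scores(genome, window=1000, step=500, k=2):
    """Score each sliding window by its L1 distance from the genome-wide profile."""
    background = kmer_profile(genome, k)
    starts = list(range(0, len(genome) - window + 1, step))
    scores = []
    for s in starts:
        profile = kmer_profile(genome[s:s + window], k)
        scores.append(sum(abs(a - b) for a, b in zip(profile, background)))
    return starts, scores


def flag_outliers(scores, z_cut=5.0):
    """Unsupervised outlier rule (assumed): robust z-score from median and MAD."""
    med = statistics.median(scores)
    mad = statistics.median([abs(x - med) for x in scores]) or 1e-9
    return [(x - med) / (1.4826 * mad) > z_cut for x in scores]


def majority_smooth(flags, width=3):
    """Simple refinement (assumed): majority vote over neighboring windows."""
    half = width // 2
    out = []
    for i in range(len(flags)):
        hood = flags[max(0, i - half):i + half + 1]
        out.append(2 * sum(hood) > len(hood))
    return out


def predict_windows(genome, window=1000, step=500, k=2, z_cut=5.0):
    """Return start positions of windows flagged as compositionally atypical."""
    starts, scores = window_scores(genome, window, step, k)
    flags = majority_smooth(flag_outliers(scores, z_cut))
    return [s for s, f in zip(starts, flags) if f]


def synthetic_genome(seed=0):
    """Toy sequence: uniform-composition host with an AT-rich insert at 20000-24000."""
    rng = random.Random(seed)
    host_a = "".join(rng.choices("ACGT", k=20000))
    island = "".join(rng.choices("ACGT", weights=[4, 1, 1, 4], k=4000))
    host_b = "".join(rng.choices("ACGT", k=16000))
    return host_a + island + host_b
```

Calling `predict_windows(synthetic_genome())` returns the start coordinates of the flagged windows on the toy sequence. Real genomes would need the tuning and validation the paper reports, which this sketch does not attempt.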

Details

Language :
English
ISSN :
1477-4054
Volume :
22
Issue :
6
Database :
MEDLINE
Journal :
Briefings in bioinformatics
Publication Type :
Academic Journal
Accession number :
34058749
Full Text :
https://doi.org/10.1093/bib/bbab116