SonicParanoid2: fast, accurate, and comprehensive orthology inference with machine learning and language models

Authors :
Salvatore Cosentino
Sira Sriswasdi
Wataru Iwasaki
Source :
Genome Biology, Vol 25, Iss 1, Pp 1-18 (2024)
Publication Year :
2024
Publisher :
BMC, 2024.

Abstract

Accurate inference of orthologous genes constitutes a prerequisite for comparative and evolutionary genomics. SonicParanoid is one of the fastest tools for orthology inference; however, its scalability and accuracy have been hampered by time-consuming all-versus-all alignments and the existence of proteins with complex domain architectures. Here, we present a substantial update of SonicParanoid, where a gradient boosting predictor halves the execution time and a language model doubles the recall. Application to empirical large-scale and standardized benchmark datasets shows that SonicParanoid2 is much faster than comparable methods and also the most accurate. SonicParanoid2 is available at https://gitlab.com/salvo981/sonicparanoid2 and https://zenodo.org/doi/10.5281/zenodo.11371108 .

Details

Language :
English
ISSN :
1474-760X
Volume :
25
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Genome Biology
Publication Type :
Academic Journal
Accession number :
edsdoj.9941d17d57cb43df95f40d7e731b04ed
Document Type :
article
Full Text :
https://doi.org/10.1186/s13059-024-03298-4