HiTaC: a hierarchical taxonomic classifier for fungal ITS sequences compatible with QIIME2

Authors :
Fábio M. Miranda
Vasco C. Azevedo
Rommel J. Ramos
Bernhard Y. Renard
Vitor C. Piro
Source :
BMC Bioinformatics, Vol 25, Iss 1, Pp 1-13 (2024)
Publication Year :
2024
Publisher :
BMC, 2024.

Abstract

Background: Fungi play a key role in several important ecological functions, ranging from organic matter decomposition to symbiotic associations with plants. Moreover, fungi naturally inhabit the human body and can be beneficial when administered as probiotics. In mycology, the internal transcribed spacer (ITS) region was adopted as the universal marker for classifying fungi. Hence, an accurate and robust method for ITS classification is not only desired for better diversity estimation, but can also help us gain deeper insight into the dynamics of environmental communities and ultimately understand whether the abundance of certain species correlates with health and disease. Although many methods have been proposed for taxonomic classification, to the best of our knowledge, none of them fully explores the taxonomic tree hierarchy when building its model. This, in turn, leads to lower generalization power and a higher risk of classification errors.

Results: Here we introduce HiTaC, a robust hierarchical machine learning model for accurate ITS classification, which requires a small amount of data for training and can handle imbalanced datasets. HiTaC was thoroughly evaluated with the established TAXXI benchmark and could correctly classify fungal ITS sequences of varying lengths and across a range of identity differences between the training and test data. HiTaC outperforms state-of-the-art methods when trained on noisy data, consistently achieving higher F1-score and sensitivity across different taxonomic ranks, and improving sensitivity by 6.9 percentage points over top methods on the noisiest dataset available in TAXXI.

Conclusions: HiTaC is publicly available on the Python Package Index, BIOCONDA and Docker Hub. It is released under the new BSD license, allowing free use in academia and industry. Source code and documentation, which includes installation and usage instructions, are available at https://gitlab.com/dacs-hpi/hitac.

Details

Language :
English
ISSN :
1471-2105
Volume :
25
Issue :
1
Database :
Directory of Open Access Journals
Journal :
BMC Bioinformatics
Publication Type :
Academic Journal
Accession number :
edsdoj.7a54feede6ea4ee7aed1bc6f1a6d3802
Document Type :
article
Full Text :
https://doi.org/10.1186/s12859-024-05839-x