
Clustering protein functional families at large scale with hierarchical approaches.

Authors :
Bordin N
Scholes H
Rauer C
Roca-Martínez J
Sillitoe I
Orengo C
Source :
Protein science : a publication of the Protein Society [Protein Sci] 2024 Sep; Vol. 33 (9), pp. e5140.
Publication Year :
2024

Abstract

Proteins, fundamental to cellular activities, reveal their function and evolution through their structure and sequence. CATH functional families (FunFams) are coherent clusters of protein domain sequences in which the function is conserved across their members. The increasing volume and complexity of protein data enabled by large-scale repositories like MGnify or AlphaFold Database require more powerful approaches that can scale to the size of these new resources. In this work, we introduce MARC and FRAN, two algorithms developed to build upon and address limitations of GeMMA/FunFHMMER, our original methods developed to classify proteins with related functions using a hierarchical approach. We also present CATH-eMMA, which uses embeddings or Foldseek distances to form relationship trees from distance matrices, reducing computational demands and handling various data types effectively. CATH-eMMA offers a highly robust and much faster tool for clustering protein functions on a large scale, providing a new resource for future studies in protein function and evolution.
(© 2024 The Author(s). Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society.)

Details

Language :
English
ISSN :
1469-896X
Volume :
33
Issue :
9
Database :
MEDLINE
Journal :
Protein science : a publication of the Protein Society
Publication Type :
Academic Journal
Accession number :
39145441
Full Text :
https://doi.org/10.1002/pro.5140