Clustering protein functional families at large scale with hierarchical approaches.
- Source :
- Protein science : a publication of the Protein Society [Protein Sci] 2024 Sep; Vol. 33 (9), pp. e5140.
- Publication Year :
- 2024
Abstract
- Proteins, fundamental to cellular activities, reveal their function and evolution through their structure and sequence. CATH functional families (FunFams) are coherent clusters of protein domain sequences in which the function is conserved across their members. The increasing volume and complexity of protein data enabled by large-scale repositories like MGnify or AlphaFold Database requires more powerful approaches that can scale to the size of these new resources. In this work, we introduce MARC and FRAN, two algorithms developed to build upon and address limitations of GeMMA/FunFHMMER, our original methods developed to classify proteins with related functions using a hierarchical approach. We also present CATH-eMMA, which uses embeddings or Foldseek distances to form relationship trees from distance matrices, reducing computational demands and handling various data types effectively. CATH-eMMA offers a highly robust and much faster tool for clustering protein functions on a large scale, providing a new tool for future studies in protein function and evolution. (© 2024 The Author(s). Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society.)
Details
- Language :
- English
- ISSN :
- 1469-896X
- Volume :
- 33
- Issue :
- 9
- Database :
- MEDLINE
- Journal :
- Protein science : a publication of the Protein Society
- Publication Type :
- Academic Journal
- Accession number :
- 39145441
- Full Text :
- https://doi.org/10.1002/pro.5140