An efficient protein homology detection approach based on seq2seq model and ranking
- Source :
- Biotechnology & Biotechnological Equipment, Vol 35, Iss 1, Pp 633-640 (2021)
- Publication Year :
- 2021
- Publisher :
- Informa UK Limited, 2021.
Abstract
- Evolutionary information is essential for protein annotation. The number of homologs retrieved for a protein is correlated with the annotations related to its structure or function. With the continuous increase in the number of available sequences, fast and effective homology detection methods are particularly important. To increase the efficiency of homology detection, this paper proposes a novel method named CONVERT. The method treats homology detection as a translation task and introduces the concept of a representative protein. Representative proteins are not real proteins: each one corresponds to a protein family and captures the characteristics of that family. The method employs a seq2seq model to establish a many-to-one relationship between proteins and representative proteins. Based on this relationship, CONVERT converts protein sequences into fixed-length numerical representations, increasing the efficiency of homology detection by using numerical comparison instead of sequence alignment. For the alignment results, the method applies ranking to obtain a sorted list. The proposed method is evaluated on two benchmark datasets. The experimental results show that its performance is comparable with that of state-of-the-art methods. Meanwhile, the method is ultra-fast and can obtain results in hundreds of milliseconds.
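The numerical comparison and ranking step described in the abstract can be illustrated with a minimal sketch. This is not the CONVERT implementation: the abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the protein names, the three-dimensional vectors, and the helper functions `cosine_similarity` and `rank_candidates` are all hypothetical. The seq2seq encoder that would produce the fixed-length vectors is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_candidates(query_vec, database, top_k=None):
    """Score every database vector against the query and sort by score.

    `database` maps a protein identifier to its fixed-length vector.
    Returns (identifier, score) pairs, most similar first.
    """
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in database.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Toy fixed-length representations (made-up values, for illustration only).
database = {
    "protein_A": [0.9, 0.1, 0.0],
    "protein_B": [0.0, 1.0, 0.2],
    "protein_C": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = rank_candidates(query, database, top_k=2)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

Because every sequence is reduced to a vector of the same length, scoring a query against the database is a single pass of arithmetic rather than a pairwise sequence alignment, which is the source of the speed-up the abstract refers to.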
- Subjects :
- 0106 biological sciences
0303 health sciences
homology detection
A protein
Computational biology
Biology
translation task
01 natural sciences
Ranking (information retrieval)
06 Biological Sciences, 09 Engineering, 10 Technology
03 medical and health sciences
ComputingMethodologies_PATTERNRECOGNITION
Protein structure
Protein Annotation
seq2seq model
ranking
Protein homology
Evolutionary information
Function (biology)
TP248.13-248.65
030304 developmental biology
010606 plant biology & botany
Biotechnology
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Biotechnology & Biotechnological Equipment, Vol 35, Iss 1, Pp 633-640 (2021)
- Accession number :
- edsair.doi.dedup.....2c5ef61c0516707fe71cc90b33355353