An efficient protein homology detection approach based on seq2seq model and ranking

Authors :
Song Gao
Shui Yu
Shaowen Yao
Source :
Biotechnology & Biotechnological Equipment, Vol 35, Iss 1, Pp 633-640 (2021)
Publication Year :
2021
Publisher :
Informa UK Limited, 2021.

Abstract

Evolutionary information is essential for protein annotation. The number of homologs retrieved for a protein is correlated with the annotations of its structure or function. With the continuous increase in the number of available sequences, fast and effective homology detection methods are particularly important. To increase the efficiency of homology detection, a novel method named CONVERT is proposed in this paper. The method treats homology detection as a translation task and introduces the concept of a representative protein. A representative protein is not a real protein; it corresponds to a protein family and contains the characteristics of that family. Our method employs the seq2seq model to establish a many-to-one relationship between proteins and representative proteins. Based on this many-to-one relationship, CONVERT converts protein sequences into fixed-length numerical representations, increasing the efficiency of homology detection by using numerical comparison instead of sequence alignment. For the alignment results, our method adopts ranking to obtain a sorted list. We evaluate the proposed method on two benchmark datasets. The experimental results show that the performance of our method is comparable to that of state-of-the-art methods. Moreover, our method is ultra-fast and can obtain results in hundreds of milliseconds.
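
The idea of replacing sequence alignment with numerical comparison followed by ranking can be illustrated with a minimal sketch. It is not the CONVERT implementation: the abstract does not say which comparison measure is used, so cosine similarity is an assumption here, and the three-dimensional vectors, the protein names and the function names are hypothetical stand-ins for the fixed-length representations a trained seq2seq encoder would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_candidates(query_vec, database):
    """Score every database entry against the query and sort best-first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in database.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Hypothetical toy vectors; real representations would come from the encoder.
toy_database = {
    "protein_A": [0.9, 0.1, 0.0],
    "protein_B": [0.0, 1.0, 0.0],
    "protein_C": [0.7, 0.3, 0.1],
}
toy_query = [1.0, 0.0, 0.0]

ranking = rank_candidates(toy_query, toy_database)
for name, score in ranking:
    print(f"{name}\t{score:.4f}")
```

Each comparison in this sketch costs time proportional to the vector length, independent of the original sequence lengths, which is the kind of saving the abstract attributes to fixed-length representations.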

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....2c5ef61c0516707fe71cc90b33355353