DeepD2V: A Novel Deep Learning-Based Framework for Predicting Transcription Factor Binding Sites from Combined DNA Sequence
- Source :
- International Journal of Molecular Sciences, Vol 22, Iss 11, p 5521 (2021)
- Publication Year :
- 2021
- Publisher :
- Multidisciplinary Digital Publishing Institute, 2021.
- Abstract :
- Predicting in vivo protein–DNA binding sites is a challenging but pressing task in fields such as drug design and development. Most promoters contain a number of transcription factor (TF) binding sites, but only a small minority have been identified by biochemical experiments, which are time-consuming and laborious. To tackle this challenge, many computational methods have been proposed to predict TF binding sites from DNA sequence. Although previous methods have achieved remarkable performance in predicting protein–DNA interactions, there is still considerable room for improvement. In this paper, we present a hybrid deep learning framework, termed DeepD2V, for transcription factor binding site prediction. First, we construct the input matrix from an original DNA sequence and three variant sequences: its inverse, complementary, and complementary inverse sequences. A sliding window of size k with a specific stride is then used to obtain the k-mer representation of the input sequences. Next, we use word2vec to obtain a pre-trained distributed representation model of the k-mer words. Finally, the probability of protein–DNA binding is predicted using recurrent and convolutional neural networks. Experimental results on 50 public ChIP-seq benchmark datasets demonstrate the superior performance and robustness of DeepD2V. Moreover, we verify that DeepD2V performs better with the word2vec-based k-mer distributed representation than with one-hot encoding, and that the integrated framework combining a convolutional neural network (CNN) and a bidirectional LSTM (bi-LSTM) outperforms either the CNN or the bi-LSTM model used alone. The source code of DeepD2V is available at the GitHub repository.
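The first two preprocessing steps named in the abstract (building the three variant sequences and extracting k-mers with a sliding window) can be sketched as follows. This is a minimal illustration, not the DeepD2V source code: the function names, the default values of `k` and `stride`, and the choice to tokenize each variant separately are assumptions, since the abstract does not state how the four sequences are combined.

```python
# Illustrative sketch only; names and defaults are hypothetical,
# not taken from the DeepD2V repository.

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sequence_variants(seq):
    """Return [original, inverse, complementary, complementary inverse]."""
    seq = seq.upper()
    inverse = seq[::-1]                        # read the sequence backwards
    complementary = seq.translate(COMPLEMENT)  # swap A<->T and C<->G
    complementary_inverse = complementary[::-1]
    return [seq, inverse, complementary, complementary_inverse]


def kmers(seq, k, stride=1):
    """Slide a window of size k over seq, advancing by stride each step."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def tokenize(seq, k=3, stride=1):
    """Tokenize every variant into k-mer 'words'.

    Assumption: each variant is tokenized on its own; the abstract does
    not specify how the variants are merged into one input matrix.
    """
    return [kmers(variant, k, stride) for variant in sequence_variants(seq)]


if __name__ == "__main__":
    for words in tokenize("ACGTT", k=3, stride=1):
        print(words)
```

The resulting lists of k-mer words are the kind of "sentences" a word2vec model can be trained on; that training step and the CNN/bi-LSTM classifier are not shown here.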
- Subjects :
- 0301 basic medicine
Computer science
QH301-705.5
0206 medical engineering
protein–DNA binding
convolutional neural network
02 engineering and technology
Catalysis
Article
Cell Line
Inorganic Chemistry
03 medical and health sciences
Deep Learning
Robustness (computer science)
Sliding window protocol
Encoding (memory)
Humans
Word2vec
Physical and Theoretical Chemistry
Biology (General)
Promoter Regions, Genetic
Molecular Biology
QD1-999
Spectroscopy
Sequence
Binding Sites
Organic Chemistry
Computational Biology
Pattern recognition
General Medicine
DNA
bidirectional long short term memory network
Computer Science Applications
DNA binding site
DNA-Binding Proteins
Chemistry
030104 developmental biology
transcription factor binding sites
Artificial intelligence
K562 Cells
020602 bioinformatics
Software
Transcription Factors
- Language :
- English
- ISSN :
- 1422-0067
- Database :
- OpenAIRE
- Journal :
- International Journal of Molecular Sciences
- Accession number :
- edsair.doi.dedup.....bff28a8c1028cd5dacb4f74aa5b0c420
- Full Text :
- https://doi.org/10.3390/ijms22115521