DeepCrystal: A Deep Learning Framework for Sequence-based Protein Crystallization Prediction
- Source :
- BIBM
- Publication Year :
- 2018
- Publisher :
- IEEE, 2018.
Abstract
- Motivation: Protein structure determination has primarily been performed using X-ray crystallography. To overcome its high cost, high attrition rate and series of trial-and-error settings, many in-silico methods have been developed to predict the crystallization propensity of proteins from their sequences. However, most of these methods build their predictors by extracting features from protein sequences, which is computationally expensive and can cause the feature space to explode. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins that can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from the sequence. Our model is based on convolutional neural networks, which can exploit frequently occurring k-mers and sets of k-mers in protein sequences to distinguish proteins that will yield diffraction-quality crystals from those that will not.
- Results: Our model surpasses previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy and Matthews correlation coefficient (MCC) on three independent test sets. DeepCrystal achieves average improvements in recall of 1.4% and 12.1% over its closest competitors, Crysalis II and Crysf, respectively. In addition, DeepCrystal attains average improvements of 2.1% and 6.0% in F-score, 1.9% and 3.9% in accuracy, and 3.8% and 7.0% in MCC relative to Crysalis II and Crysf, respectively, on the independent test sets.
- Availability and implementation: The standalone source code and models are available at https://github.com/elbasir/DeepCrystal and a web server is available at https://deeplearning-protein.qcri.org.
- Supplementary information: Supplementary data are available at Bioinformatics online.
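- Illustration: The abstract's idea that a convolutional filter can respond to a k-mer can be sketched in a few lines of plain Python. This is a minimal, hypothetical example and not the DeepCrystal architecture or its trained weights: the helper names (`one_hot`, `kmer_filter`, `conv1d_scores`, `max_pool`), the hand-built filter and the example sequence are all assumptions made for illustration. A real model would learn many filters of several widths from data.

```python
# Illustrative sketch only: NOT the DeepCrystal architecture or its weights.
# All function names and the example sequence are hypothetical.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode a protein sequence as a list of 20-dimensional one-hot rows."""
    rows = []
    for residue in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:  # unknown residues (e.g. 'X') stay all-zero
            row[index] = 1.0
        rows.append(row)
    return rows


def kmer_filter(kmer):
    """Build a hand-made filter whose weights match one exact k-mer.

    A trained network would learn such weights instead of having them set.
    """
    return one_hot(kmer)


def conv1d_scores(encoded, weights):
    """Slide the filter along the sequence; one score per k-residue window."""
    k = len(weights)
    scores = []
    for start in range(len(encoded) - k + 1):
        score = 0.0
        for offset in range(k):
            row = encoded[start + offset]
            score += sum(x * w for x, w in zip(row, weights[offset]))
        scores.append(score)
    return scores


def max_pool(scores):
    """Global max pooling: keep the strongest match anywhere in the sequence."""
    return max(scores) if scores else 0.0


if __name__ == "__main__":
    sequence = "MKTAYIAKQR"  # made-up example sequence
    scores = conv1d_scores(one_hot(sequence), kmer_filter("AKQ"))
    print("per-window scores:", scores)
    print("pooled score:", max_pool(scores))
```

  With this hand-built filter, a window's score equals the number of positions at which it agrees with the target k-mer, so the pooled score reaches k only when the exact k-mer occurs somewhere in the sequence.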
- Subjects :
- Statistics and Probability
0301 basic medicine
Source code
Computer science
Feature vector
Machine learning
Biochemistry
Convolutional neural network
03 medical and health sciences
Deep Learning
Protein structure
Amino Acid Sequence
Crystallization
Molecular Biology
030304 developmental biology
0303 health sciences
Sequence
Series (mathematics)
Deep learning
030302 biochemistry & molecular biology
Computational Biology
Proteins
Pattern recognition
Computer Science Applications
Computational Mathematics
030104 developmental biology
Computational Theory and Mathematics
Artificial intelligence
Protein crystallization
Details
- Database :
- OpenAIRE
- Journal :
- 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
- Accession number :
- edsair.doi.dedup.....c69441ae642f519ec57e59fecc333ed7
- Full Text :
- https://doi.org/10.1109/bibm.2018.8621202