DeepCrystal: A Deep Learning Framework for Sequence-based Protein Crystallization Prediction

Authors :
Halima Bensmail
Balasubramanian Moovarkumudalvan
Khalid Kunji
Raghvendra Mall
Abdurrahman Elbasir
Prasanna R. Kolatkar
Source :
BIBM
Publication Year :
2018
Publisher :
IEEE, 2018.

Abstract

Motivation: Protein structure determination has primarily been performed using X-ray crystallography. To overcome its high cost, high attrition rate and series of trial-and-error settings, many in-silico methods have been developed to predict the crystallization propensity of proteins from their sequences. However, most of these methods build their predictors by extracting features from protein sequences, which is computationally expensive and can greatly inflate the feature space. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It identifies proteins that can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from the sequence. Our model is based on convolutional neural networks, which can exploit frequently occurring k-mers and sets of k-mers in protein sequences to distinguish proteins that will yield diffraction-quality crystals from those that will not.

Results: Our model surpasses previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy and Matthews correlation coefficient (MCC) on three independent test sets. DeepCrystal achieves average improvements in recall of 1.4% and 12.1% over its closest competitors, Crysalis II and Crysf, respectively. In addition, on the independent test sets DeepCrystal attains average improvements over Crysalis II and Crysf, respectively, of 2.1% and 6.0% in F-score, 1.9% and 3.9% in accuracy, and 3.8% and 7.0% in MCC.

Availability and implementation: The standalone source code and models are available at https://github.com/elbasir/DeepCrystal and a web server is available at https://deeplearning-protein.qcri.org.

Supplementary information: Supplementary data are available at Bioinformatics online.
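The comparison in the abstract is stated in terms of recall, F-score, accuracy and MCC. As a reminder of how those four metrics follow from a binary confusion matrix (crystallizable vs. non-crystallizable), here is a minimal Python sketch using the standard definitions. It is not taken from the DeepCrystal code base; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return recall, precision, F-score, accuracy and MCC for a binary classifier.

    tp, fp, tn, fn are the confusion-matrix counts. Ratios with an empty
    denominator are reported as 0.0 (a common convention, assumed here).
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient: ranges from -1 to 1 and uses all
    # four cells of the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "f_score": f_score,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, not results from the paper.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Because MCC uses all four cells of the confusion matrix, it is generally considered more informative than accuracy when the two classes are imbalanced.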

Details

Database :
OpenAIRE
Journal :
2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Accession number :
edsair.doi.dedup.....c69441ae642f519ec57e59fecc333ed7
Full Text :
https://doi.org/10.1109/bibm.2018.8621202