1. EPSOL: sequence-based protein solubility prediction using multidimensional embedding.
- Author
- Wu, Xiang and Yu, Liang
- Subjects
- *PROTEIN-protein interactions, *PROTEIN expression, *SOLUBILITY, *RECOMBINANT proteins, *PROTEINS, *DEEP learning
- Abstract
- Motivation: The heterologous expression of recombinant protein requires host cells, such as Escherichia coli, and the solubility of the protein greatly affects the protein yield. A novel and highly accurate solubility predictor that concurrently improves the production yield and minimizes production cost, and that forecasts protein solubility in an E. coli expression system before the actual experimental work, is highly sought. Results: In this article, EPSOL, a novel deep learning architecture for the prediction of protein solubility in an E. coli expression system, which automatically obtains comprehensive protein feature representations using multidimensional embedding, is presented. EPSOL outperformed all existing sequence-based solubility predictors and achieved an accuracy of 0.79 and a Matthews correlation coefficient of 0.58. The higher performance of EPSOL permits large-scale screening for sequence variants with enhanced manufacturability and predicts the solubility of new recombinant proteins in an E. coli expression system with greater reliability. Availability and implementation: EPSOL's best model and results can be downloaded from GitHub (https://github.com/LiangYu-Xidian/EPSOL). Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2021