1. Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms
- Author
- Yousong Peng, Xingyi Ge, Huiting Chen, Heping Zheng, Zhaozhong Zhu, and Ye Qiu
- Subjects
- medicine.medical_treatment, Immunology, Computational biology, Cleavage (embryo), medicine.disease_cause, Alphacoronavirus, Machine Learning, Viral Proteins, Virology, medicine, Humans, Protease Inhibitors, Coronavirus, chemistry.chemical_classification, Gammacoronavirus, Protease, biology, Chemistry, biology.organism_classification, Cysteine protease, Amino acid, Cysteine Endopeptidases, Molecular Medicine, Coronavirus Infections, Betacoronavirus, Algorithms, Peptide Hydrolases
- Abstract
The coronavirus 3C-like (3CL) protease is a cysteine protease. It plays an important role in viral infection and immune escape by cleaving not only the viral polyprotein ORF1ab at 11 sites but also host proteins. However, effective tools for determining the cleavage sites of the 3CL protease are still lacking. This study systematically investigated the diversity of the cleavage sites of the coronavirus 3CL protease on the viral polyprotein and found that the cleavage motifs were highly conserved for viruses in the genera Alphacoronavirus, Betacoronavirus and Gammacoronavirus. Strong residue preferences were observed at the positions neighboring the cleavage sites. A random forest (RF) model was built to predict the cleavage sites of the coronavirus 3CL protease, with the residues at the cleavage site and neighboring positions represented by amino acid indexes; the model achieved an AUC of 0.96 in cross-validation. The RF model was further tested on an independent test dataset composed of cleavage sites on host proteins, where it achieved an AUC of 0.88 and a prediction precision of 0.80 when the accessibility of the cleavage site was taken into account. The RF model then predicted 1,079 human proteins to be cleaved by the 3CL protease. These proteins were enriched in pathways related to neurodegenerative diseases and pathogen infection. Finally, a user-friendly online server named 3CLP was built to predict the cleavage sites of the coronavirus 3CL protease based on the RF model. Overall, the study provides both an effective tool for identifying the cleavage sites of the 3CL protease and insights into the molecular mechanism underlying the pathogenicity of coronaviruses.
- Published
- 2022
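The abstract describes representing the residues around a candidate cleavage site by amino acid indexes before passing them to a random forest. The sketch below illustrates only that encoding step, and it is not the authors' implementation: the window size (`flank`), the padding value, the helper names `encode_window` and `candidate_windows`, and the use of the Kyte-Doolittle hydropathy scale as the single index are all assumptions made for illustration. The abstract does not state which indexes or window the study used.

```python
# Kyte-Doolittle hydropathy scale, used here as one example amino acid index.
# Assumption: the study's actual index set is not given in the abstract.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_window(seq, site, flank=4, index=KD_HYDROPATHY, pad=0.0):
    """Encode the residues around a candidate scissile bond as index values.

    `site` is the 0-based position of the first residue after the bond, so
    the window covers seq[site - flank : site + flank]. Positions outside
    the sequence and unknown residue letters are filled with `pad`.
    """
    features = []
    for pos in range(site - flank, site + flank):
        if 0 <= pos < len(seq):
            features.append(index.get(seq[pos], pad))
        else:
            features.append(pad)
    return features


def candidate_windows(seq, flank=4):
    """Return (site, feature_vector) for every internal peptide bond."""
    return [(site, encode_window(seq, site, flank)) for site in range(1, len(seq))]


# Example: encode two residues on each side of the bond between Q and S.
example = encode_window("AAAALQSGAAAA", site=6, flank=2)
```

Feature vectors of this kind, one per candidate bond with a cleaved or not-cleaved label, are the sort of input a random forest classifier could be trained on. Using several amino acid indexes per position would simply lengthen each vector.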