Back to Search Start Over

Predicting binding affinities of emerging variants of SARS-CoV-2 using spike protein sequencing data: observations, caveats and recommendations.

Authors :
Zhang R
Ghosh S
Pal R
Source :
Briefings in bioinformatics [Brief Bioinform] 2022 May 13; Vol. 23 (3).
Publication Year :
2022

Abstract

Predicting protein properties from amino acid sequences is an important problem in biology and pharmacology. Protein-protein interactions among SARS-CoV-2 spike protein, human receptors and antibodies are key determinants of the potency of this virus and its ability to evade the human immune response. As a rapidly evolving virus, SARS-CoV-2 has already developed into many variants with considerable variation in virulence among these variants. Utilizing the proteomic data of SARS-CoV-2 to predict its viral characteristics will, therefore, greatly aid in disease control and prevention. In this paper, we review and compare recent successful prediction methods based on long short-term memory (LSTM), transformer, convolutional neural network (CNN) and a similarity-based topological regression (TR) model and offer recommendations about appropriate predictive methodology depending on the similarity between training and test datasets. We compare the effectiveness of these models in predicting the binding affinity and expression of SARS-CoV-2 spike protein sequences. We also explore how effective these predictive methods are when trained on laboratory-created data and are tasked with predicting the binding affinity of the in-the-wild SARS-CoV-2 spike protein sequences obtained from the GISAID datasets. We observe that TR is a better method when the sample size is small and test protein sequences are sufficiently similar to the training sequence. However, when the training sample size is sufficiently large and prediction requires extrapolation, LSTM embedding and CNN-based predictive model show superior performance.<br /> (© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)

Details

Language :
English
ISSN :
1477-4054
Volume :
23
Issue :
3
Database :
MEDLINE
Journal :
Briefings in bioinformatics
Publication Type :
Academic Journal
Accession number :
35437577
Full Text :
https://doi.org/10.1093/bib/bbac128