Interpretable and Predictive Deep Neural Network Modeling of the SARS-CoV-2 Spike Protein Sequence to Predict COVID-19 Disease Severity.

Authors :
Sokhansanj, Bahrad A.
Zhao, Zhengqiao
Rosen, Gail L.
Source :
Biology (2079-7737); Dec2022, Vol. 11 Issue 12, p1786, 28p
Publication Year :
2022

Abstract

Simple Summary: As COVID-19 shifts from pandemic to endemic, emerging variants may be more or less virulent. Predicting whether an emerging COVID-19 variant has a high risk of causing severe disease is needed to plan for potential burdens on hospital capacity and to protect vulnerable populations. However, it takes time to do laboratory and animal experiments to determine whether a new genetic variant might be more severe, and the results may not be representative of what happens when the virus infects humans. By the time there is epidemiological data on the severity of disease associated with a new variant, it can be too late to design an optimal public health response. There is a critical need for computer models that can predict severe disease risk from genetic sequence data, which can be obtained from just the first few infections in a potential incoming wave. Two key challenges make computer modeling difficult: (1) sequence changes are complex, and (2) using historical data to predict future disease requires accounting for the confounding effects of changing patient demographics, improving therapeutics, and increased vaccination. In this paper, we introduce a novel interpretable deep learning architecture to solve this problem, demonstrating that it can make robust predictions for emerging variants.

Abstract: Throughout the COVID-19 pandemic, SARS-CoV-2 has gained and lost multiple mutations in novel or unexpected combinations. Predicting how complex mutations affect COVID-19 disease severity is critical in planning public health responses as the virus continues to evolve. This paper presents a novel computational framework to complement conventional lineage classification and applies it to predict the severe disease potential of viral genetic variation. The transformer-based neural network model architecture has additional layers that provide sample embeddings and sequence-wide attention for interpretation and visualization.
First, training a model to predict SARS-CoV-2 taxonomy validates the architecture's interpretability. Second, an interpretable predictive model of disease severity is trained on spike protein sequence and patient metadata from GISAID. Confounding effects of changing patient demographics, increasing vaccination rates, and improving treatment over time are addressed by including demographics and case date as independent inputs to the neural network model. The resulting model can be interpreted to identify potentially significant virus mutations and proves to be a robust predictive tool. Although trained on sequence data obtained entirely before the availability of empirical data for Omicron, the model can predict Omicron's reduced risk of severe disease, in accord with epidemiological and experimental data. [ABSTRACT FROM AUTHOR]
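
Illustrative sketch (not part of the published abstract, and not the authors' implementation): the abstract describes sequence-wide attention, a per-sample embedding, and patient metadata supplied as independent inputs. The toy Python below shows one way such pieces could fit together. Every name, dimension, and value is a hypothetical placeholder: random vectors stand in for transformer outputs over spike protein positions, and the parameters are untrained.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(position_vectors, query):
    """Assign one attention weight per sequence position, then take the
    weighted average of the position vectors as a sample embedding."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in position_vectors]
    weights = softmax(scores)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, position_vectors))
        for d in range(len(query))
    ]
    return pooled, weights


def predict_severity(position_vectors, metadata, params):
    """Concatenate the sample embedding with metadata features (assumed
    here to be scaled age and scaled case date) and apply a logistic
    output unit to obtain a probability-like severity score."""
    embedding, weights = attention_pool(position_vectors, params["query"])
    features = embedding + metadata  # list concatenation
    logit = params["bias"] + sum(
        w * x for w, x in zip(params["out_weights"], features)
    )
    return 1.0 / (1.0 + math.exp(-logit)), weights


# Hypothetical toy inputs: 6 sequence positions, 4-dimensional vectors.
rng = random.Random(0)
DIM, LENGTH = 4, 6
position_vectors = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(LENGTH)]
metadata = [0.45, 0.80]  # assumed: scaled patient age, scaled case date
params = {
    "query": [rng.gauss(0, 1) for _ in range(DIM)],
    "out_weights": [rng.gauss(0, 1) for _ in range(DIM + len(metadata))],
    "bias": 0.0,
}

prob, weights = predict_severity(position_vectors, metadata, params)
print("severity score:", round(prob, 3))
print("attention per position:", [round(w, 3) for w in weights])
```

In a sketch like this, the per-position weights are what would be inspected to see which sequence positions drive a prediction, while keeping the metadata as a separate input lets the sequence contribution be read apart from demographic and time effects.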

Details

Language :
English
ISSN :
20797737
Volume :
11
Issue :
12
Database :
Complementary Index
Journal :
Biology (2079-7737)
Publication Type :
Academic Journal
Accession number :
160943958
Full Text :
https://doi.org/10.3390/biology11121786