
Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone.

Authors :
Kuzmin K
Adeniyi AE
DaSouza AK Jr
Lim D
Nguyen H
Molina NR
Xiong L
Weber IT
Harrison RW
Source :
Biochemical and biophysical research communications [Biochem Biophys Res Commun] 2020 Dec 10; Vol. 533 (3), pp. 553-558. Date of Electronic Publication: 2020 Sep 18.
Publication Year :
2020

Abstract

Coronaviruses infect many animals, including humans, due to interspecies transmission. Three of the known human coronaviruses, MERS, SARS-CoV-1, and SARS-CoV-2 (the pathogen for the COVID-19 pandemic), cause severe disease. Improved methods to predict host specificity of coronaviruses will be valuable for identifying and controlling future outbreaks. The coronavirus S protein plays a key role in host specificity by attaching the virus to receptors on the cell membrane. We analyzed 1238 spike sequences for their host specificity. Spike sequences readily segregate in t-SNE embeddings into clusters of similar hosts and/or virus species. Machine learning with SVM, Logistic Regression, Decision Tree, and Random Forest gave high average accuracies, F1 scores, sensitivities, and specificities of 0.95-0.99. Importantly, sites identified by Decision Tree correspond to protein regions with known biological importance. These results demonstrate that spike sequences alone can be used to predict host specificity.

Competing Interests: Declaration of competing interest. The authors declared no conflict of interest.

(Copyright © 2020 Elsevier Inc. All rights reserved.)
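
The abstract reports accuracy, F1 score, sensitivity, and specificity but does not say how they were computed. As a rough illustration only, the sketch below derives the four metrics for one host class against all others (a one-vs-rest treatment, which is an assumption here). The function name `binary_metrics` and the toy host labels are hypothetical and are not taken from the paper.

```python
from collections import namedtuple

Metrics = namedtuple("Metrics", "accuracy f1 sensitivity specificity")


def binary_metrics(y_true, y_pred, positive):
    """Score predictions for one class versus all others (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with `positive` as the class of interest.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate (recall)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1 is the harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return Metrics(accuracy, f1, sensitivity, specificity)


# Hypothetical toy labels, not data from the study.
y_true = ["human", "human", "bat", "bat", "camel", "human"]
y_pred = ["human", "bat", "bat", "bat", "camel", "human"]

m = binary_metrics(y_true, y_pred, positive="human")
print(m)
```

For a multi-host problem, one plausible way to obtain an "average" figure is to compute these per class and take the mean, though the record does not state which averaging scheme the authors used.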

Details

Language :
English
ISSN :
1090-2104
Volume :
533
Issue :
3
Database :
MEDLINE
Journal :
Biochemical and biophysical research communications
Publication Type :
Academic Journal
Accession number :
32981683
Full Text :
https://doi.org/10.1016/j.bbrc.2020.09.010