Application of genomic signal processing as a tool for high-performance classification of SARS-CoV-2 variants: a machine learning-based approach.

Authors :
Kar, Subhajit
Ganguly, Madhabi
Source :
Soft Computing - A Fusion of Foundations, Methodologies & Applications. Feb2024, Vol. 28 Issue 4, p2891-2918. 28p.
Publication Year :
2024

Abstract

Since the beginning of the COVID-19 pandemic, numerous SARS-CoV-2 mutants have evolved owing to high transmissibility and virulence. Because previously introduced vaccines and preventive therapies have limited effectiveness, these strains remain a concern. This paper presents a comparative evaluation of three novel genomic signal processing-based methods, employing discrete wavelet decomposition with lifting (DWT), discrete Fourier transform (DFT), and singular value decomposition (SVD), for classifying emerging SARS-CoV-2 variants using features extracted from sequences acquired from the NCBI virus database. The efficiency and accuracy of the proposed alignment-free algorithms are tested on three coronavirus datasets: human coronavirus (HCoV) and two SARS-CoV-2 variant datasets (CoV-Variants and Omicron). The viral nucleotide sequences are converted into numerical representations using purine-pyrimidine mapping, DNA walk, and Z-curve, which are fed into the DWT, SVD, and DFT processors, respectively. In the DWT approach, the second-generation wavelet transform employs the two best wavelet bases, Daubechies (Db) and Biorthogonal (Bior), chosen by validation on the HCoV dataset, to extract features from the CoV-Variants dataset. Various machine learning algorithms, such as support vector machine (SVM), K-nearest neighbors (KNN), and ensemble methods, are used to classify the virus strains and evaluate the efficacy of the algorithm. Finally, hyperparameter tuning with Bayesian optimization is used to select the best-fit models for KNN and SVM. The proposed algorithm classifies the CoV-Variants dataset with an average accuracy of 98.76% across the DWT, DFT, and SVD methods; the best accuracy for this dataset is 98.9%, achieved by the DWT technique with purine-pyrimidine mapping. The best accuracy for predicting Omicron is 99.8%, achieved by the SVD-based technique. The best accuracy for the HCoV dataset is 100%, achieved by all three methods. [ABSTRACT FROM AUTHOR]
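
The numerical mapping and DFT feature step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the +1/-1 purine-pyrimidine encoding, the function names, and the fixed 64-point transform length are assumptions made here for illustration, and only the Z-curve follows its standard textbook definition.

```python
import numpy as np

def purine_pyrimidine(seq):
    """Assumed encoding: purines (A, G) -> +1, every other symbol -> -1."""
    return np.array([1.0 if base in "AG" else -1.0 for base in seq.upper()])

# Standard Z-curve step per base:
#   x: purine (A, G) vs pyrimidine (C, T)
#   y: amino (A, C) vs keto (G, T)
#   z: weak H-bond (A, T) vs strong H-bond (G, C)
Z_STEPS = {
    "A": (1, 1, 1),
    "G": (1, -1, -1),
    "C": (-1, 1, -1),
    "T": (-1, -1, 1),
}

def z_curve(seq):
    """Return an (N, 3) array of cumulative Z-curve coordinates.

    U is treated as T; unknown symbols (e.g. N) add a zero step,
    which is a simplification chosen for this sketch.
    """
    seq = seq.upper().replace("U", "T")
    steps = np.array([Z_STEPS.get(base, (0, 0, 0)) for base in seq], dtype=float)
    return np.cumsum(steps, axis=0)

def dft_power_features(signal, n_points=64):
    """Power spectrum of a 1-D signal as a fixed-length feature vector.

    The signal is mean-centred, then zero-padded or truncated to n_points
    (an assumed way of getting equal-length features from sequences of
    different lengths). Returns n_points // 2 + 1 values.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal - signal.mean(), n=n_points)
    return np.abs(spectrum) ** 2

if __name__ == "__main__":
    toy = "ATGCGTACGTTAGC" * 5  # toy sequence, not real viral data
    features = dft_power_features(z_curve(toy)[:, 0])
    print(features.shape)
```

Feature vectors of this kind, one per sequence, are what a classifier such as KNN or SVM would then be trained on.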

Details

Language :
English
ISSN :
14327643
Volume :
28
Issue :
4
Database :
Academic Search Index
Journal :
Soft Computing - A Fusion of Foundations, Methodologies & Applications
Publication Type :
Academic Journal
Accession number :
175234544
Full Text :
https://doi.org/10.1007/s00500-023-09577-9