1. The phylogenetic relationship within SARS-CoV-2s: An expanding basal clade
- Author
- Zhao Zhang, Funan He, and Libing Shen
- Subjects
0106 biological sciences, 0301 basic medicine, Mutation rate, viruses, basal clade, Genome, Viral, Biology, Viral Nonstructural Proteins, 010603 evolutionary biology, 01 natural sciences, Genome, Article, Evolution, Molecular, 03 medical and health sciences, Basal (phylogenetics), Mutation Rate, Phylogenetics, RNA proofreading capability, Genetics, Humans, Point Mutation, Selection, Genetic, Clade, skin and connective tissue diseases, Pandemics, Molecular Biology, Phylogeny, Ecology, Evolution, Behavior and Systematics, Whole genome sequencing, parsimony principle, SARS-CoV-2, Point mutation, phylogenetic relationship, fungi, virus diseases, COVID-19, body regions, enzymes and coenzymes (carbohydrates), 030104 developmental biology, Evolutionary biology, Mutation (genetic algorithm), Mutation
- Abstract
Highlights • SARS-CoV-2 with superior RNA proofreading capability has an expanding basal clade.
The COVID-19 pandemic is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), whose origin is still shrouded in mystery. In this study, we developed a method to search for the basal SARS-CoV-2 clade among collected SARS-CoV-2 genome sequences. We first identified the mutation sites in the SARS-CoV-2 whole-genome sequence alignment. Then, by pairwise comparison of the numbers of mutation sites among all SARS-CoV-2 sequences, we identified the least mutated clade, which is the basal clade under the parsimony principle. In our first analysis, we used 168 SARS-CoV-2 sequences (GISAID dataset up to 2020/03/04) to identify the basal clade, which contains 33 identical viral sequences from seven countries. To our surprise, in our second analysis with 367 SARS-CoV-2 sequences (GISAID dataset up to 2020/03/17), the basal clade contains 51 viral sequences, 18 more than in the first analysis. The much larger NCBI dataset shows that this clade had expanded to 85 unique sequences by 2020/04/04. The expanding basal clade reveals the chilling fact that the least mutated SARS-CoV-2 sequence had been replicating and spreading for at least four months. Coronaviruses are known to have an RNA proofreading capability that ensures the fidelity of their genome replication. Interestingly, we found that SARS-CoV-2 without its nonstructural proteins 13 to 16 (Nsp13-Nsp16) exhibits an unusually high mutation rate. Our results suggest that SARS-CoV-2 has an unprecedented RNA proofreading capability that can preserve its genome intact even after a long period of transmission. Our selection analyses also indicate that the positive selection event enabling SARS-CoV-2 to cross species and adapt to human hosts might have occurred before its outbreak.
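The two steps described in the abstract (finding mutation sites in the alignment, then comparing sequences pairwise over those sites) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy alignment, and the choice to score each group of identical sequences by its total number of differences to all other sequences are assumptions made here for illustration only.

```python
from collections import defaultdict

VALID = set("ACGT")  # assumption: only unambiguous bases are compared


def mutation_sites(alignment):
    """Return alignment columns where more than one unambiguous base occurs."""
    length = len(next(iter(alignment.values())))
    sites = []
    for i in range(length):
        bases = {seq[i] for seq in alignment.values() if seq[i] in VALID}
        if len(bases) > 1:
            sites.append(i)
    return sites


def group_identical(alignment, sites):
    """Group sequence names by their bases at the mutation sites."""
    groups = defaultdict(list)
    for name, seq in alignment.items():
        groups["".join(seq[i] for i in sites)].append(name)
    return groups


def differences(a, b):
    """Count sites where both bases are unambiguous and differ."""
    return sum(x != y and x in VALID and y in VALID for x, y in zip(a, b))


def least_mutated_group(alignment):
    """Pick the group with the fewest total differences to all sequences.

    Hypothetical stand-in for the paper's pairwise comparison step.
    """
    sites = mutation_sites(alignment)
    groups = group_identical(alignment, sites)

    def total_diff(key):
        return sum(len(names) * differences(key, other)
                   for other, names in groups.items())

    best = min(groups, key=total_diff)
    return sites, sorted(groups[best])


# Toy alignment (invented data, not from GISAID or NCBI).
toy = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGT",
    "s3": "ACGTACGA",
    "s4": "ACCTACGT",
    "s5": "ACCTTCGT",
}
sites, basal = least_mutated_group(toy)
print(sites, basal)
```

In this sketch the candidate basal group is simply the set of identical sequences closest, in total site differences, to everything else in the alignment; a real analysis would also need to handle gaps, ambiguous bases, and sequencing errors.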
- Published
- 2021