A team of scientists from the USA and Germany has recently studied the evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) using a representative set of sequences collected in the USA between 2020 and 2021. The findings reveal that the viral genome has accumulated multiple mutations over time, with only occasional loss of mutations. The main driving forces behind this genetic variation include widespread infection and superspreader events. The study is currently available on the bioRxiv* preprint server.

Background
Within one year of its emergence, SARS-CoV-2, the causative pathogen of coronavirus disease 2019 (COVID-19), had infected 110 million people and claimed 2.4 million lives globally. SARS-CoV-2 is a single-stranded, positive-sense, enveloped RNA virus of the Coronaviridae family. Among the proteins present on the viral envelope, the spike glycoprotein is the most immunogenic because of its direct involvement in host-cell recognition and viral entry. Mutations in the viral open reading frames (ORFs), which encode essential viral proteins, can alter the amino acid sequences of those proteins and lead to the development of new viral variants. Compared to other RNA viruses, SARS-CoV-2 mutates less frequently because it possesses a 3′-5′ exoribonuclease with proofreading activity. Nevertheless, evidence suggests that most of the single nucleotide substitutions observed in SARS-CoV-2 are caused by RNA-editing deaminases, which generally target adenine and cytosine bases and produce transition mutations. In addition, several recombination events arising from template strand switching have been documented in SARS-CoV-2.
In the current study, the scientists investigated the incidence of SARS-CoV-2 mutations that appeared in the United States during 2020 and derived a set of mutational signatures representing distinct viral variants. Using these signatures, they aimed to identify new variants, as well as new mutations in existing variants, introduced from different regions of the world. They analyzed a representative set of sequences covering the entire SARS-CoV-2 genome, collected across the United States.
Important observations
For the analysis, the scientists collected more than 8,000 full-length SARS-CoV-2 sequences from COVID-19 patients between January 2020 and January 2021. They identified multiple distinct SARS-CoV-2 variants, including the original Wuhan strain and its subvariants carrying a few minor mutations, and two varieties of the European strain carrying the D614G mutation. The European strain rapidly acquired multiple mutations that ultimately gave rise to a new homegrown dominant variant, s48. Rather than arising from a recombination event, these mutations were acquired as single nucleotide substitutions that rapidly increased in frequency due to superspreader events.
Importantly, they observed that the major USA variants accumulated an increasing number of mutations over time. This suggests that novel mutations emerge more often when viral transmission is uncontrolled, and that newly emerged variants may influence the effectiveness of therapeutic antibodies and vaccines. Specifically, they observed that during 2020, more than 20 amino acid substitutions occurred in the spike protein, and many of these mutations remain in the population at low frequency. This indicates that spike substitutions are accumulating over time, with minimal loss from the population through genetic drift.