MUTATION DETECTIVES: Clockwise (from left): Emma Hodcroft (co-developer of Nextstrain and a phylogenetics expert); Ivan Aksamentov (emergency doctor and software engineer); James Hadfield (phylogeneticist); Richard Neher (biophysicist based in Basel); Daryl Domman (US infectious disease genomics professor); and Jeremy Kamil (associate professor of virology). Image Credit: Twitter


  • Disparate teams put their minds together to do the painstaking job of tracking and understanding the biology and evolution of SARS-CoV-2. 
  • Using open-source databases, this network of academics conducts global genetic surveillance of SARS-CoV-2.
  • Advanced sequencing techniques have tremendously helped scientists and health officials track viral mutations and mount an appropriate response.
  • But they have their limits, and they need help.

Dubai: As SARS-CoV-2 continues to wreak havoc, researchers do painstaking surveillance work to make sense of the thousands of viral genome sequences. The sequences come from different corners of the planet, at a rate of about 5,000 per day. Trying to make sense of the genetic data of a mutating virus is a bit like shooting at a constantly moving target. But their work is important and must continue: the coronavirus pandemic has left 2.6 million people dead (as of Tuesday, March 9, 2021) and 117 million infected.

A genome is the complete genetic material of an organism. It is the set of a living being’s genetic instructions. Each genome contains all of the information needed to build that organism and allow it to grow, develop and multiply.

Scientists are fighting back, with knowledge. So far, this diverse group of experts agrees on one thing: the virus has shown mutations that make it more infectious.

The researchers do meticulous work, combining information from the so-called “wet biology” (swab tests) and sequence analyses. They update the data sets daily. The bulk of analyses of the genome sequences of SARS-CoV-2 rests on the initiative of academic researchers — phylogeneticists — who put together software and analytical tools to establish patterns, find essential answers and increase the scientific community’s understanding of the virus behind the pandemic, Nature explains.

The experts track and share real-time data to produce joined-up images of mutations as they emerge and spread. Such collaboration is invaluable (see tweets below). One result of their work: Systematic tracking of the SARS-CoV-2 viral genome. Here’s what we know about the people behind this effort, how they’re able to confirm the existence of new variants, what we know — and don't know:

Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species. This area greatly enhances our understanding of SARS-CoV-2 outbreaks and variants.

Q: How is the genome data of SARS-CoV-2 known and shared?

Since January 2020, researchers around the world have posted huge numbers of SARS-CoV-2 genome sequences online, through scientific data-sharing platforms. There are different clusters or online databases involved in tracking its mutations around the world. 

DYING CELL: A colorised scanning electron micrograph of a dying cell infected with the coronavirus, with virus particles in red. The bulk of analyses of the genome sequences of SARS-CoV-2 rests on the initiative of academic researchers — phylogeneticists — who put together software and analytical tools to establish patterns, according to Nature.

Armed with such data, they track and share real-time open-source data to produce joined-up images of mutations as they emerge and spread. Such collaboration is invaluable (see tweets below). One result of their work: systematic tracking of the SARS-CoV-2 viral genome and its mutations.

Q: What are the most popular genome data sharing platforms?

GISAID, EBI, Nextstrain and GenBank are some of these public databases tracking the COVID-19 viral mutations.


GISAID, a scientific data sharing site hosted by Germany, is one of the more popular platforms. As of March 1, 2021, it already holds more than 610,000 viral genomes, the journal Nature reported. That number is estimated to well exceed 1 million by the end of the pandemic. Soumya Swaminathan, the chief scientist at the WHO, has called GISAID a “game-changer” in the pandemic.

WHO Chief Scientist Soumya Swaminathan has described GISAID as a “game-changer” in the pandemic. Image Credit: (REUTERS)

On this platform, researchers upload their sequences, and also download other viral genome sequences from counterparts in other countries. “There’s quite a good international effort where that sharing happens that helps us in this type of investigation,” Dr Joep de Ligt, of New Zealand’s Institute of Environmental Science and Research (ESR), told RNZ.


Another database is the European Bioinformatics Institute (EBI) near Cambridge, UK, which hosts its own large genome database that includes SARS-CoV-2 sequences.


Nextstrain is an open-source project involving academic researchers from Switzerland and the US, helping to coordinate analyses of the SARS-CoV-2 genome sequences based on data from GISAID.


GenBank is an open-access genetic sequence database maintained by the US National Institutes of Health (NIH). It is an annotated collection of all publicly available nucleotide sequences and their protein translations. It is produced and maintained by the National Center for Biotechnology Information as part of the International Nucleotide Sequence Database Collaboration.

Q: How do these sequenced genome samples help scientists, health officials and the world?

In theory, these genomes could help us better understand the transmission of the virus, through communities and between countries. That understanding would allow us to curb infections.

Q: When was the first complete genome sequence of SARS-CoV-2 released?

The complete genome sequence of SARS-CoV-2 was first released on GenBank on January 5, 2020. Since then, there has been a rapid accumulation of SARS-CoV-2 genome sequences.

Q: Why is viral genome tracking important?

Early sharing of SARS-CoV-2’s genetic data, in January 2020, enabled the rapid development of diagnostics (ways to detect it). The knowledge of the virus’ RNA sequence was key in characterising it. This made possible an increased surveillance of SARS-CoV-2 viral genome, which enables the research community to identify and track new “mutations of concern”, to do the following:

  • Determine whether an observed mutation has changed the biology of the virus.
  • Identify mutations that might boost transmission.
  • Understand ways that help a virus evade immune responses, and use that information to craft new therapies, vaccines and policies.
  • Help scientists better understand how transmission occurs between individuals, and between countries.
  • Inform public health officials about ways to deal with the pandemic, based on scientific data sets, in addition to simple contact-tracing.

Q: Who is uploading these viral genomes onto these databases?

Scientists and clinicians working in different parts of the world gather data, upload them and download data from others to compare the sequences and analyse them. The researchers include people from Argentina to Zimbabwe.

Q: What do they analyse?

Phylogeneticists analyse, compare and contrast genetic sequences of SARS-CoV-2 to find and track its mutations.

  • As viruses replicate, they make 'typos', or mutations.
  • When researchers line the sequences up, they see the changes.
  • They use these changes to relate the sequences to each other.
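
The three steps above can be sketched in a few lines of Python. This is a toy illustration only, not the method used by Nextstrain or any laboratory: the sequences are invented, they are assumed to be already aligned and of equal length, and `find_changes` and `distance` are hypothetical helper names.

```python
from itertools import combinations

# Invented, pre-aligned toy sequences (assumption: real genomes are far longer).
samples = {
    "sample_A": "ATGGCTAGCT",
    "sample_B": "ATGGCTAGTT",
    "sample_C": "ATGACTAGTA",
}

def find_changes(seq_a, seq_b):
    """List (position, base_in_a, base_in_b) for every site where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

def distance(seq_a, seq_b):
    """Number of differing sites: fewer differences suggests a closer relationship."""
    return len(find_changes(seq_a, seq_b))

# Compare every pair of samples and find the two that are most alike.
pairwise = {
    (x, y): distance(samples[x], samples[y])
    for x, y in combinations(samples, 2)
}
closest_pair = min(pairwise, key=pairwise.get)

print(find_changes(samples["sample_A"], samples["sample_B"]))
print(pairwise)
print("Most closely related:", closest_pair)
```

Real phylogenetic analyses use statistical models of evolution rather than a simple count of differences, but the underlying idea of relating sequences through shared and differing "typos" is the same.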

Q: How many SARS-CoV-2 genome sequences come in each day?

More than 5,000 (and rising) sequences, according to a Nature report.

Q: Is it normal for viruses to mutate?

Yes. Mathematically, the number of possible genetic mutations is greater than all the atoms in the visible universe, Vincent Racaniello, a professor of microbiology and immunology at Columbia University told Live Science. "A good fraction of the genome can be replaced."
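
A rough back-of-the-envelope check shows the scale behind that comparison. The figures here are assumptions, not from the report: a genome of about 30,000 bases (an approximate length for SARS-CoV-2), four possible bases at each position, and roughly 10^80 atoms in the observable universe (a common order-of-magnitude estimate).

```python
import math

GENOME_LENGTH = 30_000   # assumption: approximate genome length in bases
BASES = 4                # four possible bases at each position
ATOMS_EXPONENT = 80      # assumption: about 10^80 atoms in the observable universe

# Number of distinct sequences of this length is 4^30000; work in powers of ten
# because the number itself is far too large to write out.
sequences_exponent = GENOME_LENGTH * math.log10(BASES)

# Shortest sequence length whose possible combinations already exceed 10^80.
min_length = math.ceil(ATOMS_EXPONENT / math.log10(BASES))

print(f"Possible sequences: about 10^{sequences_exponent:.0f}")
print(f"Atoms in the observable universe: about 10^{ATOMS_EXPONENT}")
print(f"A stretch of only {min_length} bases already has more combinations than that")
```

Most of those combinations would not produce a working virus, which is why the practical limit on mutation is a separate question.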

Q: Is there a limit to how much it can mutate?

Yes, there may be a limit, but that limit is currently unknown. It's possible the virus may mutate and still make people sick, or a dominant strain may arise that simply reduces it to a common cold instead of a deadly pneumonia-like disease. Scientists say they can only track the mutations, but not predict all of the possible mutations the virus could undergo.

Q: How many SARS-CoV-2 mutations of concern have been found by scientists?

There are at least five emerging coronavirus variants with the “most worrying mutations,” with genetic changes that can make them more contagious and “evasive”, reports Scientific American.

Variants of Concern
Scientists are constantly updating this list, based on new information gleaned from analyses of large sets of genetic sequences of the SARS-CoV-2 virus.
A mutation is a change in a DNA sequence: the nucleotide sequence within the DNA of a life form is altered, which modifies the organism’s genetic makeup. Depending on the mutation, it can prove harmless, helpful, or even hurtful to the organism. SARS-CoV-2, like any virus, has been mutating since it was first sequenced in early January 2020.

There are two kinds of mutations:

* Somatic mutations:


They take place in non-reproductive cells. Some somatic mutations can greatly affect the life and function of an organism. For example, somatic mutations that affect cell division (particularly those that allow cells to divide uncontrollably) are the basis for many forms of cancer.

* Germ-line mutations:


They occur in gametes (an organism's reproductive cells) or in cells that eventually produce gametes. In contrast with somatic mutations, germ-line mutations are passed on to an organism's progeny. As a result, future generations of organisms will carry the mutation in all of their cells (both somatic and germ-line).

Q: How are new SARS-CoV-2 mutations identified?

They are identified by analysing thousands of samples taken from patients. When a patient tests positive and a mutation is suspected, researchers compare the genetic code of the new sample against an existing database of known variants. If it's a new strain, it's plotted as a new "leaf" in a "tree" that tracks mutations.
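
The comparison described above can be pictured with a minimal sketch. Everything in it is hypothetical: the variant labels, the mutation names, and the helpers `classify` and `closest_known` are invented for illustration, and real tools place new samples on a tree with far more sophisticated methods.

```python
# Hypothetical database: variant label -> set of mutations that define it.
known_variants = {
    "variant_1": frozenset({"A10G", "C25T"}),
    "variant_2": frozenset({"A10G", "C25T", "G40A"}),
}

def classify(sample_mutations, database):
    """Return the label of an exactly matching known variant, or None if the set is new."""
    wanted = frozenset(sample_mutations)
    for label, mutations in database.items():
        if mutations == wanted:
            return label
    return None

def closest_known(sample_mutations, database):
    """Return the known variant that shares the most mutations with the sample."""
    wanted = frozenset(sample_mutations)
    return max(database, key=lambda label: len(database[label] & wanted))

# A new sample carrying one mutation not seen before (invented data).
new_sample = {"A10G", "C25T", "G40A", "T55C"}

label = classify(new_sample, known_variants)
if label is None:
    # Not in the database: attach it as a new "leaf" next to its closest relative.
    parent = closest_known(new_sample, known_variants)
    label = f"{parent}.new_leaf"
    known_variants[label] = frozenset(new_sample)

print(label)
```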

Q: Mutations vs variants: What's the difference?

In the process of duplicating the SARS-CoV-2 viral RNA within an infected human cell, "errors" may occur. This results in viruses that are similar, but not exact, copies of the original. These errors in the viral RNA are called "mutations". Viruses with these mutations are called "variants". Variants could differ by a single or multiple mutations.

In this cryogenic electron microscope image of a SARS-CoV-2 spike protein side view, the S1 section of the spike is shown in green and the S2 portion is shown in purple.

Q: How are samples obtained?

Samples, usually taken with nasal swabs, are subjected to a reverse transcription polymerase chain reaction (RT-PCR) test. Reverse transcriptase and DNA polymerase enzymes are added to the sample, and any viral RNA present is converted to DNA and copied many times over. Primers and probes target specific segments of the virus’s genome that are unlikely to change over time. This “chain reaction” generates enough copies for even a small amount of virus to be detected when the sample is tested. The primers and probes attach themselves to specific sequences in the virus’s genetic code, and the resulting signals confirm whether a sample is positive or negative.
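
The arithmetic behind the "chain reaction" can be sketched as follows, assuming the idealised textbook model in which each PCR cycle doubles the number of copies. The detection threshold of one million copies and the starting amounts are illustrative assumptions, not figures from any test kit, and the function names are hypothetical.

```python
def copies_after(cycles, starting_copies=1, efficiency=1.0):
    """Copies present after a number of cycles; efficiency=1.0 means perfect doubling."""
    return starting_copies * (1 + efficiency) ** cycles

def cycles_to_reach(target, starting_copies=1, efficiency=1.0):
    """Count the cycles needed before the copy number reaches the target."""
    cycles = 0
    copies = float(starting_copies)
    while copies < target:
        copies *= 1 + efficiency
        cycles += 1
    return cycles

DETECTION_THRESHOLD = 1_000_000  # assumption: illustrative number of copies needed for a signal

print(copies_after(30))                                          # copies from a single molecule after 30 cycles
print(cycles_to_reach(DETECTION_THRESHOLD, starting_copies=10))  # sample with more virus
print(cycles_to_reach(DETECTION_THRESHOLD, starting_copies=1))   # sample with very little virus
```

Under these assumptions, a sample with less starting material simply needs a few more cycles to cross the threshold, which is why even a small presence of the virus can be detected.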

Q: Why is the use of PCR important in tracking mutations?

PCR tests detect genetic material specific to SARS-CoV-2, or to any other organism they are designed for. As a result, new strains can be detected almost in real time. The test is also highly accurate compared with other tests, and picks up any strain of the virus.

Q: What is genome sequencing?

Sequencing DNA means determining the order of the four chemical building blocks — "bases" — that make up the DNA molecule. The sequence tells scientists the kind of genetic information that is carried in a particular DNA segment.


Humans have between 20,000 and 25,000 genes, according to the Human Genome Project.

Q: What is a gene?

A gene is a segment of DNA, and contains the instructions for the production of biological molecules — usually proteins. The DNA contains the instructions found in the genome of any living thing. Within DNA is a unique chemical code that guides growth and development.

Dr Emma Hodcroft explains her team's work. The virus behind the ongoing pandemic continues its deadly run, with 117 million infections globally, and more than 2.6 million deaths (as of Tuesday, March 9, 2021). Evidence has also emerged of mutations that lead to fast-spreading variants of the virus. Image Credit: Screengrab

Q: What is the likelihood of an emerging COVID-19 variant evading testing?

It is highly unlikely that an emerging COVID-19 variant would evade testing. This is due to the unprecedented ability to quickly compare genetic strains of the virus through shared databases, which highlights the utility of scientific collaboration on a global scale.

2 %

Our genes only account for about 2% of all our genetic information. Scientists don’t fully understand yet the exact function of the other 98%.

Q: How many SARS-CoV-2 genomes have been sequenced and shared?

More than 610,000, according to Nature (March 1, 2021). Laboratories around the world are sequencing more SARS-CoV-2 samples; the number is expected to exceed 1 million by the end of the pandemic.

We must move beyond the limitations of existing tools and improve processes, so that they are fit to handle a pandemic.

- Dr Emma Hodcroft, co-developer of Nextstrain, molecular epidemiologist at University of Basel

Q: What value does viral genome mapping provide?

In theory, these genomes could help the scientific community better understand the spread of the virus or any pathogen (disease-causing organism, bacterium, virus) across the globe.

Genomics is the branch of molecular biology concerned with the structure, function, evolution, and mapping of genomes.


Q: How effective are the analyses made by phylogeneticists?

Pretty effective and pretty quick. For example, less than two hours after the spread of a new variant (now called 501Y.V1, or B.1.1.7, first detected in the UK) was announced by the UK health minister in December 2020, Nextstrain researchers had provided context for its key mutations (via Twitter), and showed its progression in the UK and across Europe in the months prior. The Twitter thread became a key source of information on the new variant. Over the Christmas 2020 break, Nextstrain researchers crunched further sequences and briefed journalists.

Q: What accounts for the speed of genome sequencing?

Sequencing, combined with data from epidemiologists (taken from hospitals and PCR tests), provides near real-time knowledge of the emergence of new variants and their biology, explained Nature.

Genetic sequencing costs
Genetic sequencing costs have come down significantly.

* In mid-2015, the cost to generate a high-quality “draft” whole human genome sequence was just above $4,000; by late 2015, that figure had fallen below $1,500.

* By 2018, the cost to sequence your entire genome had fallen further, to $200, according to Wired.

* PCR tests, which accurately detect a virus's genetic signature, have come down in price and have become routine.

Q: How do countries fare in terms of COVID-19 genome sequencing?

With SARS-CoV-2, many high-income countries (such as Iceland, Luxembourg, and Japan) have sequenced the most viral genomes per 1,000 cases. Many countries, especially in Africa, have no sequencing data at all. However, Gambia, Equatorial Guinea, and Sierra Leone have a higher rate of sequencing than France, Italy, or the USA, according to GISAID.
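
The "per 1,000 cases" measure explains how a country with few sequences can outrank one with many. A minimal sketch, using invented countries and invented numbers rather than GISAID data (the function name is also hypothetical):

```python
def genomes_per_1000_cases(genomes_sequenced, confirmed_cases):
    """Sequencing rate: genomes shared for every 1,000 confirmed cases."""
    return genomes_sequenced * 1000 / confirmed_cases

# Invented figures for two hypothetical countries.
countries = {
    "Small country": {"genomes": 500, "cases": 10_000},
    "Large country": {"genomes": 20_000, "cases": 4_000_000},
}

rates = {
    name: genomes_per_1000_cases(data["genomes"], data["cases"])
    for name, data in countries.items()
}

# Rank from the highest sequencing rate to the lowest.
for name in sorted(rates, key=rates.get, reverse=True):
    print(f"{name}: {rates[name]:.1f} genomes per 1,000 cases")
```

In this made-up example the large country has sequenced 40 times as many genomes, yet the small country's rate is ten times higher because it has far fewer cases.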

Q: Is it necessary to sequence every COVID-19 patient’s viral genome?

No. But there should be a sufficient level to detect and track mutations and their effects — both for COVID-19 and for future emerging and re-emerging infections, say scientists.

Q: What are the concerns of phylogeneticists?

Phylogeneticists have been trained to use certain tools to track transmission, flag key mutations, calculate metrics and inform public health officials. But they are also human, and subject to certain limitations. Some of them are buckling under the pressure of the technical challenges they face in automating the work of keeping track of ongoing SARS-CoV-2 mutations. They bare their anxieties, discoveries, challenges and their minds on Twitter, as well as in journals such as Nature and The Lancet.

Q: Why are the naming conventions of the new coronavirus strains problematic?

It’s a result of disparate systems used. Sequences are being made available on several different databases, websites, and platforms. At present there is an absence of standardised nomenclature for variants, which contributes to a lack of clarity, according to experts.

Q: What sort of challenges do the researchers encounter?

Sequencing viral genomes is important; what’s more important, however, is to have enough data to help researchers understand the effects of mutations on the virus's biology and cross-reference them with clinical data. Researchers, especially phylogeneticists, also have their limits. For example, Nextstrain was previously used to track influenza and Ebola outbreaks, through small updates every week or month. Now, researchers need to update their analyses daily.

This presents enormous challenges for people running the system — those who run data analytics, bioinformatics and viral evolution research — whose tools are now being stretched to their utmost limit when they are most needed.

“We must move beyond the limitations of existing tools and improve processes, so that they are fit to handle a pandemic,” wrote Dr Emma Hodcroft, co-developer of Nextstrain.

Q: What is the way forward in global genomic surveillance?

There's hope that vaccines will help resolve this pandemic, but some experts now say this may not necessarily be true. Real-time global genomic surveillance of pathogens is a key weapon in the world's arsenal against the outbreak. By its very nature, effective viral genomic surveillance needs to be a global concern, The Lancet asserts. It must be widely adopted, powered by seamless open data sharing.

A lot more research will be needed to know how the dynamics of this virus will continue and change. A lot more work needs to be done for a scientifically informed response that would put the pandemic, the mask mandates, social distancing, infections and untimely deaths behind us. In these COVID-19 times, "nobody is safe until everyone is safe," says the UN Development Programme.