By: Aaron Seidlitz
A workgroup led by Illinois Computer Science professors Mohammed El-Kebir (IGOH) and Jian Peng (CABBI), as co-principal investigators, produced a preprint research paper that serves as the “first comprehensive analysis at a large scale that attempts to identify genomic signatures of SARS-CoV-2 strains that occur within and across individual hosts.”
They collaborated with CS PhD student Yunan Luo and incoming CS MS student Palash Sashittal.
Using genomic signature deconvolution, the group identified multiple strains present in the global population and found evidence of the coexistence of distinct strains within infected patients.
By publishing these findings, the workgroup hopes to prompt new questions from the biological and medical communities. Answering those questions should yield further knowledge and help move the broader pandemic response forward.
“If you have multiple strains within one host you can start thinking about whether or not there is a competition happening there,” El-Kebir said. “Do they behave differently in different organs? When spread happens, do they compete against each other? Or are they co-transmitted?”
“Those are the kind of questions that will open up because of these findings.”
The preprint was funded by a $100,000 RAPID grant from the National Science Foundation (NSF).
El-Kebir said that having dedicated the bulk of his research to cancer genomics and computational biology put him in a unique position to help during the COVID-19 pandemic.
His earlier studies produced algorithms that analyze how tumors evolve. El-Kebir said that one of the most important aspects of cancer research is metastasis, the process by which tumor cells leave the primary tumor and migrate to other locations in the body.
He then realized that this line of study could naturally extend to disease outbreaks, because the spread of a pathogen is similar in nature to the migration of cancer cells. Having applied this approach during the Ebola outbreak a few years ago, he saw that the same algorithms could also help others understand the behavior of the virus that causes COVID-19.
“Essentially, when the pandemic started, we had access to the tools we needed,” El-Kebir said. “After COVID-19 spread, there was suddenly a lot of data shared that we could investigate. When that was available, we started looking at it and began wondering if we could apply our ideas to understand how the spread of COVID-19 proceeded.”
The researchers first set aside two preconceived notions.
“There is the thought that when transmission happens, you only get one strain of this virus transmitted. And then there’s this notion that once infection occurs, a person cannot get infected again,” El-Kebir said. “I began this project by questioning whether those assumptions are actually occurring. I also wondered what we would find out once we removed those assumptions from the process.”
The research paper acknowledged that “current phylogenetic and phylodynamic approaches typically use consensus sequences, essentially assuming the presence of a single viral strain per host.”
The workgroup then analyzed 621 bulk RNA sequencing samples and 7,540 consensus sequences from COVID-19 patients.
Their suspicion proved correct: they identified multiple strains of the virus, SARS-CoV-2. The paper reporting this result is currently undergoing peer review and is available as a preprint on bioRxiv.
Next, the team plans to assess whether coexisting strains are the result of multiple infection events. El-Kebir said they would like to analyze about 5,000 additional samples from COVID-19 patients available through the Sequence Read Archive (SRA).
Their ultimate priority, though a difficult one, is to cross-check their findings against disease outcomes, which would require a clinical partner to provide access to that data.
Finally, they are working to quantify the severity of the identified viral strains through a functional analysis of how their mutations affect viral proteins, which is Peng’s area of expertise.