New Tool to Identify Genes Associated with Coronavirus

BY Aaron Seidlitz

Even though more than two years have passed and much has changed about the COVID-19 pandemic, researchers like Illinois Computer Science professor Mohammed El-Kebir (IGOH) continue to investigate the virus to ensure the medical and scientific communities are better prepared to respond if something similar occurs in the future.

A recent paper from El-Kebir investigates transcription regulatory sequences (TRSs). The paper states that TRSs play a critical role in discontinuous transcription in coronaviruses.

Ultimately, the work introduces “two problems collectively aimed at identifying these regulatory sequences as well as their associated genes.”

Mohammed El-Kebir

“I think the true impact of this work is that we have developed a new tool for our disposal if the need arises and there’s another pandemic – if it’s going to be another Coronavirus,” El-Kebir said. “The work allows scientists to quickly identify these TRS sites as well as the genes of future, yet undiscovered, coronaviruses. This information essentially allows us to classify the virus and accurately place it into the phylogeny of coronaviruses.”

The workgroup included one of El-Kebir’s graduate students, Palash Sashittal, who conducted the bulk of the technical work with Chuanyi Zhang, a graduate student in the Department of Electrical & Computer Engineering. Two CS undergraduate students, Ayesha Kazi and Michael Xiang, and one ECE undergraduate student, Yichi Zhang, contributed to the effort by creating a web interface to make the findings accessible.

Their research began in the fall of 2020, with Zhang and Sashittal’s work in one of El-Kebir’s graduate courses, Introduction to Bioinformatics.

As the paper describes, the group first focused on the “TRS Identification problem of identifying TRS sites in a coronavirus genome sequence with prescribed gene locations.” Their solution, which the group calls CORSID-A, is an algorithm that “solves this problem to optimality in polynomial time.” The group also states that CORSID-A “outperforms existing motif-based methods in identifying TRS sites in coronaviruses.”
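To give a feel for what the TRS identification problem asks, here is a deliberately simple Python sketch. It is not CORSID-A and does not reproduce the paper's method: it is a brute-force toy that assumes the regulatory sequence is a short fixed-length string that appears near the start of the genome and recurs, possibly with mismatches, just upstream of each gene. The function names, the window sizes, and the toy genome are all hypothetical and chosen only for illustration.

```python
# Toy illustration only: NOT the CORSID-A algorithm from the paper.
# Assumption: a candidate TRS is a length-k string in the genome's first
# `leader_len` bases that also occurs (allowing mismatches) within
# `window` bases upstream of every prescribed gene start.

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def best_upstream_match(genome, motif, gene_start, window):
    """Return (distance, position) of the closest match to `motif`
    that lies fully inside the `window` bases before `gene_start`."""
    k = len(motif)
    lo = max(0, gene_start - window)
    best = None
    for i in range(lo, gene_start - k + 1):
        d = hamming(motif, genome[i:i + k])
        if best is None or d < best[0]:
            best = (d, i)
    return best

def find_trs_candidate(genome, gene_starts, leader_len, k, window):
    """Try every k-mer in the leader region and keep the one whose best
    upstream matches, summed over all genes, have the fewest mismatches."""
    best = None
    for p in range(leader_len - k + 1):
        motif = genome[p:p + k]
        hits = [best_upstream_match(genome, motif, g, window) for g in gene_starts]
        if any(h is None for h in hits):
            continue  # upstream window too short for this motif length
        total = sum(d for d, _ in hits)
        if best is None or total < best[0]:
            best = (total, p, motif, [i for _, i in hits])
    return best

# Hypothetical toy genome: a leader region, then two "genes", each
# preceded by a copy of the same 6-base sequence.
leader = "AAAAACGAACTTTTT"
genome = (leader + "G" * 10 + "ACGAAC" + "GG" + "ATGCCC"
          + "G" * 10 + "ACGAAC" + "GGG" + "ATGTTT")
gene_starts = [33, 58]  # positions of the two ATG start codons

result = find_trs_candidate(genome, gene_starts, leader_len=15, k=6, window=12)
total_mismatches, leader_pos, motif, body_positions = result
print(motif, leader_pos, body_positions, total_mismatches)
```

The sketch scores every candidate exhaustively, which is fine for a toy input. The paper's claim is stronger: that CORSID-A solves its formulation of the problem to optimality in polynomial time.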

Second, the paper demonstrates “for the first time how TRS sites can be leveraged to identify gene locations in the coronavirus genome.”

As a computer scientist whose primary research focus is on combinatorial optimization algorithms in computational biology, El-Kebir’s work was well situated to help in response to the COVID-19 pandemic.

What amazed El-Kebir was the volume of work like his and the way the scientific community came together to produce unprecedented results so quickly. With assistance from those in the computational biology field, for example, he noted that the COVID-19 vaccine was created from nothing. And the work is not stopping there; scientists are also constantly monitoring to see if the vaccine needs to be modified for new variants.

“It’s a pretty amazing time to be a part of the scientific community right now,” El-Kebir said. “Projects like these and the way we’re sharing them through faster-than-ever information flow is remarkable. I think it’s impressive to see how quickly we came to understand the problem and to see how quickly we responded. There has been a tremendous exchange of ideas in all forums that has been inspiring.”

The study “Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses” was published in Molecular Biology and Evolution. The study was funded by the National Science Foundation.
