
AI in Cancer Research: Tumor Phylogenetics

BY CCIL Communications Team

Artificial intelligence is often employed in cancer genomics, where DNA sequencing data must be processed and then analyzed with statistical, evolutionary, and probabilistic models. "Off-the-shelf" computing tools are useful for many cancer researchers, but Mohammed El-Kebir (IGOH), a Cancer Center at Illinois (CCIL) scientist, is taking these AI applications a step further.

El-Kebir and lab students discussing research in a classroom.

Although cancer was not the initial focus of El-Kebir's research career, he found that computer science offered exciting opportunities to further our understanding of how tumor cells evolve. He ultimately turned his attention to these evolutionary processes and to the phylogenetic trees that describe them.

“Within a single tumor, you have intra-tumoral heterogeneity — different sets or types of mutations — that are intriguing. Evolution of species can be described in a single tree of life, but with cancer, you get a tree for each tumor,” El-Kebir said. “If you have the capability to code individual trees, you can start mining and identify subtypes — and ultimately, predict things like therapy response, taking precision medicine to the next level.”

These phylogenetic trees can reveal clones, populations of cells that descend from a single cell and share its mutations, as well as clonal expansions. A single tumor can accumulate many mutations and give rise to multiple clones, producing intratumoral heterogeneity. This creates the risk that a cancer therapy misses some clones, which can then drive disease progression.

El-Kebir, also an assistant professor of computer science, develops software programs, such as PhySigs and MACHINA, that focus on phylogenetic trees and help researchers make inferences about the mutations on these trees.

PhySigs identifies shifts in mutational signatures, the characteristic patterns of mutations left by particular mutational processes, along a tumor's phylogenetic tree. These shifts contribute to an overall mutational portrait that can help explain how or why mutations occurred. In one case, PhySigs identified a specific signature in a lung cancer patient's clones that indicated the presence of a mutation in a DNA repair gene.
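As a rough illustration of what estimating signatures in a clone involves, the sketch below fits a clone's mutation counts as a non-negative mix of known signatures (often called signature "exposures"), using Lee-Seung multiplicative updates for non-negative least squares. It is not PhySigs' code and does not model the tree. The four mutation categories, the signatures `SIG_A` and `SIG_B`, and the clone counts are all hypothetical; real analyses typically use 96 trinucleotide-context categories and reference signature catalogs. Comparing the fitted exposures of an early clone and a later clone shows what a "shift" in signatures can look like.

```python
def fit_exposures(spectrum, signatures, iterations=500):
    """Fit spectrum ~= sum_k h[k] * signatures[k] with every h[k] >= 0.

    Uses Lee-Seung multiplicative updates for non-negative least squares
    (squared-error loss), holding the signatures fixed.
    """
    n_sigs, n_cats = len(signatures), len(spectrum)
    # Start from an even split of the clone's mutations across signatures.
    h = [sum(spectrum) / n_sigs] * n_sigs
    # The numerator sum_i S_k[i] * v[i] does not change between iterations.
    numer = [sum(sig[i] * spectrum[i] for i in range(n_cats)) for sig in signatures]
    for _ in range(iterations):
        # Spectrum reconstructed from the current exposures.
        recon = [sum(h[k] * signatures[k][i] for k in range(n_sigs)) for i in range(n_cats)]
        for k in range(n_sigs):
            denom = sum(signatures[k][i] * recon[i] for i in range(n_cats))
            if denom > 0:
                h[k] *= numer[k] / denom
    return h


# Hypothetical signatures over 4 made-up mutation categories (each sums to 1).
SIG_A = [0.7, 0.1, 0.1, 0.1]
SIG_B = [0.1, 0.1, 0.1, 0.7]

# Hypothetical mutation counts for an early (trunk) clone and a later (branch) clone.
trunk = fit_exposures([46, 10, 10, 34], [SIG_A, SIG_B])
branch = fit_exposures([22, 10, 10, 58], [SIG_A, SIG_B])
print([round(x, 1) for x in trunk])   # [60.0, 40.0]
print([round(x, 1) for x in branch])  # [20.0, 80.0]
```

Here the trunk clone's mutations are mostly attributed to `SIG_A` and the branch clone's mostly to `SIG_B`, a toy version of a signature shift between clones. Multiplicative updates keep the sketch dependency-free; with SciPy available, `scipy.optimize.nnls` solves the same fitting problem directly.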

MACHINA, on the other hand, tracks metastatic tumor spread, allowing researchers to infer the sequence of migrations between anatomical locations by comparing the clones found at those sites. That is, MACHINA identifies which clones migrated, and to where, revealing whether a metastasis was seeded by the primary tumor or by another metastatic site.

Similar migration patterns occur in other medical contexts. Cancer researchers have observed co-migration, in which several clones travel together to seed a metastasis. An analogous process occurs in viral infections, where multiple pathogen strains can be co-transmitted in a single event, leaving a patient infected with multiple variants.

"Parsimony, or Occam's razor, tells us the likeliest explanation is the simplest one, with the fewest events: it is much more likely that a clone migrated than that it appeared twice, independently, in different locations," El-Kebir said.
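The parsimony principle can be made concrete with Sankoff's classic small-parsimony algorithm. Given a clone tree whose leaves are labeled with the anatomical site where each clone was observed, it assigns sites to the ancestral clones so that the number of migrations (tree edges whose two ends sit at different sites) is as small as possible. This is a simplified sketch, not MACHINA's algorithm, which solves a richer problem that also accounts for, for example, co-migrations. The tree, the clone names (`R`, `P1`, `M`, `L1`, `L2`, `K`), and the sites are hypothetical, and the root clone is assumed to sit in the primary tumor.

```python
import math


def min_migrations(children, leaf_site, root, root_site):
    """Sankoff small parsimony on a clone tree, with a cost of 1 per migration.

    children:  dict mapping each internal clone to its list of child clones
    leaf_site: dict mapping each leaf clone to the site where it was observed
    Returns (number of migrations, site labeling, list of migrations).
    """
    sites = sorted(set(leaf_site.values()) | {root_site})
    # cost[node][s] = fewest migrations within node's subtree if node sits at s
    cost = {}

    def fill(node):
        kids = children.get(node, [])
        if not kids:
            # A leaf's site is observed, so any other site is impossible.
            cost[node] = {s: 0 if s == leaf_site[node] else math.inf for s in sites}
            return
        for kid in kids:
            fill(kid)
        # Each child either stays at the parent's site (cost 0) or migrates (cost 1).
        cost[node] = {
            s: sum(min(cost[kid][t] + (s != t) for t in sites) for kid in kids)
            for s in sites
        }

    fill(root)

    # Trace back an optimal labeling, with the root fixed at the primary site.
    labels = {root: root_site}
    migrations = []

    def assign(node):
        s = labels[node]
        for kid in children.get(node, []):
            # Pick the cheapest site for the child; on ties, prefer staying put.
            best = min(sites, key=lambda t: (cost[kid][t] + (s != t), t != s))
            labels[kid] = best
            if best != s:
                migrations.append((s, best, kid))
            assign(kid)

    assign(root)
    return cost[root][root_site], labels, migrations


# Hypothetical clone tree: R is the founding clone in the primary tumor.
children = {"R": ["P1", "M"], "M": ["L1", "L2", "K"]}
leaf_site = {"P1": "primary", "L1": "liver", "L2": "liver", "K": "lung"}

n, labels, migrations = min_migrations(children, leaf_site, "R", "primary")
print(n)           # 2
print(migrations)  # [('primary', 'liver', 'M'), ('liver', 'lung', 'K')]
```

In this toy tree, placing the ancestral clone `M` in the liver explains the data with two migrations instead of three, so the most parsimonious history has the lung metastasis seeded from the liver metastasis rather than from the primary tumor.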

El-Kebir's software is primarily used in research labs rather than in clinical settings. Precision medicine for cancer patients typically relies on targeted gene panels rather than whole-genome sequencing, which can take approximately 12 hours to generate.

“There’s a lot of data available, and it is only increasing — and so are the opportunities. But the interpretation of the data is lagging — the tools haven’t caught up yet,” El-Kebir said. “The bottleneck here isn’t the wait for better computers, but the sequencing technology. We will get longer reads and error rates will drop — I foresee this technology will continue to improve. Better data will lead to better methods and better understanding of the composition of individual tumors. We’re getting there.”

El-Kebir's current focus is the development of methods that estimate cancer phylogenies from single-cell sequencing data. Specifically, he is addressing the integration of data obtained from the same tumor using distinct single-cell technologies. Another focus is the development of comprehensive evolutionary models for somatic mutations that occur at varying genomic scales. He pursues this work in close collaboration with researchers at the Mayo Clinic.


Mohammed El-Kebir is an assistant professor of computer science at the University of Illinois Urbana-Champaign. He is also affiliated with the Cancer Center at Illinois (CCIL) and the National Center for Supercomputing Applications (NCSA). His lab's research focuses on tumor phylogenetics in the context of intra-tumor heterogeneity.
