
Researchers design AI method to predict metabolomic profiles of microbial communities

BY Shelby Lawson

Human bodies contain trillions of microbes, so many that their number rivals the number of human cells in the body. These microbes help shape many of our biological functions. For example, microbes in the gut break down food into small molecules called metabolites, many of which are important for human health. Measuring the species composition of the microbial community using metagenomics has become a quick and automated process, while measuring the concentrations of metabolites produced by those microbes, a process called metabolomics, is much more difficult and expensive. However, in a new study recently published in Nature Machine Intelligence, researchers have developed a machine learning algorithm called mNODE, which can predict metabolite concentrations based on the species composition of the microbial community.

From left: Tong Wang, lead author and postdoctoral researcher in the Liu lab; Sergei Maslov, Professor of Bioengineering and Physics; and Yang-Yu Liu (CAIM), Associate Professor of Medicine, Harvard Medical School

The study was conducted by researchers from IGB’s Center for Artificial Intelligence and Modeling theme, including Yang-Yu Liu (CAIM), an associate professor of Medicine at Harvard Medical School; lead author Tong Wang, a postdoctoral researcher in Liu’s lab; and Sergei Maslov (CAIM leader/CABBI), a professor of Bioengineering and Physics at the University of Illinois Urbana-Champaign.

“Metabolomics requires large and expensive equipment, and it’s not very automated,” said Wang. “It also gives limited information and coverage on the metabolites that actually exist in the community that you’re measuring.”

“Next-generation sequencing made genomics about 100,000 times cheaper than it used to be, and led to an increase in genomic sequencing being done,” said Maslov. “But nothing like this happened with measurement of metabolites, so it remains relatively expensive and labor intensive to measure.”  

Wang started as a graduate student in Maslov’s lab, where he worked on mechanistic models for predicting metabolite concentrations from metagenomic data, with some accuracy. When he became a postdoctoral researcher in Liu’s lab, both labs collaborated to create a machine learning approach to tackle the problem, taking inspiration from another of Liu’s projects which utilized a similar machine learning method. The researchers named the new method mNODE, which stands for Metabolomic profile predictor using Neural Ordinary Differential Equations.

“In our earlier mechanistic models, we tried to the best of our ability to model all of the processes of what is produced and who produces what,” Maslov explained. “But as you can imagine, those processes are really complicated, and you need to know hundreds of parameters for each microbial species. But the new machine learning method really comes to the rescue here because it can bypass some of those limitations. And if you have enough data, you can actually predict metabolite concentrations without knowing all of those nitty gritty details.”

“Suppose you don’t have the budget to run the expensive metabolomic tests,” explained Liu. “You could instead do very cheap metagenomics sequencing on your microbiome sample, and then use our methods to predict the metabolomic profile. That is the simplest application of our methods. Our long-term goal is to achieve personalized nutrition using our artificial intelligence and machine learning.”

First, mNODE was systematically validated using synthetic data generated by models. These models contained ecological data with known interactions between microbes and metabolites. Then, it was applied to real data from various environments. The microbe-metabolite interactions inferred from mNODE were confirmed by comparing them to the results from metabolomics experiments and genomic evidence.

“We started with synthetic data because you know exactly where the ground truths are,” said Liu. “Once you’ve finished the validation, you can then apply it to real data. And though you won’t have the complete ground truth, you can compare it to metabolomics tests and genomic information in the literature to validate it.”
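The validate-on-synthetic-data idea described above can be illustrated with a small sketch. This is not mNODE and not the study's simulation: it assumes a hypothetical linear production model (the matrix `W_true`), uses ordinary least squares as a stand-in predictor, and every function name and parameter value is invented for illustration. It shows only the workflow: generate data with a known ground truth, fit on one part, then check both the held-out predictions and the recovered microbe-metabolite interactions.

```python
import numpy as np


def simulate_community(n_samples=200, n_species=20, n_metabolites=10,
                       noise=0.01, seed=0):
    """Generate synthetic data with a known (hypothetical) ground truth."""
    rng = np.random.default_rng(seed)
    # Species composition as relative abundances: each row sums to 1.
    X = rng.dirichlet(np.ones(n_species), size=n_samples)
    # Assumed ground truth: a sparse, non-negative production matrix saying
    # how strongly each species contributes to each metabolite.
    mask = rng.random((n_species, n_metabolites)) < 0.3
    W_true = rng.random((n_species, n_metabolites)) * mask
    # Metabolite concentrations = linear production plus measurement noise.
    Y = X @ W_true + noise * rng.standard_normal((n_samples, n_metabolites))
    return X, Y, W_true


def fit_linear(X, Y):
    """Stand-in predictor: ordinary least squares (not a neural ODE)."""
    W_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W_hat


def per_metabolite_correlation(Y_true, Y_pred):
    """Pearson correlation between true and predicted values, per metabolite."""
    return np.array([
        np.corrcoef(Y_true[:, j], Y_pred[:, j])[0, 1]
        for j in range(Y_true.shape[1])
    ])


X, Y, W_true = simulate_community()
X_train, X_test = X[:150], X[150:]
Y_train, Y_test = Y[:150], Y[150:]

W_hat = fit_linear(X_train, Y_train)

# Check 1: how well are held-out metabolite profiles predicted?
scores = per_metabolite_correlation(Y_test, X_test @ W_hat)

# Check 2: do the inferred interactions match the known ground truth?
interaction_agreement = np.corrcoef(W_true.ravel(), W_hat.ravel())[0, 1]

print("mean held-out correlation:", scores.mean())
print("agreement with true interactions:", interaction_agreement)
```

With real data the second check is not available in this form, which is why the researchers instead compared the inferred interactions with metabolomics experiments and genomic evidence.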

The researchers say mNODE can not only use microbial composition to predict metabolomic profiles but also incorporate some dietary information to improve the accuracy of its predictions. They said that although this needs more development, it could be a valuable tool for personalized nutrition in healthcare.

“If you can integrate this dietary information pretty well, and you know what an ideal metabolic panel looks like, you can then optimize for each individual,” Wang said. “So maybe in the future doctors can give you a dietary recommendation based on your current profile and tell you what is the most ideal combination of different foods to reach a healthy metabolic profile.”

“Some metabolites are known as ‘healthy’ metabolites,” Maslov added. “So ideally, you’d want to figure out exactly how to increase the concentration of those in the gut and ultimately the bloodstream of individuals as well. That's the billion-dollar question.”

The team described the AI’s success as a testament to the modeling power and innovation of the CAIM, the new theme at the IGB. Maslov explained that projects in the theme aim to have three elements: data generation through experiments, mechanistic models to understand the physical processes behind that data, and machine learning and AI to make predictions based on new data.  

“This is one of those early success story projects that exemplifies the type of modern science projects that the Center for Artificial Intelligence and Modeling was created for,” Maslov declared. “We believe all three elements are necessary for really successful, modern, and interpretable projects.”

Liu added that in the future they hope to design AI for other purposes as well, including methods that are not purely data-driven but also incorporate domain knowledge from biology or physics, which would strengthen them.

The published study was supported by the National Institutes of Health and can be found here:
