Statistician using numbers to paint complete biological pictures
Cell biology experiments that used to take months now take one day. The number of labs doing bench science has exploded, as has the number of research techniques available, and, to top it all off, the Internet makes sharing data as easy as clicking a mouse. Gene sequencing alone has generated so much DNA data that making sense of it all is a Herculean task. So what are we to do with this embarrassment of riches?
This is where scientists like Ping Ma, assistant professor of statistics, come in. Ma, who arrived at the University of Illinois in 2005, earned both his master's degree and doctorate in statistics from Purdue and then went to Harvard University for a postdoctoral fellowship at the Bauer Center for Genomics Research. In addition to being on the faculty of the Institute for Genomic Biology (IGB), he is a fellow at the Center for Advanced Studies (CAS).
Though Ma's expertise is in statistics, he has always been very interested in computational biology. Although computational biology and bioinformatics are often used interchangeably, it is more precise to say that bioinformatics is a tool that computational biologists use to understand and interpret data. "My philosophy always has been, if you are doing statistics, you should be interdisciplinary," says Ma. "I'm interested in learning the science, not just analyzing the data."
Because of this philosophy, Ma has worked with scientists in a wide range of disciplines, from biochemistry and biology to geophysics and astronomy.
In many fields, says Ma, researchers have only had access to "very naive statistics" up until now, in part because of a lack of communication between fields. Thanks in part to Ma's enthusiasm and dedication, those lines of communication are opening up, especially in the United States.
He recently received a 4-year NSF grant ($590,000, which "is huge" for a statistician, he says) for a project examining the "histone code hypothesis," with Neil Kelleher, associate professor of chemistry and a member of the IGB faculty. The histone code hypothesis suggests that alterations of the histone within the nucleosome — the histone/DNA package in the nucleus of eukaryotes — may drive key cellular processes.
If we can understand the countless processes by which each histone is modified, we can better understand and potentially treat diseases that are caused by cellular dysregulation. However, until now it has been difficult to put together all the data collected from techniques as wide ranging as mass spectrometry (Kelleher's area of expertise), DNA sequencing, microarrays, and ChIP-on-chip analysis.
Ma compares each of these methods to an individual worker in a factory: looking at just one worker won't show how he is contributing to the final product. Ma is using statistical principles and computational methods to construct a model that will reveal the overall picture of the entire manufacturing process.
"Biological data sometimes are like the three blind men feeling the elephant and trying to describe it from only their perspective," Ma says. "Instead of just describing the tail and then the skin, by integrating all these data we can describe the whole elephant through statistical principles."
Ma's expertise has also helped shed light on questions in fields other than biology. For example, Ma is working with Xiaodong Song, professor of geology at Illinois, who has postulated that the inner core of the earth rotates faster than its surface. So far Song has used a model for this work that confirms the theory, but he had to base the model on many assumptions. Using Ma's skills, the researchers can "wipe out all prior knowledge" and just "let the data speak," says Ma.
In another project, Ma is collaborating with a geophysicist at MIT and a mathematician at Purdue to analyze data from the earth's core to generate a high-resolution picture of the core-mantle boundary. In this case, says Ma, there is only one method of measurement, seismic waves, since it is physically impossible for us to reach the core. By combining Ma's expertise with that of his colleagues, the scientists hope to shed light on this surprisingly complex interface.
Little did Ma realize when he first began the project just how large and complex the core-mantle boundary question was. Instead of being an "interesting side project, it has become one of the most successful collaborations I have had," says Ma.
Ma has received another 4-year NSF grant for this work and will use his time as a CAS fellow to continue the project.
Ma is also organizing the second annual Midwest Symposium on Computational Biology and Bioinformatics, entitled "Genomics and Statistical Methodologies for Solving Biological Problems." Learn more about the conference at www.igb.illinois.edu/bioinformatics.