Inside every cell that makes up a diminutive fruit fly is a vast, dynamic network of information—the genome whose ~15,000 genes allow that cell to function. In a study recently published as a Breakthrough Article in Nucleic Acids Research (DOI: 10.1093/nar/gkv195), computer scientists and molecular biologists demonstrated the utility of a novel approach to deciphering how networks of genes are regulated.
University of Illinois computer scientist Saurabh Sinha and colleagues, including Scot Wolfe and Michael Brodsky at the University of Massachusetts Medical School, were led to the current study by their fascination with how interactions between DNA and a special category of cellular proteins, called “transcription factors,” help control when and where genes are expressed. University of Illinois computer science graduate student Charles Blatti played a major role in the study, which was funded by the NIH, NSF, and a Cohen Graduate Fellowship awarded to Blatti.
Transcription factors bind to regions of DNA near the sequences that code for genes. By taking part in the molecular complexes responsible for gene expression, they can either promote or repress the expression of individual genes. Taken together, these proteins are a little like the cell’s operating system: they help orchestrate the genetic “programs” that cells need to run.
Each type of transcription factor recognizes a unique DNA sequence, called a motif, that acts like a molecular password or barcode. By scanning genome sequences for these motifs and the genes near them, researchers like Sinha and his colleagues are working to understand which transcription factors control which genes, and therefore which biological functions.
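To make the idea of motif scanning concrete, here is a minimal Python sketch. It is a simplification and not the method used in the study: real motif scanners usually score sequences against a position weight matrix, while this toy version looks for an exact motif string on both DNA strands and then lists the genes whose start lies within a chosen distance of each hit. The function names, the `window` parameter, and the gene-coordinate format are all hypothetical.

```python
from typing import Dict, List, Tuple

# Complement table for reverse-complementing DNA (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(genome: str, motif: str) -> List[int]:
    """Return sorted 0-based start positions of the motif on either strand.

    A transcription factor can bind its motif on either strand, so we also
    search for the motif's reverse complement. Overlapping hits are kept.
    """
    hits = set()
    for pattern in {motif, reverse_complement(motif)}:
        start = genome.find(pattern)
        while start != -1:
            hits.add(start)
            start = genome.find(pattern, start + 1)
    return sorted(hits)


def nearby_genes(hits: List[int],
                 genes: Dict[str, Tuple[int, int]],
                 window: int) -> Dict[str, List[int]]:
    """Map each gene to the motif hits within `window` bases of its start.

    `genes` maps a (hypothetical) gene name to its (start, end) coordinates.
    Genes with no nearby hit are left out of the result.
    """
    result: Dict[str, List[int]] = {}
    for name, (start, _end) in genes.items():
        close = [h for h in hits if abs(h - start) <= window]
        if close:
            result[name] = close
    return result
```

On the toy sequence `"TTGATAAGCCCCCTTATCAGG"`, `find_motif(..., "GATAA")` returns `[2, 13]`: one hit on the forward strand and one where the reverse complement `TTATC` appears. With a gene starting at position 5 and `window=5`, only the hit at position 2 is assigned to that gene.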
Looking at motifs alone, however, gives limited and potentially misleading information about where transcription factors are actually present and active. Because motifs are short, they can occur in many places in the genome purely by chance. Moreover, biological factors such as a cell’s tissue type and developmental stage change how DNA is packaged in different regions of the genome, and this packaging helps determine whether a given transcription factor can physically reach and bind each of its many motifs.
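A back-of-the-envelope calculation shows why short motifs turn up so often by chance. The sketch below assumes, as a deliberate simplification, that each base in a genome is A, C, G or T independently with probability 1/4. Under that assumption, a specific motif of length k matches at any given position with probability 1/4^k. The genome length used in the example is a round illustrative number, not a measured value.

```python
def expected_chance_hits(genome_length: int, motif_length: int,
                         both_strands: bool = True) -> float:
    """Expected number of exact matches to one specific motif in a random genome.

    Assumes every base is drawn independently and uniformly from A, C, G, T,
    so a given k-mer matches at any one position with probability 4**-k.
    Counting both strands doubles the expectation; for a palindromic motif
    this double-counts sites, so treat the result as a rough upper estimate.
    """
    positions = genome_length - motif_length + 1  # possible start positions
    per_strand = positions / 4 ** motif_length
    return 2 * per_strand if both_strands else per_strand
```

For an illustrative 100-million-base genome, a 6-base motif is expected to appear about 48,800 times by chance alone, and even an 8-base motif about 3,050 times. Most of those sites are never bound, which is why the motif map by itself is a poor guide to where a factor actually acts.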
“Motifs alone are of limited use in that the motif profile of the genome is static,” said Sinha, who is also a faculty member in the Gene Networks in Neural and Developmental Plasticity research theme at the Carl R. Woese Institute for Genomic Biology. “How does the transcription factor’s binding profile change from one cell to another, from one time point to another? . . . It's the accessibility that is changing.”
Researchers have traditionally relied on chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq), an expensive and labor-intensive method, to determine where in the genome a transcription factor binds. The method must be repeated for each transcription factor a researcher investigates. Sinha and his collaborators realized that they might be able to leverage their growing knowledge of how transcription factors interact with motifs to develop a shortcut.
They first used a single laboratory experiment to determine how DNA was packaged throughout the genome under a given set of biological conditions. They then used this information as a filter, on the assumption that each transcription factor would bind only to those of its motifs that lay in accessible regions of DNA under those conditions. With this combination of laboratory and computational work, they found that they could predict the DNA-binding behavior of transcription factors that had been observed in previous experiments.
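The filtering step can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the study’s actual model. It takes motif hit positions for one transcription factor, together with a list of accessible genomic regions such as might be derived from a single accessibility assay in one cell type. It keeps only the hits that fall entirely inside an accessible region. The interval format (sorted, non-overlapping, half-open `(start, end)` pairs) and the function name are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def filter_by_accessibility(hits: List[int], motif_length: int,
                            accessible: List[Tuple[int, int]]) -> List[int]:
    """Keep motif hits that lie entirely inside an accessible region.

    `accessible` holds half-open (start, end) intervals, assumed sorted by
    start and non-overlapping (a hypothetical input format). A hit at
    position h covers bases h .. h + motif_length - 1.
    """
    starts = [s for s, _ in accessible]
    kept = []
    for h in hits:
        # Index of the last interval whose start is <= h (or -1 if none).
        i = bisect_right(starts, h) - 1
        if i >= 0:
            _start, end = accessible[i]
            # The whole motif must fit before the interval's end.
            if h + motif_length <= end:
                kept.append(h)
    return kept
```

For example, with hits at `[2, 13, 40, 95]`, a motif length of 5 and accessible regions `[(0, 10), (35, 50), (90, 99)]`, the function keeps `[2, 40]`. The hit at 13 lies in closed chromatin, and the hit at 95 runs past the end of its region. Running the same motif hits against accessibility data from a different cell type would yield a different predicted binding profile, which mirrors the point that the motifs stay fixed while accessibility changes.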
“In order to reconstruct a regulatory network in a new system, you don't necessarily have to do a whole lot of assays in the right cell types,” Sinha said. “If you instead do an accessibility assay in those cell types and then overlay the motif information on top of it . . . these two together approximate the same information very well.”
Beyond making this type of work easier and more affordable for laboratories with limited resources, the new approach proved valuable for exploring the function of enhancers, regions of DNA where several transcription factors bind close together, as the publication demonstrated. Access to high-quality information about the activity of many different transcription factors makes it easier to infer relationships between enhancers, transcription factors, and biological functions.