Analyzing Databases of Genomic Information

At The Simons Center for Systems Biology, headed by School of Natural Sciences Professor Arnold Levine, a group of theoretical physicists, mathematicians, computer scientists, and biologists has been analyzing databases of genomic information in the context of complex diseases, such as cancers and HIV. The aim is to improve early diagnosis by identifying high-risk populations, or to improve outcomes with targeted drug therapy. The group has developed unique algorithms and approaches for studying disease genes, and this work is now in the literature.

For example:

* Research done jointly at The Simons Center and the Cancer Institute of New Jersey over the past three years has helped to identify genomic differences that appear to predispose premenopausal women to increased risk of breast, lung, and colon cancers. Women who need to be screened early for cancers might be reliably identified by a genotyping test that could be administered in a doctor’s office.

* Members at The Simons Center have discerned nucleotide sequence motifs that differ between the HIV genome and the genome of its human host, which could lead to a new approach to developing a vaccine for HIV.

* It has been possible to identify differences in the mutational preferences that occur when influenza viruses replicate in birds or in humans. Because the incubation of influenza infections in birds can lead to worldwide pandemics when these viruses enter into the human population, these distinctions provide a clue to the strains that may jump species and to what happens when this occurs. In addition, the segregation of the influenza chromosomes into progeny viruses can also be shown to be a nonrandom process, a prediction that has been confirmed experimentally. These types of nonrandom events may well permit the prediction of next year’s epidemics and the production of vaccines in advance of an epidemic.

* The herpes viruses are unusual because they initiate a primary infection, then reside in the body in a latent or dormant state until, in response to stress signals, they reactivate and cause disease. Employing novel algorithms that match a viral or cellular gene product to a gene it regulates, Members at The Simons Center have been able to explain how these viruses enter latency in the body, and the resulting predictions have been verified experimentally with one of the herpes viruses.

* Polymorphisms in several genes that are known to play a role in the origins of cancers in humans have been identified and shown to be associated with the early onset of cancers in individuals. The molecular basis by which these variations alter the activity of a gene and create a predisposition to cancerous growth has now been established experimentally.

* Novel approaches to identifying sequence variations that have functional consequences are now available and being tested. Information theory has been employed to analyze the kinds of variation observed in human genes in a defined population. It has become possible to identify a particular sequence or pattern of changes that has entered the population more recently but represents a larger-than-expected share of the polymorphic forms observed in this population. This has been observed with two genes in humans that have subsequently been shown to play a role in reproduction of embryos in females. Here, again, the theory has led experimentalists to test and confirm ideas that help to explain the predictions.
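
To give a flavor of what an information-theoretic look at variation can involve, the following is a minimal sketch in Python. It is not the Center's method, which is not described here. It only assumes that variation in a gene can be summarized as a list of observed sequence variants (haplotypes), and it computes the Shannon entropy of their frequencies along with each variant's share of the sample. The function names and the toy sample are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def variant_shares(variants):
    """Return each distinct variant's share of the sample (fractions sum to 1)."""
    counts = Counter(variants)
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def shannon_entropy(variants):
    """Shannon entropy, in bits, of the variant frequency distribution."""
    shares = variant_shares(variants)
    return -sum(p * math.log2(p) for p in shares.values())


# Hypothetical toy sample: one variant dominates the population.
sample = ["ACGT"] * 6 + ["ACGA"] + ["TCGT"]

shares = variant_shares(sample)
entropy = shannon_entropy(sample)

# Entropy is largest when all distinct variants are equally common,
# so a value well below this ceiling signals a skewed distribution.
max_entropy = math.log2(len(shares))

print(shares)
print(entropy, max_entropy)
```

An entropy well below the uniform ceiling indicates that a few variants account for most of the sample. Judging whether a particular variant's share is larger than expected would additionally require an estimate of when it entered the population, which this sketch does not attempt.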