Mihai Surdeanu is a computer scientist at the University of Arizona.

Cancer is triggered by highly complex protein signaling pathways, which drive the way cells communicate and interact with one another.

Although we have become better at understanding and controlling some of these pathways, many cancer drugs designed to block specific known pathways succeed only fleetingly in clinical trials, and the benefit is followed by almost-inevitable relapse.

One likely reason for this failure is that cancer pathways interact and interweave into an intricate network, and this network evolves further in response to drug treatment. In other words, these drugs may fail because we take a reductionist approach to cancer, focusing on snapshots of individual pathways, rather than a holistic one that considers all pathways jointly, along with how they change over time.

The Defense Advanced Research Projects Agency created the Big Mechanism program to address this issue.

The program’s goal is to develop systems that “read” all published scientific articles and extract fragments of causal pathways, such as “protein A activates protein B.” These systems then assemble the fragments into a holistic cancer-signaling network that integrates knowledge discovered by many different research groups. That network can then be used to suggest novel hypotheses and future therapeutic strategies.
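To make the assembly step concrete, here is a toy sketch in Python. It assumes each extracted fragment has already been reduced to a hypothetical tuple of (source paper, controller, relation, controlled), using placeholder names in the spirit of “protein A activates protein B.” The names `FRAGMENTS`, `assemble`, and `find_path` are illustrative only and do not describe the program’s actual software. The point is that merging fragments from different papers can reveal a chain, here from A to D, that no single paper reports.

```python
from collections import defaultdict, deque

# Hypothetical extracted fragments: (source paper, controller, relation, controlled).
# Placeholders for illustration, not real findings.
FRAGMENTS = [
    ("paper_1", "A", "activates", "B"),
    ("paper_2", "B", "activates", "C"),
    ("paper_3", "C", "inhibits", "D"),
    ("paper_4", "A", "activates", "B"),  # the same claim reported by a second paper
]


def assemble(fragments):
    """Merge fragments into one directed network; each edge records its supporting papers."""
    network = defaultdict(dict)
    for paper, src, rel, dst in fragments:
        evidence = network[src].setdefault((dst, rel), set())
        evidence.add(paper)
    return network


def find_path(network, start, goal):
    """Breadth-first search for a chain of interactions linking start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # .get avoids creating empty entries in the defaultdict for leaf nodes
        for nxt, _rel in network.get(node, {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of known interactions connects the two


network = assemble(FRAGMENTS)
print(find_path(network, "A", "D"))  # ['A', 'B', 'C', 'D']
```

A real assembly system would also need to reconcile synonyms, conflicting claims, and biological context; this sketch only shows why pooling fragments across papers matters.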

This is a fundamental contribution for several reasons. First, because experimental cancer biology is time-consuming and expensive, each cancer-research group can focus on only a small subset of this holistic cancer-signaling network.

This leads to imperfect drugs, as clinical studies show. Second, by one estimate, the human curators who have attempted to integrate this knowledge cover less than 2 percent of the published literature, and they fall further behind every year as the number of papers containing relevant research keeps growing. Clearly, we need to automate this process.

In collaboration with colleagues at the UA, Carnegie Mellon University, the Information Sciences Institute, SRI and Memorial Sloan Kettering Cancer Center, my research group has built such an automated reading system over the past year.

Our system builds on a novel information extraction framework (think of it as a programming language for describing natural language), together with grammars that capture the language used in the biomedical literature.
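As a rough illustration of what “grammars that capture the language” of a field can mean, here is a toy rule-based extractor in Python. It is a deliberately simplified stand-in, not the framework or grammars described above: it assumes a few hypothetical surface patterns written as regular expressions, each tied to a relation label. The `RULES` table and `extract` function are illustrative names only. Note that a single relation needs several rules, because authors write “A activates B” as well as “activation of B by A.”

```python
import re

# Hypothetical surface patterns, each mapped to a relation label.
# Real biomedical grammars are far richer (syntax, coordination, negation, hedging).
RULES = [
    (re.compile(r"(?P<controller>\w+) (?:activates|stimulates) (?P<controlled>\w+)"), "activates"),
    (re.compile(r"(?P<controller>\w+) (?:inhibits|blocks) (?P<controlled>\w+)"), "inhibits"),
    # Nominalized form: "activation of B by A" still means A activates B.
    (re.compile(r"activation of (?P<controlled>\w+) by (?P<controller>\w+)"), "activates"),
]


def extract(sentence):
    """Apply every rule to a sentence; return (controller, relation, controlled) triples."""
    triples = []
    for pattern, relation in RULES:
        for match in pattern.finditer(sentence):
            triples.append((match.group("controller"), relation, match.group("controlled")))
    return triples


print(extract("Our results show that A activates B."))  # [('A', 'activates', 'B')]
print(extract("We observed activation of C by D."))     # [('D', 'activates', 'C')]
```

Surface patterns like these break easily (passive voice, long noun phrases), which is one reason a dedicated framework with richer grammars is useful.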

In a recent evaluation, this system approached human performance (about two-thirds of the extracted information was judged useful) at a much higher throughput: it reads an article in less than five seconds. I am optimistic that in the near term, machines will suggest novel, holistic cancer hypotheses that individual researchers may miss but that become apparent when the entire published literature is explored.