About
Reach stands for Reading and Assembling Contextual and Holistic Mechanisms from Text. In plain English, Reach is an information extraction system for the biomedical domain, which aims to read scientific literature and extract cancer signaling pathways. Reach implements a fairly complete extraction pipeline, including: recognition of biochemical entities (proteins, chemicals, etc.); grounding of these entities to known knowledge bases such as UniProt; extraction of BioPAX-like interactions (e.g., phosphorylation, complex assembly, positive/negative regulation); and coreference resolution for both entities and interactions.
Reach is developed using Odin, our open-domain information extraction framework, which is released within our processors repository.
Authors
Reach was created by the following members of the CLU lab at the University of Arizona:
Citations
If you use Reach, please cite this paper:
@inproceedings{Valenzuela+:2015aa,
author = {Valenzuela-Esc\'{a}rcega, Marco A. and Gustave Hahn-Powell and Thomas Hicks and Mihai Surdeanu},
title = {A Domain-independent Rule-based Framework for Event Extraction},
organization = {ACL-IJCNLP 2015},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: Software Demonstrations (ACL-IJCNLP)},
url = {http://www.aclweb.org/anthology/P/P15/P15-4022.pdf},
year = {2015},
pages = {127--132},
note = {Paper available at \url{http://www.aclweb.org/anthology/P/P15/P15-4022.pdf}},
}
More publications from the Reach project are available here.
Reach datasets
We have generated multiple datasets by reading publications from the open-access PubMed subset using Reach. All datasets are freely available here.
Reach web services
We have developed a series of web services on top of the Reach library. All are freely available here.
Funding
The development of Reach was funded by the DARPA Big Mechanism program under ARO contract W911NF-14-1-0395.
Licensing
All our own code is licensed under the Apache License, Version 2.0. However, some of the libraries used here, most notably CoreNLP, are GPL v2. If `BioNLPProcessor` is not removed from this package, technically our whole code becomes GPL v2, since `BioNLPProcessor` builds on Stanford's CoreNLP functionality. Soon, we will split the code into multiple components, so that licensing becomes less ambiguous.
Modifying the code
Reach builds upon our Odin event extraction framework. If you want to modify the event and entity grammars, please refer to Odin's Wiki page. The Odin manual is the best source for details on the rule language and the Odin API.