What is it?
processors is the main public code repository of the Computational Language Understanding (CLU) Lab at the University of Arizona.
This repository contains:
- A suite of natural language processors in the org.clulab.processors package. See the Processors section for details.
- A rule-based event extraction (EE) framework called Odin (Open Domain INformer) in the org.clulab.odin package. See the Odin section for more details.
- A multi-task learning framework for deep learning and sequence modeling called Metal, which is implemented on top of DyNet. This framework includes a simple domain-specific language (DSL) that allows you to ramp up sequence models very quickly without writing any Scala code. We use Metal to implement most of the components in CluProcessor. See the Metal section for details.
- Two full-fledged Rhetorical Structure Theory (RST) discourse parsers. The discourse parsers are transparently included in our natural language (NL) processors. The version in CoreNLPProcessor relies on constituent syntax, whereas the one in FastNLPProcessor uses dependency syntax. They perform approximately the same, but the latter is much faster.
- A machine learning (ML) package (org.clulab.learning), which includes implementations for common ML algorithms (e.g., Perceptron, Logistic Regression, Support Vector Machines, Random Forests) for both classification and ranking.
Authors
Mihai Surdeanu, Marco Valenzuela, Gustave Hahn-Powell, Peter Jansen, Daniel Fried, Dane Bell, Keith Alcock, and Tom Hicks.
License
Our code is licensed as follows:
- main, odin: Apache License Version 2.0. Please note that these subprojects do not interact with the corenlp subproject below.
- corenlp: GPL Version 3 or higher, due to the dependency on Stanford’s CoreNLP. If you use only CluProcessor, this dependency does not have to be included in your project.
Citations
If you use one of our discourse parsers, please cite this paper:
Mihai Surdeanu, Thomas Hicks, and Marco A. Valenzuela-Escarcega. Two Practical Rhetorical Structure Theory Parsers. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies: Software Demonstrations (NAACL HLT), 2015.
If you use Odin, our event extraction framework, please cite this paper:
Marco A. Valenzuela-Escarcega, Gustave Hahn-Powell, Thomas Hicks, and Mihai Surdeanu. A Domain-independent Rule-based Framework for Event Extraction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: Software Demonstrations (ACL-IJCNLP), 2015.
If you use CoreNLPProcessor, please cite Stanford’s paper:
Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), 2014.
If you use CluProcessor, please cite this paper:
Mihai Surdeanu and Christopher D. Manning. Ensemble Models for Dependency Parsing: Cheap and Good? In Proceedings of the North American Chapter of the Association for Computational Linguistics Conference (NAACL-2010), 2010.
If you use anything else in this package, please link to this page.