NAF parsers
- A Python library for NAF
- Pynaf: Yet another Python library for NAF
- A Java parser for NAF
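The libraries above are the intended way to read NAF. To show what such a parser does underneath, here is a minimal sketch that uses only the standard-library ElementTree module. It assumes NAF v3-style element names (wf in the text layer, term with span/target in the terms layer) and KAF/NAF-style coarse POS letters; the sample document is made up, and read_terms is a hypothetical helper, not the API of either library.

```python
import xml.etree.ElementTree as ET

# Minimal NAF fragment (assumed NAF v3-style layout: wf, term, span/target).
SAMPLE_NAF = """<NAF xml:lang="en" version="v3">
  <text>
    <wf id="w1" offset="0" length="4" sent="1">John</wf>
    <wf id="w2" offset="5" length="6" sent="1">sleeps</wf>
  </text>
  <terms>
    <term id="t1" lemma="John" pos="R"><span><target id="w1"/></span></term>
    <term id="t2" lemma="sleep" pos="V"><span><target id="w2"/></span></term>
  </terms>
</NAF>"""


def read_terms(naf_xml):
    """Return one dict per <term>, joined with the surface text of its word forms."""
    root = ET.fromstring(naf_xml)
    # Index the text layer: word-form id -> surface string.
    words = {wf.get("id"): wf.text for wf in root.iter("wf")}
    terms = []
    for term in root.iter("term"):
        # A term covers one or more word forms listed under <span><target id="..."/>.
        targets = [t.get("id") for t in term.findall("span/target")]
        terms.append({
            "id": term.get("id"),
            "lemma": term.get("lemma"),
            "pos": term.get("pos"),
            "text": " ".join(words[w] for w in targets),
        })
    return terms


if __name__ == "__main__":
    for t in read_terms(SAMPLE_NAF):
        print(t["id"], t["text"], t["lemma"], t["pos"])
```

Because terms point at word forms by id rather than repeating the text, every higher layer stays anchored to the same tokens.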
Raw text to NAF
- A generic news website parser (under development): https://bitbucket.org/scraperwikids/fp7-go
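Whatever a scraper extracts from a news page, the output has to become a NAF document before any module below can run. A hedged sketch, assuming a minimal document with an empty nafHeader and a raw layer is enough to start a pipeline; raw_text_to_naf is a hypothetical helper, not part of the linked parser, and a real document would also carry header metadata.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this Clark-notation key as the reserved xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def raw_text_to_naf(text, lang="en"):
    """Wrap extracted article text in a minimal NAF document containing only a <raw> layer."""
    root = ET.Element("NAF", {XML_LANG: lang, "version": "v3"})
    # Document metadata would go in the header; it is left empty in this sketch.
    ET.SubElement(root, "nafHeader")
    raw = ET.SubElement(root, "raw")
    # ElementTree escapes &, < and > instead of emitting a CDATA section;
    # the text recovered by a parser is the same either way.
    raw.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(raw_text_to_naf("Profits rose 5% & shares fell < 2%."))
```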
NLP Modules that work with NAF
Tokenization
- ixa-pipe-tok: A multilingual rule-based tokenizer for English and Spanish, compliant with Penn Treebank and Ancora corpus tokenization. https://github.com/ixa-ehu/ixa-pipe-tok
- Stanford-based tokenizer: Sentence segmentation and tokenization for English as provided by Stanford CoreNLP.
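Both tokenizers produce word forms with character offsets and sentence numbers, which is what the NAF text layer records. The sketch below shows that output shape using a deliberately naive regular expression and sentence splitting after ".", "!" or "?". It is not ixa-pipe-tok's rule set (for example, it does not split "don't" the Penn Treebank way); the tokenize function and record keys are hypothetical.

```python
import re

# Words (optionally with one internal apostrophe) or single punctuation characters.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Return NAF-style word-form records: id, form, offset, length, sentence number."""
    tokens = []
    sent = 1
    for i, match in enumerate(TOKEN_RE.finditer(text), start=1):
        form = match.group()
        tokens.append({
            "id": f"w{i}",
            "form": form,
            "offset": match.start(),
            "length": len(form),
            "sent": sent,
        })
        # Naive sentence segmentation: a new sentence starts after final punctuation.
        if form in {".", "!", "?"}:
            sent += 1
    return tokens


if __name__ == "__main__":
    for tok in tokenize("Hi there. Bye!"):
        print(tok)
```

Keeping offset and length lets every later layer be mapped back onto the exact span of the raw text.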
Part-of-Speech tagging
- ixa-pipe-pos: English/Spanish POS tagging with Perceptron models (Collins 2002) as implemented by Apache OpenNLP, trained on the WSJ and Ancora corpora respectively. Lemmatization is dictionary-based. https://github.com/ixa-ehu/ixa-pipe-pos
- Stanford-based POS-tagger: POS tagging for English based on the Java implementation of the Stanford POS tagger (Toutanova et al. 2003).
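Dictionary-based lemmatization, as mentioned for ixa-pipe-pos, is essentially a lookup keyed on the word form and its POS tag. A minimal sketch, assuming Penn-style tags and a tiny hypothetical dictionary (not ixa-pipe-pos's actual resource or format); unknown words fall back to their lowercased form.

```python
# Hypothetical lemma dictionary keyed by (lowercased word form, POS tag).
LEMMA_DICT = {
    ("rose", "VBD"): "rise",
    ("rose", "NN"): "rose",
    ("shares", "NNS"): "share",
    ("shares", "VBZ"): "share",
    ("was", "VBD"): "be",
}


def lemmatize(tagged):
    """Map (word, pos) pairs to lemmas; unknown pairs fall back to the lowercased word."""
    return [LEMMA_DICT.get((word.lower(), pos), word.lower()) for word, pos in tagged]


if __name__ == "__main__":
    print(lemmatize([("Profits", "NNS"), ("rose", "VBD")]))
    print(lemmatize([("a", "DT"), ("rose", "NN")]))
```

Keying on the tag is why tagging runs first: "rose" becomes "rise" as a past-tense verb but stays "rose" as a noun.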
Parsing
- ixa-pipe-parse: English/Spanish Constituent Parsing with Maximum Entropy models (Ratnaparkhi 1999) as implemented by Apache OpenNLP using the Penn and Ancora Treebanks respectively. https://github.com/ixa-ehu/ixa-pipe-parse
- Stanford-based Parser: A probabilistic lexicalized dependency parser based on the Stanford statistical parser (Manning and Klein, 2003).
- MATE-based Parser: a tool providing lemmatization, POS-tagging, dependencies and semantic roles for English and Spanish based on the MATE-tools (Björkelund et al., 2010).
- Alpino Parser: A version of the Alpino parser that uses NAF as input and output. https://github.com/cltl/dependency-parser-nl
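The dependency parsers above store their output in NAF alongside the term layer. A hedged sketch, assuming a NAF deps layer of dep elements with from, to and rfunc attributes that reference term ids, with from as the head and to as the dependent; the sample document, the nsubj label and read_dependencies are illustrative only.

```python
import xml.etree.ElementTree as ET

# Illustrative NAF fragment with an assumed deps layer over term ids.
SAMPLE = """<NAF version="v3">
  <terms>
    <term id="t1" lemma="John"/>
    <term id="t2" lemma="sleep"/>
  </terms>
  <deps>
    <dep from="t2" to="t1" rfunc="nsubj"/>
  </deps>
</NAF>"""


def read_dependencies(naf_xml):
    """Return (head lemma, relation, dependent lemma) triples from the deps layer."""
    root = ET.fromstring(naf_xml)
    lemma = {t.get("id"): t.get("lemma") for t in root.iter("term")}
    # Assumption: "from" is the head term and "to" is the dependent term.
    return [(lemma[d.get("from")], d.get("rfunc"), lemma[d.get("to")]) for d in root.iter("dep")]


if __name__ == "__main__":
    print(read_dependencies(SAMPLE))
```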
Semantic Role Labeling
- MATE-based SRL: a tool providing lemmatization, POS-tagging, dependencies and semantic roles for English and Spanish based on the MATE-tools (Björkelund et al., 2010).
Time Expression identification
- TimePro: a tool identifying English temporal expressions. It is trained on TempEval3 data (UzZaman et al., 2013).
Named Entity Recognition
- ixa-pipe-nerc: English/Spanish Named Entity Recognition with Perceptron models (Collins 2002) as implemented by Apache OpenNLP on CoNLL datasets for NER. https://github.com/ixa-ehu/ixa-pipe-nerc
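Sequence taggers for NER typically label each token and then group the labels into entity spans. The sketch below assumes a common BIO scheme (B- opens an entity, I- continues it, O is outside); ixa-pipe-nerc's internal encoding is not stated here, and bio_to_entities is a hypothetical helper. A stray I- tag whose type differs from the open entity starts a new one.

```python
def bio_to_entities(tokens, tags):
    """Group BIO-tagged tokens into (type, text) entity spans."""
    entities = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last open entity
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or starts_new:
            if label is not None:
                entities.append((label, " ".join(tokens[start:i])))
                label = None
            if tag != "O":
                start, label = i, tag[2:]
    return entities


if __name__ == "__main__":
    tokens = ["John", "Smith", "visited", "New", "York", "."]
    tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
    print(bio_to_entities(tokens, tags))
```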
Named Entity Disambiguation
- ixa-pipe-ned: A client that queries DBpedia Spotlight for Named Entity Disambiguation (Mendes et al., 2011). https://github.com/ixa-ehu/ixa-pipe-ned. This tool depends on DBpedia Spotlight: https://github.com/dbpedia-spotlight/dbpedia-spotlight.
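To illustrate what a Spotlight client does, here is a sketch that builds an annotate request and parses the JSON reply into (surface form, offset, URI) triples. It assumes the public https://api.dbpedia-spotlight.org/en/annotate endpoint and its JSON keys (Resources, @surfaceForm, @offset, @URI); a self-hosted server would use its own address. This is not ixa-pipe-ned's code, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint: the public DBpedia Spotlight service for English.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.5):
    """Build a GET request asking Spotlight for JSON annotations of `text`."""
    query = urlencode({"text": text, "confidence": confidence})
    return Request(f"{SPOTLIGHT_URL}?{query}", headers={"Accept": "application/json"})


def parse_annotations(payload):
    """Extract (surface form, character offset, DBpedia URI) triples from a JSON response."""
    data = json.loads(payload)
    # .get() so that a response without a "Resources" key yields an empty list.
    return [(r["@surfaceForm"], int(r["@offset"]), r["@URI"]) for r in data.get("Resources", [])]


def annotate(text, confidence=0.5):
    """Perform the network call and return parsed annotations."""
    with urlopen(build_request(text, confidence), timeout=10) as resp:
        return parse_annotations(resp.read().decode("utf-8"))
```

Separating request building and response parsing from the network call keeps both testable offline.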
Word Sense Disambiguation
- UKB-based Word Sense Disambiguation: a tool that applies graph-based word sense disambiguation. UKB is a collection of programs that uses Personalized PageRank over a Lexical Knowledge Base (LKB) to rank the vertices of the LKB.
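Personalized PageRank ranks graph vertices with a random walk that, instead of jumping to any vertex, restarts only at "seed" vertices, here the senses of the surrounding context words. A minimal power-iteration sketch: the toy graph, sense names, damping of 0.85 and iteration count are assumptions for illustration, not UKB's LKB, options or code.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration for Personalized PageRank.

    graph: dict mapping each node to a list of neighbours (list undirected edges both ways).
    seeds: set of nodes the random walk restarts from.
    """
    nodes = list(graph)
    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                # Spread this node's rank evenly over its neighbours.
                share = damping * rank[n] / len(out)
                for m in out:
                    new[m] += share
            else:
                # Dangling node: return its mass to the seeds so the total stays 1.
                for m in nodes:
                    new[m] += damping * rank[n] * restart[m]
        rank = new
    return rank
```

For disambiguation, the candidate senses of the target word are compared by their final rank, and the highest-ranked one is chosen.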
Coreference Resolution
- CorefGraph: a Python reimplementation, for English and Spanish, of the coreference resolution tool proposed by the Stanford NLP group (Lee et al., 2013). https://bitbucket.org/Josu/corefgraph
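The Stanford approach (Lee et al., 2013) resolves coreference with a sequence of deterministic "sieves", applied from most to least precise, each merging mention clusters. The sketch shows only the idea of one early, high-precision sieve, exact string match, using union-find. Skipping pronouns is an assumption made for this sketch, and the pronoun list and function name are hypothetical, not CorefGraph's implementation.

```python
# Small, illustrative pronoun list; pronouns are skipped in this sketch.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "its", "their", "i", "you", "we"}


def exact_match_sieve(mentions):
    """Cluster non-pronominal mentions whose lowercased strings are identical.

    mentions: mention strings in textual order. Returns clusters of mention indices.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}
    for i, mention in enumerate(mentions):
        key = mention.lower()
        if key in PRONOUNS:
            continue  # left for later, pronoun-specific sieves
        if key in first_seen:
            parent[find(i)] = find(first_seen[key])
        else:
            first_seen[key] = i

    clusters = {}
    for i in range(len(mentions)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

Later sieves would keep merging these clusters with looser criteria, which is why union-find suits the multi-pass design.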
Event classification
- KYOTO event classifier: a tool that classifies events as communication, cognition, or other events. The classification is based on KYOTO-DOLCE. It makes use of:
  - OntoTagger: https://github.com/cltl/OntoTagger.git
  - NAFKybot: https://github.com/cltl/KafKybot.git
Factuality
- NewsReader Factuality classifier: a tool that determines the factuality of expressions using a Mallet (McCallum 2002) classifier trained on FactBank v1.0 (Saurí and Pustejovsky, 2009).
Opinion Mining
- VUA Opinion Miner: a tool that detects opinions in English and Dutch text and for each opinion extracts:
  - The opinion expression
  - The opinion holder
  - The opinion target
The module uses the conditional random fields implementation provided by CRFsuite (http://www.chokkan.org/software/crfsuite/) and is trained on small manually annotated corpora.
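A linear-chain CRF labels a token sequence by finding the tag path with the highest total score, summing per-token (emission) scores and tag-to-tag (transition) scores. That decoding step is the Viterbi algorithm, sketched below. The tag set, scores and the large penalty on the invalid O to I-EXP transition are made-up assumptions, not CRFsuite's features or the VUA Opinion Miner's labels.

```python
def viterbi(emissions, transitions, labels):
    """Highest-scoring tag sequence for a linear-chain model.

    emissions: one dict per token mapping label -> score.
    transitions: dict mapping (previous label, label) -> score; missing pairs score 0.
    """
    best = [{y: emissions[0][y] for y in labels}]
    back = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            # Best previous label for ending in y at position t.
            prev, score = max(
                ((p, best[-1][p] + transitions.get((p, y), 0.0)) for p in labels),
                key=lambda item: item[1],
            )
            scores[y] = score + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))


if __name__ == "__main__":
    labels = ["O", "B-EXP", "I-EXP"]
    emissions = [
        {"O": 2.0, "B-EXP": 0.0, "I-EXP": 0.0},  # I
        {"O": 0.0, "B-EXP": 1.5, "I-EXP": 1.0},  # really
        {"O": 0.5, "B-EXP": 0.0, "I-EXP": 1.0},  # like
        {"O": 1.0, "B-EXP": 0.0, "I-EXP": 0.0},  # it
    ]
    transitions = {("O", "I-EXP"): -10.0}
    print(viterbi(emissions, transitions, labels))
```

Scoring whole paths rather than single tokens is what lets transition scores rule out ill-formed spans such as an I- tag directly after O.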