Software

NAF parsers

Raw text to NAF

A generic news website parser (under development): https://bitbucket.org/scraperwikids/fp7-go
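
As a rough illustration of the raw-text-to-NAF step, the sketch below wraps a plain string in a minimal NAF-style XML document using only the Python standard library. The element names (NAF, nafHeader, raw), the version value "v3" and the function name raw_text_to_naf are assumptions made for illustration; this is not the code of the parser linked above.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined in XML, so ElementTree serializes this as xml:lang.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def raw_text_to_naf(text, lang="en"):
    """Wrap raw text in a minimal NAF-style document (assumed layout)."""
    root = ET.Element("NAF", {f"{{{XML_NS}}}lang": lang, "version": "v3"})
    ET.SubElement(root, "nafHeader")  # left empty in this sketch
    raw = ET.SubElement(root, "raw")  # raw layer holding the original text
    raw.text = text
    return root


doc = raw_text_to_naf("NewsReader reads the news.")
print(ET.tostring(doc, encoding="unicode"))
```

Downstream modules would then add their own layers (tokens, terms, entities and so on) to the same document.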

NLP Modules that work with NAF

Tokenization

ixa-pipe-tok: A multilingual rule-based tokenizer for English and Spanish, compliant with Penn Treebank and Ancora corpus tokenization. https://github.com/ixa-ehu/ixa-pipe-tok
Stanford-based tokenizer: Sentence segmentation and tokenization for English as provided by Stanford CoreNLP.
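
To give an idea of what rule-based tokenization involves, here is a minimal sketch of a regex tokenizer that splits English contractions in the Penn Treebank manner ("don't" becomes "do" + "n't") and records character offsets, which token layers typically need. The pattern and the function name tokenize are hypothetical; the real rules in ixa-pipe-tok and Stanford CoreNLP are far more extensive.

```python
import re

# Alternatives are tried in order:
#   1. word stem before "n't"   (do|n't)
#   2. the "n't" clitic itself
#   3. other clitics such as 's, 're, 'll
#   4. plain words
#   5. any single punctuation character
TOKEN_RE = re.compile(r"\w+(?=n't\b)|n't\b|'\w+|\w+|[^\w\s]")


def tokenize(text):
    """Return a list of (token, offset, length) tuples."""
    return [(m.group(), m.start(), len(m.group()))
            for m in TOKEN_RE.finditer(text)]


tokens = tokenize("I don't know.")
for token, offset, length in tokens:
    print(token, offset, length)
```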

Part-of-Speech tagging

ixa-pipe-pos: English/Spanish POS tagging with Perceptron models (Collins 2002) as implemented by Apache OpenNLP, using the WSJ and Ancora corpora respectively. Lemmatization is dictionary-based. https://github.com/ixa-ehu/ixa-pipe-pos
Stanford-based POS-tagger: POS-tagging for English based on the Java implementation of the Stanford POS-tagger (Toutanova et al. 2003).

Parsing

ixa-pipe-parse: English/Spanish Constituent Parsing with Maximum Entropy models (Ratnaparkhi 1999) as implemented by Apache OpenNLP using the Penn and Ancora Treebanks respectively. https://github.com/ixa-ehu/ixa-pipe-parse
Stanford-based Parser: A probabilistic lexicalized dependency parser based on the Stanford statistical parser (Manning and Klein, 2003).
MATE-based Parser: a tool providing lemmatization, POS-tagging, dependencies and semantic roles for English and Spanish based on the MATE-tools (Björkelund et al., 2010).
Alpino Parser: A version of the Alpino parser that uses NAF as input and output. https://github.com/cltl/dependency-parser-nl

Semantic Role Labeling

MATE-based SRL: a tool providing lemmatization, POS-tagging, dependencies and semantic roles for English and Spanish based on the MATE-tools (Björkelund et al., 2010).

Time Expression identification

TimePro: a tool identifying English temporal expressions. It is trained on TempEval3 data (UzZaman et al., 2013).

Named Entity Recognition

ixa-pipe-nerc: English/Spanish Named Entity Recognition with Perceptron models (Collins 2002) as implemented by Apache OpenNLP on CoNLL datasets for NER. https://github.com/ixa-ehu/ixa-pipe-nerc

Named Entity Disambiguation

ixa-pipe-ned: https://github.com/ixa-ehu/ixa-pipe-ned
DBpedia Spotlight: https://github.com/dbpedia-spotlight/dbpedia-spotlight

Word Sense Disambiguation

UKB-based Word Sense Disambiguation: a tool that applies graph-based word sense disambiguation. UKB is a collection of programs that run Personalized PageRank over a Lexical Knowledge Base (LKB) to rank its vertices.
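
The graph-based ranking idea can be sketched with a small Personalized PageRank implementation over a toy graph. Everything here is an assumption for illustration: the sense labels, the edges, the function name personalized_pagerank and the parameter values are made up, and this is not UKB's code or its knowledge base. The teleport probability is concentrated on the senses of context words, so candidate senses that are well connected to the context get a higher rank.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration for PageRank with teleportation restricted to seeds.

    graph: dict mapping each node to a list of its neighbours
    seeds: set of nodes that receive the teleport probability mass
    """
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            neighbours = graph[node]
            if not neighbours:
                # Dangling node: hand its mass back to the seed distribution.
                for n in nodes:
                    new_rank[n] += damping * rank[node] * teleport[n]
                continue
            share = damping * rank[node] / len(neighbours)
            for nb in neighbours:
                new_rank[nb] += share
        rank = new_rank
    return rank


# Hypothetical toy LKB: two senses of "bank" plus related concepts.
toy_lkb = {
    "bank#finance": ["money#1", "deposit#1"],
    "bank#river": ["water#1"],
    "money#1": ["bank#finance", "deposit#1"],
    "deposit#1": ["bank#finance", "money#1"],
    "water#1": ["bank#river"],
}

# The context contains "money", so its sense is the only seed.
ranks = personalized_pagerank(toy_lkb, seeds={"money#1"})
best = max(["bank#finance", "bank#river"], key=ranks.get)
print(best)
```

In this toy setup the financial sense of "bank" is selected because it is connected to the seeded context sense, while the river sense receives no probability mass.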

Coreference Resolution

CorefGraph: a Python reimplementation of the coreference resolution tool proposed by the Stanford NLP group (Lee et al., 2013) for English and Spanish. https://bitbucket.org/Josu/corefgraph

Event classification

KYOTO event classifier: a tool that classifies each event as communication, cognition, or other. The classification is based on KYOTO-DOLCE. It makes use of:

OntoTagger: https://github.com/cltl/OntoTagger.git
NAFKybot: https://github.com/cltl/KafKybot.git

Factuality

NewsReader Factuality classifier: a tool that determines the factuality of expressions, using a Mallet (McCallum 2002) classifier trained on FactBank v1.0 (Saurí and Pustejovsky, 2009).

Opinion Mining

VUA Opinion Miner: a tool that detects opinions in English and Dutch text and for each opinion extracts:
- The opinion expression
- The opinion holder
- The opinion target
The module uses the conditional random fields implementation provided by CRFsuite (http://www.chokkan.org/software/crfsuite/) and is trained on small manually annotated corpora.

NLP Annotation Format

Background information on NAF

NAF parsers

A Python library for NAF

Pynaf: Yet another Python library for NAF

A Java Parser for NAF

Raw text to NAF
