Software

NAF parsers

A Python library for NAF

Pynaf: Yet another Python library for NAF

A Java Parser for NAF

Raw text to NAF

NLP Modules that work with NAF

Tokenization

  • ixa-pipe-tok: A multilingual rule-based tokenizer for English and Spanish, compliant with Penn Treebank and Ancora Corpus tokenization. https://github.com/ixa-ehu/ixa-pipe-tok
  • Stanford-based tokenizer: Sentence segmentation and tokenization for English as provided by Stanford CoreNLP.

Part-of-Speech tagging

  • ixa-pipe-pos: English/Spanish POS tagging with Perceptron models (Collins 2002) as implemented by Apache OpenNLP, using the WSJ and Ancora corpora respectively.
    Lemmatization is dictionary-based. https://github.com/ixa-ehu/ixa-pipe-pos
  • Stanford-based POS-tagger: POS-tagging for English based on the Java implementation of the Stanford POS-tagger (Toutanova et al. 2003).

Parsing

  • ixa-pipe-parse: English/Spanish Constituent Parsing with Maximum Entropy models (Ratnaparkhi 1999) as implemented by Apache OpenNLP using the Penn and Ancora Treebanks respectively. https://github.com/ixa-ehu/ixa-pipe-parse
  • Stanford-based Parser: A probabilistic lexicalized dependency parser based on the Stanford statistical parser (Manning and Klein, 2003).
  • MATE-based Parser: a tool providing lemmatization, POS-tagging, dependencies and semantic roles for English and Spanish based on the MATE-tools (Björkelund et al., 2010).
  • Alpino Parser: A version of the Alpino parser that uses NAF as input and output. https://github.com/cltl/dependency-parser-nl

Semantic Role Labeling

  • MATE-based SRL: a tool providing lemmatization, POS-tagging, dependencies and semantic roles for English and Spanish based on the MATE-tools (Björkelund et al., 2010).

Time Expression identification

  • TimePro: a tool identifying English temporal expressions. It is trained on TempEval3 data (UzZaman et al., 2013).

Named Entity Recognition

Named Entity Disambiguation

Word Sense Disambiguation

  • UKB-based Word Sense Disambiguation: a tool that applies graph-based word sense disambiguation. UKB is a collection of programs that run Personalized PageRank over a Lexical Knowledge Base (LKB) to rank the vertices of the LKB.
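
To illustrate the idea behind graph-based disambiguation, here is a minimal sketch of Personalized PageRank on a toy sense graph. It is not UKB's code or API: the function name, the sense labels, and the edges are hypothetical, and a real LKB such as WordNet is far larger. The random walk restarts at the senses of the context words, so candidate senses that are well connected to the context receive a higher rank.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Rank the vertices of an undirected graph by power iteration.

    edges: iterable of (u, v) pairs (hypothetical sense identifiers).
    seeds: vertices that receive all of the restart (teleport) probability.
    """
    # Build a symmetric adjacency list from the edge pairs.
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    nodes = sorted(neighbours)

    # Restart distribution: uniform over the seeds, zero elsewhere.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        # Every vertex first receives its share of the restart mass ...
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        # ... then each vertex passes the rest of its rank to its neighbours.
        for n in nodes:
            share = damping * rank[n] / len(neighbours[n])
            for m in neighbours[n]:
                new_rank[m] += share
        rank = new_rank
    return rank


# Toy graph (assumption): two senses of "bank" and a few related senses.
# The river sense is deliberately not connected to the money-related senses.
toy_edges = [
    ("bank#finance", "money#1"),
    ("bank#finance", "deposit#1"),
    ("money#1", "deposit#1"),
    ("bank#river", "water#1"),
]

# The context contains the word "money", so its sense seeds the walk.
ranks = personalized_pagerank(toy_edges, seeds={"money#1"})
best_sense = max(["bank#finance", "bank#river"], key=ranks.get)
print(best_sense)
```

In this toy setting the financial sense of "bank" is selected because it is linked to the seed sense, while the river sense is unreachable from it.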

Coreference Resolution

  • CorefGraph: a Python reimplementation of the coreference resolution tool proposed
    by the Stanford NLP group (Lee et al., 2013) for English and Spanish. https://bitbucket.org/Josu/corefgraph

Event classification

  • KYOTO event classifier: a tool that classifies each event as communication, cognition, or other. The classification is based on KYOTO-DOLCE. It makes use of:
    • OntoTagger: https://github.com/cltl/OntoTagger.git
    • NAFKybot: https://github.com/cltl/KafKybot.git

Factuality

  • NewsReader Factuality classifier: a tool that determines the factuality of expressions, using a Mallet (McCallum 2002) classifier trained on FactBank v1.0 (Saurí and Pustejovsky, 2009).

Opinion Mining

  • VUA Opinion Miner: a tool that detects opinions in English and Dutch text and for each opinion extracts:
    • The opinion expression
    • The opinion holder
    • The opinion target

    The module uses the conditional random fields implementation provided by CRFsuite (http://www.chokkan.org/software/crfsuite/) and is trained on small manually annotated corpora.
