Update 11-04-2015: A GitHub repository, SoNar2Naf, has been created to convert FoLiA XML to NAF. The all-words part of DutchSemCor is therefore now also available in NAF (plus dependencies and entities) and contains, besides Cornetto senses, Open Source Dutch WordNet 1.0 senses.
All-words.txt contains sequential annotations made by linguists to evaluate the WSD systems used in the DutchSemCor project.
Number of tokens = 23,907
Number of lemmas = 1,527
The All-words corpus covers a small number of texts, limited to a selection of genres and domains.
1.3.1.ALLWORDS_DSC.zip contains three XML files, one for each part of speech (noun, verb, adjective), with the annotated examples of the all-words corpus developed in the DutchSemCor project. For each annotated example (element), the following information is stored in the DSC-XML file:
+ lemma: the lemma of the example
+ pos: the part-of-speech
+ sense_id: the lexical unit identifier of Cornetto for the example
+ token_id: the token identifier of the example in SoNaR
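
The four fields listed above can be read with a few lines of Python. This is only a minimal sketch: it assumes that the fields are stored as XML attributes on each example element, which is not confirmed by this description, and the sample data, the element names and the identifiers in it are made up for illustration. The function read_annotations is a hypothetical helper, not part of any DutchSemCor tool.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Field names as listed in this README. ASSUMPTION: they are XML attributes.
FIELDS = ("lemma", "pos", "sense_id", "token_id")


def read_annotations(xml_text):
    """Return one dict per element that carries all four fields as attributes."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        # Skip wrapper elements that do not describe an annotated example.
        if all(field in elem.attrib for field in FIELDS):
            rows.append({field: elem.attrib[field] for field in FIELDS})
    return rows


# Made-up sample: element names, sense ids and token ids are hypothetical.
SAMPLE = """<examples>
  <example lemma="bank" pos="noun" sense_id="SENSE-A" token_id="DOC-1.w.3"/>
  <example lemma="huis" pos="noun" sense_id="SENSE-B" token_id="DOC-1.w.9"/>
  <example lemma="lopen" pos="verb" sense_id="SENSE-C" token_id="DOC-2.w.4"/>
</examples>"""

annotations = read_annotations(SAMPLE)
per_pos = Counter(row["pos"] for row in annotations)
lemmas = sorted({row["lemma"] for row in annotations})
print(len(annotations), dict(per_pos), lemmas)
```

If the real files store the fields as child elements rather than attributes, the attribute lookup would have to be replaced by a lookup of child element text.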