The MachineSonarXML data contains the automatic annotation of all of SONAR produced by our three WSD systems:
There is one file per lemma. The lemma and its part-of-speech are encoded in the file name (lemma.pos.xml). For instance, the file timbl/meester.n.xml contains the automatic tagging by timbl of all the unannotated examples of the noun meester in SONAR.
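The naming convention above can be unpacked with a few lines of Python. This is only a sketch: the function name parse_annotation_filename is hypothetical, and it assumes that the directory name (timbl in the example) identifies the WSD system and that the part-of-speech is always the last component before the .xml extension.

```python
from pathlib import PurePosixPath


def parse_annotation_filename(path):
    """Split 'system/lemma.pos.xml' into (system, lemma, pos).

    Assumption: the parent directory names the WSD system and the
    file name follows the lemma.pos.xml pattern described above.
    """
    p = PurePosixPath(path)
    stem, ext = p.name.rsplit(".", 1)
    if ext != "xml":
        raise ValueError(f"not an XML annotation file: {path}")
    # Split from the right so a lemma that itself contains a dot stays intact.
    lemma, pos = stem.rsplit(".", 1)
    return p.parent.name, lemma, pos


system, lemma, pos = parse_annotation_filename("timbl/meester.n.xml")
print(system, lemma, pos)
```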
For each example there is a ‘token’ element under the general element ‘tokens’. Each token records the token identifier in SONAR (token_id) and the lexical unit identifier (sense) assigned by the annotator (timbl), together with the confidence of that assignment (confidence). The lemma and part-of-speech are repeated on each token as well, although they are already given by the file name.
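A minimal sketch of reading such a file with Python's standard library follows. Several things here are assumptions rather than facts about the data: that the fields are stored as XML attributes (they could equally be child elements), that the attributes for the annotator, lemma and part-of-speech are named annotator, lemma and pos, and that confidence is a decimal number. The token identifiers and sense values in SAMPLE are invented for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real files may lay the fields out differently.
SAMPLE = """<tokens>
  <token token_id="doc1.s1.w3" lemma="meester" pos="n"
         sense="sense_A" annotator="timbl" confidence="0.87"/>
  <token token_id="doc1.s4.w7" lemma="meester" pos="n"
         sense="sense_B" annotator="timbl" confidence="0.42"/>
</tokens>"""


def read_tokens(xml_text):
    """Return one dict per 'token' element found under 'tokens'."""
    root = ET.fromstring(xml_text)
    rows = []
    for tok in root.iter("token"):
        rows.append({
            "token_id": tok.get("token_id"),
            "sense": tok.get("sense"),
            "annotator": tok.get("annotator"),
            # Assumes confidence is stored as a decimal string.
            "confidence": float(tok.get("confidence")),
        })
    return rows


def confident(rows, threshold):
    """Keep only the annotations at or above a confidence threshold."""
    return [r for r in rows if r["confidence"] >= threshold]


rows = read_tokens(SAMPLE)
for r in confident(rows, 0.5):
    print(r["token_id"], r["sense"], r["confidence"])
```

To read an actual file rather than a string, ET.parse(path).getroot() can replace ET.fromstring(xml_text). Filtering on confidence is just one plausible use of the field; a suitable threshold would have to be chosen per system.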