Human-annotation statistics

This .zip file contains statistics on human annotations, produced with the loganalyser (see other package). The files are CSV files based on the log-file output of the DutchSemCor project.

4 files show statistics of the 4 different corpora:

– 1st cycle – the human annotations made independently of the WSD system

– 2nd cycle – the human annotations made through active learning using the TiMBL system

– All-words – the human annotations made on the all-words corpus for evaluating the WSD-systems

– Manual annotations – containing statistics on all human annotations

16 files show statistics of all human annotations based on PoS. 4 files contain an overview of the annotated lemmas (.overview). 4 files provide information on multiple tags (the number of tokens that have been assigned two or more senses of the same lemma – .multiple tags). 4 files contain information on the time spent on annotating by the different annotators (.time). Finally, 4 files provide annotation statistics per lemma.


                        No. of annotations   No. of overlapping annotations   No. of lemmas   IA weak
1. Manual annotations   489.637              203.171                          2.941           74%
   a. 1st cycle         360.260              274.344                          2.874           94%
   b. 2nd cycle         144.274              132.666                          1.133           44%
   c. All-words         40.091               6.085                            1.609           89%
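
The counts in the summary above appear to use a dot as a thousands separator (an annotation count such as 489.637 has to be a whole number). The following is a minimal Python sketch, under that assumption, of how the values could be converted to numbers before further analysis. The helper names `parse_count` and `parse_percent` are hypothetical and not part of the loganalyser; the values are copied from the summary.

```python
def parse_count(text: str) -> int:
    """Parse a count such as '489.637'.

    Assumption: the dot is a thousands separator, not a decimal point.
    """
    return int(text.replace(".", ""))


def parse_percent(text: str) -> float:
    """Parse a percentage such as '74%' into a fraction (0.74)."""
    return float(text.rstrip("%")) / 100


# Values copied verbatim from the summary:
# (annotations, overlapping annotations, lemmas, IA weak)
raw_summary = {
    "Manual annotations": ("489.637", "203.171", "2.941", "74%"),
    "1st cycle": ("360.260", "274.344", "2.874", "94%"),
    "2nd cycle": ("144.274", "132.666", "1.133", "44%"),
    "All-words": ("40.091", "6.085", "1.609", "89%"),
}

# Convert every row to (int, int, int, float).
summary = {
    name: (
        parse_count(annos),
        parse_count(overlapping),
        parse_count(lemmas),
        parse_percent(ia_weak),
    )
    for name, (annos, overlapping, lemmas, ia_weak) in raw_summary.items()
}

for name, row in summary.items():
    print(name, row)
```

If the CSV files themselves use the same number format, the same helpers could be applied to their columns, but that should be checked against the actual files first.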


