Human-annotation statistics
This .zip file contains statistics on human annotations, produced with the loganalyser (see other package). The files are CSV files derived from the log-file output of the DutchSemCor project.
4 files show statistics of the 4 different corpora:
– 1st cycle – the human annotations made independently of the WSD system
– 2nd cycle – the human annotations made through active learning using the TiMBL system
– All-words – the human annotations made on the all-words corpus for evaluating the WSD systems
– Manual annotations – containing statistics on all human annotations
16 files show statistics of all human annotations by part of speech (PoS). 4 files contain an overview of the annotated lemmas (.overview). 4 files provide information on multiple tags, i.e. the number of tokens that have been assigned two or more senses of the same lemma (.multiple tags). 4 files contain information on the time the different annotators spent annotating (.time). Finally, 4 files provide annotation statistics per lemma.
Overview
Corpus | No. of annotations | No. of overlapping annotations | No. of lemmas | IA weak
1. Manual anno | 489.637 | 203.171 | 2.941 | 74%
a. 1st cycle | 360.260 | 274.344 | 2.874 | 94%
b. 2nd cycle | 144.274 | 132.666 | 1.133 | 44%
c. All-words | 40.091 | 6.085 | 1.609 | 89%
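
When reading the overview figures programmatically, note that the counts appear to use a dot as a thousands separator (so 2.941 lemmas would be 2941), as is usual in Dutch number formatting. This is an assumption, not something the package states. The following minimal Python sketch shows one way to convert such values. The helper names `parse_count` and `parse_percent` are hypothetical and not part of the loganalyser.

```python
def parse_count(text: str) -> int:
    """Parse a count such as '489.637'.

    Assumption: the dot is a thousands separator, not a decimal point.
    """
    return int(text.strip().replace(".", ""))


def parse_percent(text: str) -> float:
    """Parse a percentage such as '74%' into a fraction between 0 and 1."""
    return float(text.strip().rstrip("%")) / 100.0


# Values copied from the overview table:
# (annotations, overlapping annotations, lemmas, IA weak)
OVERVIEW = {
    "Manual anno": ("489.637", "203.171", "2.941", "74%"),
    "1st cycle": ("360.260", "274.344", "2.874", "94%"),
    "2nd cycle": ("144.274", "132.666", "1.133", "44%"),
    "All-words": ("40.091", "6.085", "1.609", "89%"),
}

for corpus, (annos, overlapping, lemmas, ia_weak) in OVERVIEW.items():
    print(
        corpus,
        parse_count(annos),
        parse_count(overlapping),
        parse_count(lemmas),
        parse_percent(ia_weak),
    )
```

If the CSV files in the package turn out to store plain integers, the separator handling is unnecessary and `int()` alone is enough.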