This .zip file contains statistical data on human annotations, generated with the loganalyser (see other package). The files are CSV files based on the log-file output of the DutchSemCor project.
4 files show statistics of the 4 different corpora:
– 1st cycle – the human annotations made independent of the WSD system
– 2nd cycle – the human annotations made through active learning using the TiMBL system
– All-words – the human annotations made on the all-words corpus for evaluating the WSD-systems
– Manual annotations – containing statistics on all human annotations
16 files show statistics of all human annotations based on PoS. 4 files contain an overview of the annotated lemmas (.overview). 4 files provide information on multiple tags, that is, the number of tokens that have been assigned two or more senses of the same lemma (.multiple tags). 4 files contain information on the time spent on annotating by the different annotators (.time). Finally, 4 files provide annotation statistics per lemma.
| | nmr annos | nmr overlapping annos | nmr lemmas | IA weak |
| 1. Manual anno | | | | |
| a. 1st cycle | 360.260 | 274.344 | 2.874 | 94% |
| b. 2nd cycle | 144.274 | 132.666 | 1.133 | 44% |
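
The counts in the table above appear to use "." as a thousands separator (so "360.260" would mean 360260 annotations); that reading is an assumption, not something stated in the package. Under that assumption, a minimal Python sketch for turning the counts into integers and computing the share of overlapping annotations per cycle could look like this. The helper names `parse_grouped_int` and `overlap_share` are hypothetical and not part of the loganalyser.

```python
def parse_grouped_int(text: str) -> int:
    """Parse an integer written with '.' as thousands separator (assumed)."""
    return int(text.replace(".", ""))


def overlap_share(annos: str, overlapping: str) -> float:
    """Fraction of all annotations that are overlapping annotations."""
    return parse_grouped_int(overlapping) / parse_grouped_int(annos)


# Values copied from the table: (nmr annos, nmr overlapping annos)
rows = {
    "1st cycle": ("360.260", "274.344"),
    "2nd cycle": ("144.274", "132.666"),
}

for name, (annos, overlapping) in rows.items():
    share = overlap_share(annos, overlapping)
    print(f"{name}: {share:.1%} of annotations overlap")
```

If the CSV files themselves store plain integers without grouping, the parsing step is unnecessary and `int()` can be applied directly.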