Evaluation of WSD-systems

This file describes three different evaluations of WSD systems on DutchSemCor data.

FOLD-CROSS VALIDATION

Using the annotated data, we performed a fold-cross validation; the results are in the FoldCross subfolder. The file overallEvaluation.odt shows the overall results of our three systems separately (timbl, svm and ukb), as well as an evaluation of a vote among the three systems. Results are shown for different combinations of features and types of concepts (senses, base concepts and sense groups).
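The exact fold-cross setup is not spelled out here; as a rough illustration only, the sketch below shows how per-fold accuracy over annotated instances could be computed. The function names, the number of folds and the majority-sense baseline (a stand-in for the real TiMBL/SVM/UKB taggers) are all invented for this example.

```python
import random
from collections import Counter

def fold_cross_accuracy(instances, train_and_tag, n_folds=5, seed=0):
    """Split (features, sense) pairs into n folds; for each fold, train on
    the rest and score accuracy on the held-out fold. Returns per-fold scores."""
    data = list(instances)
    random.Random(seed).shuffle(data)
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        tagger = train_and_tag(train)  # returns a function: features -> sense
        correct = sum(tagger(feats) == sense for feats, sense in held_out)
        scores.append(correct / len(held_out))
    return scores

def majority_baseline(train):
    """Toy stand-in for a real WSD system: always predict the most
    frequent sense seen in the training folds."""
    most_common = Counter(sense for _, sense in train).most_common(1)[0][0]
    return lambda feats: most_common

# Toy data: one lemma with a skewed sense distribution.
data = ([({"lemma": "bank"}, "riverside")] * 8
        + [({"lemma": "bank"}, "institution")] * 2)
scores = fold_cross_accuracy(data, majority_baseline, n_folds=5)
```

Per-lemma accuracies obtained this way are what the next section's performance ranges would be computed from.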

RANDOM EVALUATION


For this evaluation, we selected a set of lemmas based on the fold-cross evaluation results. In order to have a good representation of all the lemmas, a subset was selected for each performance range. We considered the following ranges of accuracy:

+ Between 90 and 100 (excluding acc=100)
+ Between 80 and 90
+ Between 70 and 80
+ Between 60 and 70

5 nouns, 5 verbs and 3 adjectives were selected randomly within each range, for a total of 52 lemmas (selectedWords.txt). For each of these, 100 untagged examples from SONAR were automatically tagged by our systems and then manually tagged to perform the evaluation. The resulting 5200 instances have been tagged by TiMBL, SVM and UKB.
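The stratified random selection described above can be sketched as follows. This is an illustrative reconstruction only: the per-lemma accuracies are fabricated, the half-open ranges (which exclude acc=100 from the top bin) and the assumption that the third PoS group is adjectives are my reading of the text, and `select_lemmas` is not a real DutchSemCor tool.

```python
import random

def select_lemmas(accuracies, seed=0):
    """accuracies: {(lemma, pos): fold-cross accuracy in percent}.
    Samples 5 nouns, 5 verbs and 3 adjectives per accuracy range."""
    ranges = [(90, 100), (80, 90), (70, 80), (60, 70)]  # lo <= acc < hi
    quota = {"noun": 5, "verb": 5, "adj": 3}
    rng = random.Random(seed)
    selected = []
    for lo, hi in ranges:
        for pos, n in quota.items():
            pool = [lemma for (lemma, p), acc in accuracies.items()
                    if p == pos and lo <= acc < hi]  # '<' excludes acc=100
            selected.extend(rng.sample(pool, min(n, len(pool))))
    return selected

# Fabricated accuracies: six candidate lemmas per PoS per range.
fake = {}
for pos in ("noun", "verb", "adj"):
    for base in (60, 70, 80, 90):
        for i in range(6):
            fake[(f"{pos}{base}_{i}", pos)] = base + i

selected = select_lemmas(fake)  # 4 ranges x (5 + 5 + 3) = 52 lemmas
```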

The gold standard of this evaluation can be found in the file annotations.RandEval.DSC.xml.

ALL-WORDS


For the all-words evaluation, the instances have also been tagged with our three systems, and with a combination of them. The combination was performed as a weighted vote with the following weights:

+ timbl: 1.1
+ svm: 1.5
+ ukb: 1
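The weighted vote with these weights can be sketched as below. The weights come from the list above; the function name and tie-breaking behaviour are my own (with all three systems voting, these particular weights can never produce a tie, since any two systems together outweigh the third).

```python
from collections import defaultdict

# Voting weights as given for the all-words combination.
WEIGHTS = {"timbl": 1.1, "svm": 1.5, "ukb": 1.0}

def weighted_vote(predictions):
    """predictions: {system: predicted sense}. Each system's vote counts
    with its weight; the sense with the highest total weight wins."""
    totals = defaultdict(float)
    for system, sense in predictions.items():
        totals[sense] += WEIGHTS[system]
    return max(totals, key=totals.get)

# TiMBL + UKB together (2.1) outweigh SVM alone (1.5):
assert weighted_vote({"timbl": "s1", "svm": "s2", "ukb": "s1"}) == "s1"
# But SVM (1.5) outweighs TiMBL (1.1) in a head-to-head disagreement:
assert weighted_vote({"timbl": "s1", "svm": "s2"}) == "s2"
```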

The results for all lemmas and senses, for all PoS, can be found under the corresponding subfolders (timbl, svm, ukb and combo).

(Download the evaluation results)

(Download the test data used for the evaluation)
