The data contains all the paragraphs in SONAR, each tagged with its corresponding domains (Dutch labels). The tagging was performed automatically with our domainTaggerDSC (also available for download).
One XML file has been created for each document in SONAR. Each file contains all the domains assigned to every paragraph of that SONAR document.
For each domain assigned to the paragraph, the confidence of the classifier is stored in the attribute ‘value’. All the XML files can be found in the folder DOMAINS_SONAR_XML.
(Download 1.6.MACHINE_DOMAINS.zip in DSC-XML)
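The confidence values can be read back with a few lines of Python. The following is only a minimal sketch: the element and attribute names used in these XML files are not documented here, so the code assumes nothing beyond the ‘value’ attribute and simply reports every element that carries one. The helper names `iter_confidences` and `scan_folder` are hypothetical, and the `*.xml` file pattern is an assumption.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def iter_confidences(xml_source):
    """Yield (tag, attributes, confidence) for each element with a 'value' attribute.

    xml_source may be a file name or a binary file object.
    Element names are NOT assumed; only the 'value' attribute is.
    """
    root = ET.parse(xml_source).getroot()
    for elem in root.iter():
        raw = elem.get("value")
        if raw is None:
            continue  # not a domain assignment (no confidence stored)
        try:
            confidence = float(raw)
        except ValueError:
            continue  # 'value' is present but not numeric; skip it
        yield elem.tag, dict(elem.attrib), confidence


def scan_folder(folder="DOMAINS_SONAR_XML"):
    """Print every confidence found in the XML files of the given folder."""
    # Assumption: the files in the folder use the .xml extension.
    for path in sorted(Path(folder).glob("*.xml")):
        for tag, attrs, confidence in iter_confidences(str(path)):
            print(path.name, tag, attrs, confidence)
```

Calling `scan_folder()` from the directory where the archive was unpacked would list each assignment together with its remaining attributes, which is a quick way to see where the domain label and the paragraph identifier are stored.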