Cornetto2.0-base-concepts is a list of synset identifiers from Cornetto2.0 that represent the most important synsets in the wordnet graph. Importance is based on a synset's position in the hierarchy and its number of relations. The base concepts are chosen in such a way that every synset is mapped to a base concept through the hypernym relations (either directly or indirectly).
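The mapping from a synset to its base concept can be pictured as a walk up the hypernym links. The sketch below is only an illustration of that idea and is not the get-blc algorithm; the function name, the dictionary layout and the synset identifiers are all hypothetical.

```python
from collections import deque


def nearest_base_concept(synset, hypernyms, base_concepts):
    """Walk up the hypernym links breadth-first and return the first
    base concept reached, or None if no base concept is reachable."""
    queue = deque([synset])
    seen = {synset}
    while queue:
        current = queue.popleft()
        if current in base_concepts:
            return current
        for parent in hypernyms.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None


# Toy data with made-up identifiers (not real Cornetto synset ids).
hypernyms = {
    "poodle": ["dog"],
    "dog": ["animal"],
    "animal": ["entity"],
}
base_concepts = {"animal", "entity"}

mapping = {
    s: nearest_base_concept(s, hypernyms, base_concepts)
    for s in ["poodle", "dog", "animal"]
}
```

Here "poodle" reaches its base concept indirectly (via "dog"), "dog" reaches it directly, and "animal" is itself a base concept.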
Base concepts play a crucial role in the semantic processing of text. Many semantic relations are similar for all concepts related to the same base concept. Hence, one of the approaches in DutchSemCor is to build a WSD classifier for each base concept, trained on all the training data of the synsets that belong to that base concept.
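As a rough illustration of this pooling step, the sketch below groups annotated examples by the base concept of their synset. It assumes a synset-to-base-concept mapping is already available as a dictionary; the function name, the data layout and the identifiers are hypothetical and do not reflect the DutchSemCor implementation.

```python
from collections import defaultdict


def pool_by_base_concept(examples, synset_to_bc):
    """Group (synset, context) training examples under the base concept
    of their synset. Examples whose synset has no mapping are skipped."""
    pooled = defaultdict(list)
    for synset, context in examples:
        bc = synset_to_bc.get(synset)
        if bc is not None:
            pooled[bc].append((synset, context))
    return dict(pooled)


# Toy data with made-up identifiers and contexts.
synset_to_bc = {"poodle": "animal", "dog": "animal", "chair": "artifact"}
examples = [
    ("poodle", "the poodle barked"),
    ("chair", "she sat on the chair"),
    ("dog", "the dog slept"),
    ("unmapped", "no base concept known"),
]

pooled = pool_by_base_concept(examples, synset_to_bc)
```

Each entry of `pooled` would then serve as the training set for one base-concept classifier.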
The base concepts are extracted using the get-blc Perl script created by Egoitz Laparra. The Perl script is included in this distribution. The script requires synsets and relations in a particular format (see the readme.txt of the script). The DutchSemCor tool set provides a function to extract the data in the input format for the Perl script. The shell script get-BC-import-data-from-cdb-syn.sh gives the calls to the Java library and the Perl script to generate the data file.
A dump of the Cornetto database is required as a starting point.
The resulting output files are given in the folder data, where the number in the file name indicates the minimum number of relations that a base concept should have. Furthermore, we created separate files using all the relations (all) or just the hypernym relations (hypo). For each setting, there is a file with just the base concepts and the number of related concepts (list) and a file that maps each synset to its corresponding base concept (rel).