Introduction to the Cornetto Database
The Cornetto project (STE05039) is funded by the Nederlandse Taalunie within the STEVIN framework. The goal of Cornetto is to build a lexical semantic database for Dutch, covering 40K entries, including the most generic and central part of the language. The database will go beyond the structure and content of Wordnet and FrameNet. It will contain both vertical and horizontal semantic relations and combinatorial lexical constraints such as multiword expressions, idioms and collocations on the one hand, and lexical functions and frames on the other. The concepts will be aligned with the English Wordnet so that ontologies and domain labels can be imported. The semantic layer will be validated with a formal ontology, to make it usable in Semantic Web environments.
Cornetto will develop a Dutch semantic database that combines the structure and content of both the Princeton Wordnet and FrameNet for English. The content and semantic structure will be achieved by combining and aligning two existing semantic resources for Dutch: the Dutch wordnet (Vossen 1998) and the Referentie Bestand Nederlands (Martin et al. 1999). The Dutch wordnet (DWN) was originally based on the conceptual structure of the Van Dale VLIS database, and was revised and refined in the EuroWordNet project (1996-1999). DWN has a structure similar to that of the English Wordnet, although its top-level hierarchy was developed from an ontological framework and it defines more horizontal relations. DWN has 55K entries, 70K word meanings and 110K semantic relations. Over 7K horizontal relations have been defined, among which are roles, co-roles (e.g. "guitar player" -> guitar (instrument) & player (agent)), subevent relations and causal relations. The Referentie Bestand Nederlands (RBN) was developed with financial support from the governments of The Netherlands and Flanders (1996-1999, revised in 2004). It contains 45K entries with corpus-based descriptions, covering morphology, syntax, combinatorics, semantics and pragmatics. In the domain of semantics, the RBN lists, for example, countability, semantic types, meaning shifts and selectional restrictions. Furthermore, there are slots for synonyms, and a systematic distinction between an analytic genus and proximum-differentiae definitions.
RBN is concept-based but also encodes the common word combinations that go with concepts. These range from free combinations to collocations, idioms, pragmatic formulae, proverbs and other typical usage and clichés, and also include lexical functions and compositional and selectional preferences. Combinations can be encoded as lexical functions à la Mel'čuk or as FrameNet-like structures.
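To make the combination of the two resources more concrete, the following is a minimal sketch of how a concept with vertical and horizontal relations (the DWN side) and a word meaning with its combinations (the RBN side) could be modelled. It is an illustration only and does not reflect the actual Cornetto data format: the class names, field names, relation labels such as role_instrument and role_agent, and the example collocation are all assumptions made for this sketch. Only the "guitar player" co-role example is taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Relation names treated as "vertical" (hierarchical) in this sketch.
# These labels are assumptions, not the labels used in DWN or Cornetto.
VERTICAL = {"hypernym", "hyponym"}


@dataclass
class Synset:
    """A concept in the wordnet-style layer (hypothetical structure)."""
    synset_id: str
    lemmas: List[str]
    # Maps a relation name to the ids of the target synsets.
    relations: Dict[str, List[str]] = field(default_factory=dict)

    def add_relation(self, name: str, target_id: str) -> None:
        """Record a relation from this synset to another synset."""
        self.relations.setdefault(name, []).append(target_id)


@dataclass
class LexicalUnit:
    """A word meaning in the RBN-style layer (hypothetical structure)."""
    lemma: str
    synset_id: str  # link from the word meaning to its concept
    # (combination type, expression) pairs, e.g. collocations or idioms.
    combinations: List[Tuple[str, str]] = field(default_factory=list)


def horizontal_relations(synset: Synset) -> Dict[str, List[str]]:
    """Return every relation of the synset that is not hierarchical."""
    return {name: targets for name, targets in synset.relations.items()
            if name not in VERTICAL}


# The co-role example from the text: "guitar player" is linked to
# guitar (instrument) and player (agent). The hypernym is an assumption.
guitar_player = Synset("guitar_player", ["guitar player"])
guitar_player.add_relation("hypernym", "musician")
guitar_player.add_relation("role_instrument", "guitar")
guitar_player.add_relation("role_agent", "player")

# A word meaning attached to the same concept; the collocation is invented
# purely for illustration.
unit = LexicalUnit("guitar player", "guitar_player",
                   [("collocation", "a gifted guitar player")])

print(horizontal_relations(guitar_player))
```

Keeping the concept and the word meaning as separate records linked by an identifier mirrors the idea of aligning two resources rather than merging them into one flat entry, but other designs are equally possible.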
Last update: 22 October, 2008, p.vossen (at) let.vu.nl