Idiom Tagger
The idiomTagger is a tool developed within the DSC to automatically detect idiomatic uses of words. The main class is IdiomClass, and the main method for deciding whether a given usage of a word is idiomatic is IdiomClass.isIdiomStr(…). This method takes 4 parameters: the lemma, the part of speech (pos), the context in which the lemma occurs, and the string used to enclose the lemma within that context.
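The fourth parameter tells the tagger where the lemma sits in the context, so the context string has to carry the marker on both sides of the target word. A minimal sketch of preparing such a context from a list of tokens follows. The helper mark_lemma is hypothetical and not part of the idiomTagger; it also assumes that tokens are joined by single spaces.

```python
def mark_lemma(tokens, index, enclosed_by="###"):
    """Return the tokens joined by spaces, with tokens[index] wrapped in the marker.

    Hypothetical helper for illustration only; not part of the idiomTagger.
    """
    marked = list(tokens)  # copy, so the caller's list is left unchanged
    marked[index] = enclosed_by + marked[index] + enclosed_by
    return " ".join(marked)


tokens = ["een", "drempel", "vormen"]
context = mark_lemma(tokens, 1)
print(context)
```

The resulting string can then be passed as the context argument, with the same marker passed as the enclosing string.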
An example of how to use the tagger:
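# IdiomClass must already be imported from the idiomTagger code
lemma = 'drempel'
pos = 'n'
# The target word is wrapped in the marker given by enclosedBy
context = ' blijft de prijs voor een aantal opdrachtgevers een ###drempel### vormen . Dit is vooral het geval'
enclosedBy = '###'
objIdioms = IdiomClass(pos)
data = objIdioms.isIdiomStr(lemma, pos, context, enclosedBy)
print(data)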
The returned data is a tuple with the following fields:
+ 1 True/False: whether the usage is an idiomatic expression or not
+ 2 A string containing the canonical form of the idiom in this case
+ 3 The context of the idiomatic expression
+ 4 The lexical unit identifier of the idiom
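Since the result is a plain 4-field tuple, it can be easier to read when the fields are given names. The following is a minimal sketch under that assumption. The names IdiomResult, describe and the field names are hypothetical, and the sample tuples are hand-written placeholders rather than real tagger output. What the tagger puts in fields 2 to 4 when no idiom is found is not stated here, so the sketch only inspects them when the first field is True.

```python
from collections import namedtuple

# Hypothetical field names, following the order of the tuple described above
IdiomResult = namedtuple(
    "IdiomResult",
    ["is_idiom", "canonical_form", "context", "lexical_unit_id"],
)


def describe(result):
    """Summarise a 4-field result tuple as a short string."""
    r = IdiomResult(*result)
    if not r.is_idiom:
        return "not idiomatic"
    return "idiom: %s (lexical unit %s)" % (r.canonical_form, r.lexical_unit_id)


# Placeholder values for illustration only, not real tagger output
sample = (True, "CANONICAL-FORM", "CONTEXT", "LU-ID")
print(describe(sample))
```

With a real tagger result, describe(data) would be called on the tuple returned by isIdiomStr.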