Idiom Tagger


The idiomTagger is a tool developed within the DSC to automatically detect idiomatic uses of words. Its main class is IdiomClass, and the main function for deciding whether a given usage of a word is idiomatic is IdiomClass.isIdiomStr(…). This function takes four parameters: the lemma, its part of speech (pos), the context in which the word occurs, and the marker string that encloses the lemma within that context.

An example of how to use the tagger:

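# IdiomClass must be imported from the idiomTagger code first (the module name is not given here)

lemma = 'drempel'

pos = 'n'

# The lemma occurrence is wrapped in the enclosing marker inside the context

context = ' blijft de prijs voor een aantal opdrachtgevers een ###drempel### vormen . Dit is vooral het geval'

enclosedBy = '###'

objIdioms = IdiomClass(pos)

data = objIdioms.isIdiomStr(lemma, pos, context, enclosedBy)

print(data)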
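The context string in the example above has the target word wrapped in the enclosing marker. A minimal sketch of how such a string could be built from a tokenized sentence is shown here; the helper name mark_lemma and its parameters are hypothetical and are not part of the idiomTagger itself.

```python
def mark_lemma(tokens, index, enclosed_by="###"):
    """Join tokens into one string, wrapping tokens[index] in the marker.

    Hypothetical helper for illustration; not part of the idiomTagger.
    """
    marked = list(tokens)  # copy so the caller's list is left unchanged
    marked[index] = enclosed_by + marked[index] + enclosed_by
    return " ".join(marked)


tokens = "blijft de prijs voor een aantal opdrachtgevers een drempel vormen .".split()
context = mark_lemma(tokens, tokens.index("drempel"))
print(context)
```

The resulting string can then be passed as the context argument, with the same marker given as the fourth argument of isIdiomStr.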



The returned data is a tuple with the following fields:

+ 1 True/False: whether the usage is an idiomatic expression or not

+ 2 A string containing the canonical form of the idiom found in this case

+ 3 The context of the idiomatic expression

+ 4 The lexical unit identifier of the idiom
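Since the result is a plain tuple, it may be convenient to give the four positions names before using them. The sketch below assumes only the field order listed above; the names IdiomResult, is_idiom, canonical_form, idiom_context and lexical_unit_id are hypothetical, and the sample tuple is a placeholder rather than real tagger output.

```python
from collections import namedtuple

# Field names are assumptions chosen to mirror the four positions listed above
IdiomResult = namedtuple(
    "IdiomResult",
    ["is_idiom", "canonical_form", "idiom_context", "lexical_unit_id"],
)


def as_idiom_result(data):
    """Wrap the 4-tuple returned by isIdiomStr so fields can be read by name."""
    return IdiomResult(*data)


# Placeholder values for illustration only; in practice pass the tuple
# returned by objIdioms.isIdiomStr(...)
sample = (True, "<canonical form>", "<context>", "<lexical unit id>")
result = as_idiom_result(sample)

if result.is_idiom:
    print(result.canonical_form, result.lexical_unit_id)
else:
    print("not idiomatic")
```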

 

(Download 2.7.IDIOM_TAGGER.zip)
