Machine Idioms
1.7.MACHINE_IDIOMS_DSC.zip contains a DSC-xml file listing every idiom that was detected automatically in the SONAR corpus. The following information is stored for each detected idiom:
+ form: the canonical form of the idiom
+ idiom_id: the identifier of the idiom in Cornetto
+ lemma: the lemma of the example
+ pos: the part-of-speech
+ token_id: the token identifier of the example in the SONAR corpus
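
As a rough illustration of how these fields could be read, the Python sketch below streams through the XML inside the archive and collects the five fields listed above. The DSC-xml schema is not described here, so the sketch makes two loud assumptions: that the fields are stored as XML attributes, and that every idiom record is an element carrying an idiom_id attribute. The function names and the sample record in the demo are hypothetical. If the real file stores the fields as child elements instead, the lookup would need to be adapted.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# The five fields named in the description above.
FIELDS = ("form", "idiom_id", "lemma", "pos", "token_id")


def iter_idiom_records(xml_stream):
    """Yield one dict per element that carries an idiom_id attribute.

    ASSUMPTION: the fields are XML attributes; the actual DSC-xml
    layout may differ and is not documented in this section.
    """
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if "idiom_id" in elem.attrib:
            yield {name: elem.attrib.get(name) for name in FIELDS}
        # Release the element's contents to keep memory use low on large files.
        elem.clear()


def read_idioms_from_zip(zip_path):
    """Yield idiom records from every .xml member of the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".xml"):
                with archive.open(member) as stream:
                    yield from iter_idiom_records(stream)


if __name__ == "__main__":
    # Made-up record, used only to exercise the parser.
    sample = (
        b'<root><item form="de kat uit de boom kijken" idiom_id="id-0" '
        b'lemma="kat" pos="noun" token_id="tok-0"/></root>'
    )
    for record in iter_idiom_records(io.BytesIO(sample)):
        print(record)
```

For the real data, read_idioms_from_zip("1.7.MACHINE_IDIOMS_DSC.zip") would be the entry point, provided the assumptions above hold for the file.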