LOT2019: A Deeper Understanding of Distributional Semantics

New: Final Assignment!

The final assignment for this course can be found here.

Deadline: February 22nd 2019!

Course description

Distributional semantic representations are based on the idea that the meaning of a word is determined by, or at least reflected in, its usage. Harris (1954) and Firth (1957) used this idea to formulate what became the distributional semantic hypothesis, which states that words with similar meanings occur in similar contexts. If this is the case, it should be possible to learn the meaning of words by looking at the contexts they occur in.

Computational linguists have explored this idea for several decades, developing various methods for creating distributional semantic models from large corpora. These models represent the meaning of words as vectors, often called word embeddings, based on the contexts in which the words occur. Such a vector encodes a word's context words as a numeric representation. Thanks to ever larger corpora and increasing computational power, the quality of these models has improved significantly in recent years. Using these representations has led to improvements on a wide variety of tasks in computational linguistics. They are also used to study word meaning itself, e.g. to identify whether the meaning of a word has changed over time.
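A minimal sketch of the count-based version of this idea follows. Each word is represented by how often other words occur within a fixed window around it, and similarity of meaning is approximated with cosine similarity between these vectors. The toy corpus, the window size of 2 and the function names are assumptions for illustration only. Models such as word2vec learn dense vectors instead, but they rely on the same contextual signal.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """For every word, count how often each other word occurs within
    `window` tokens to its left or right (a sparse count vector)."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Invented toy corpus; real models are built from millions of tokens.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the mouse".split(),
    "the dog chases the cat".split(),
    "she reads a book".split(),
]
vecs = cooccurrence_vectors(corpus, window=2)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))   # cat and dog share contexts
print(round(cosine(vecs["cat"], vecs["book"]), 3))  # cat and book share none
```

"cat" and "dog" end up close because they share contexts ("the", "drinks", "chases"), even though the two words never occur near each other.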

This course first provides an introduction to distributional semantic models and their role in computational linguistics. We then examine these representations from a linguistic point of view, focusing on the following questions: What information do these models capture? How can we verify them or determine their quality? How are linguistic properties represented? What does this tell us about language use?

Preparatory Assignment

What do you think distributional semantic models capture?
LOT_school_precourse_assignment (due: 11 January 2019, 10:00 am).

Day 1: Introduction: what are distributional models and how are they made?

In this first lecture, we will cover the (technical) background of distributional semantic models. What are they? What do they represent (or are they supposed to represent)? How are they created from corpora?
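Count-based models usually reweight raw co-occurrence counts before use. A widely used weighting is positive pointwise mutual information (PPMI), discussed for example in Levy, Goldberg and Dagan (2015) and in Jurafsky and Martin (ch. 6). The sketch below is a minimal illustration that assumes the counts are already stored as a dictionary of Counters. The counts themselves are invented.

```python
import math
from collections import Counter

def ppmi(cooc):
    """Turn raw counts {word: Counter(context -> count)} into PPMI weights:
    PPMI(w, c) = max(0, log2(P(w, c) / (P(w) * P(c)))).
    Zero-count pairs are simply absent (their PPMI is taken to be 0)."""
    total = sum(sum(ctx.values()) for ctx in cooc.values())
    word_totals = {w: sum(ctx.values()) for w, ctx in cooc.items()}
    context_totals = Counter()
    for ctx in cooc.values():
        context_totals.update(ctx)  # adds the counts per context word
    weights = {}
    for w, ctx in cooc.items():
        weights[w] = {}
        for c, n in ctx.items():
            # P(w,c) / (P(w) P(c)) simplifies to n * total / (count(w) * count(c))
            pmi = math.log2(n * total / (word_totals[w] * context_totals[c]))
            weights[w][c] = max(0.0, pmi)
    return weights

# Invented counts for illustration.
counts = {
    "cat": Counter({"the": 4, "drinks": 1, "chases": 2}),
    "dog": Counter({"the": 3, "drinks": 1, "barks": 1}),
}
weights = ppmi(counts)
for word, ctx in weights.items():
    print(word, {c: round(v, 2) for c, v in ctx.items()})
```

A frequent but uninformative context such as "the" receives little or no weight. Rarer contexts that occur with only one word, such as "barks", are weighted up.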

The slides can be found here.

Assignment 2 (day 1, preparation for day 2)

A summary of the assigned reading for day 2 and a link to the accompanying dataset can be found here.

Read the summary and answer the questions in this form.
Answers may be short.

Reading (as listed on the LOT page): Hill, Felix, Roi Reichart, and Anna Korhonen. 2015. SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics 41, no. 4: 665-695.

Day 2: Evaluation and Practical Applications

In this lecture, we will discuss how distributional semantic models are used and how they are evaluated. We cover standard intrinsic evaluation methods as well as practical applications (i.e. how the models are used in various NLP tasks).
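A standard intrinsic evaluation works as follows. The model assigns a similarity score, typically cosine similarity, to each word pair in a dataset such as SimLex-999. These scores are then correlated with human similarity ratings, commonly using Spearman's rank correlation. The sketch below is a minimal illustration. The word pairs, ratings and model scores are invented and are not taken from SimLex-999.

```python
import math

def ranks(values):
    """Rank values from 1..n; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical pairs: (pair, invented human rating 0-10, invented model cosine).
pairs = [
    (("cup", "mug"), 9.0, 0.82),
    (("happy", "glad"), 7.5, 0.30),
    (("car", "road"), 3.0, 0.40),  # related but not similar
    (("tree", "idea"), 1.0, 0.10),
]
human = [h for _, h, _ in pairs]
model = [m for _, _, m in pairs]
rho = spearman(human, model)
print(f"Spearman's rho: {rho:.2f}")
```

Here the hypothetical model rates the merely related pair car/road above the similar pair happy/glad, which lowers the correlation. This is the kind of error that SimLex-999 was designed to expose.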

The slides for the second lecture can be found here.

Assignment 3 (day 2, preparation for day 3)

The assignment for preparing the third lecture can be found here.

You will need to read part of today's reading, as listed on the original LOT website:

Baroni, Marco, Brian Murphy, Eduard Barbu, and Massimo Poesio. 2010. Strudel: A corpus-based semantic model based on properties and types. Cognitive Science 34, no. 2: 222-254.

Day 3: Diving Deeper

This lecture addresses which linguistic phenomena distributional semantic models capture and to what extent. We look at what information we expect to be captured. We then discuss methods for testing whether this is indeed the case and insights from the latest research using such methods.
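One family of methods for testing whether a property is captured trains a simple supervised "diagnostic" classifier on word vectors. The test is whether the classifier can predict the property for words it was not trained on. This is similar in spirit to the approach in Sommerauer and Fokkens (2018) from the reading list. The sketch below assumes a plain logistic regression trained by batch gradient descent. The 3-dimensional vectors, the word labels and the "dangerous" property are invented for illustration and do not come from a real embedding model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(data, epochs=500, lr=0.5):
    """Fit weights w and bias b with batch gradient descent on the log loss."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += error * x[i]
            grad_b += error
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

def predict(w, b, x):
    """Return 1 if the classifier assigns the property to vector x, else 0."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0)

# Invented "embeddings" labelled for a hypothetical property (1 = dangerous).
train = {
    "tiger": ([0.9, 0.1, 0.3], 1),
    "firearm": ([0.8, 0.2, 0.1], 1),
    "zebra": ([0.1, 0.9, 0.2], 0),
    "spoon": ([0.2, 0.8, 0.4], 0),
}
held_out = {
    "wolf": ([0.85, 0.15, 0.2], 1),
    "pillow": ([0.15, 0.85, 0.3], 0),
}

w, b = train_logistic_regression(list(train.values()))
correct = sum(predict(w, b, x) == y for x, y in held_out.values())
accuracy = correct / len(held_out)
print(f"held-out accuracy: {accuracy:.2f}")
```

High accuracy alone does not show that the vectors encode the property itself rather than something correlated with it. This is why the choice of words and baselines matters in such studies.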

The slides of this lecture can be found here.

Assignment 4 (day 3, preparation for day 4)

The assignment for preparing the fourth lecture can be found here.

It is based on the preparatory reading for the fourth class. You can either answer the question based on the abstract of the paper, Figure 1, and your own intuition, or read the paper and explain the authors' point of view.

Hamilton, William L., Jure Leskovec, and Dan Jurafsky. 2016. Cultural shift or linguistic drift? Comparing two computational measures of semantic change. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, p. 2116.

Day 4: Exploring models and corpora

This lecture covers two topics. First, we continue 'diving deeper' and look into ways of investigating distributional semantic models rather than evaluating them. The second part looks into research that uses distributional semantics to study differences in language.

Distributional semantic models have been used in the digital humanities to study sense shift (linguistic change) and concept drift (as part of historical studies). Researchers have aimed to determine whether the meaning of a word changed by comparing distributional models created from older texts with models created from more recent texts. We introduce the methods used in such studies and explore their limitations. In particular, do they capture actual shifts, or are the observed shifts artifacts? Are the corpora balanced and large enough?
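Models trained separately on older and newer corpora live in unaligned vector spaces, so their raw vectors cannot be compared directly. Hamilton, Leskovec and Jurafsky (2016) therefore contrast a global measure based on aligned vectors with a local measure based on a word's nearest neighbors. The sketch below is a simplified neighbor-overlap variant, not their exact measure. It assumes each model is a plain dictionary from words to vectors. The 2-dimensional vectors are invented and specified by angle for readability. "broadcast" is used only as an illustrative label.

```python
import math

def vec(degrees):
    """Toy 2-d unit vector at the given angle (illustration only)."""
    r = math.radians(degrees)
    return [math.cos(r), math.sin(r)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def nearest_neighbors(model, word, k):
    """The k words most cosine-similar to `word` within one model."""
    target = model[word]
    others = [w for w in model if w != word]
    others.sort(key=lambda w: cosine(target, model[w]), reverse=True)
    return set(others[:k])

def neighbor_change(old_model, new_model, word, k=2):
    """1 - Jaccard overlap of the word's k nearest neighbors in two models:
    0.0 means identical neighborhoods, 1.0 means no shared neighbors."""
    old_nn = nearest_neighbors(old_model, word, k)
    new_nn = nearest_neighbors(new_model, word, k)
    return 1 - len(old_nn & new_nn) / len(old_nn | new_nn)

# Invented "models" for an older and a newer corpus. The newer space is
# rotated by -50 degrees overall, mimicking unaligned, separately trained models.
old_angles = {"seed": 0, "sow": 5, "harvest": 10, "broadcast": 20,
              "radio": 80, "television": 85, "newspaper": 90}
new_angles = dict(old_angles, broadcast=70)  # only "broadcast" moves
old = {w: vec(a) for w, a in old_angles.items()}
new = {w: vec(a - 50) for w, a in new_angles.items()}

print(neighbor_change(old, new, "broadcast"))  # neighbors replaced entirely
print(neighbor_change(old, new, "seed"))       # neighbors unchanged
# Comparing raw vectors across the unaligned models is misleading here:
print(cosine(old["broadcast"], new["broadcast"]))  # looks "unchanged"
print(cosine(old["seed"], new["seed"]))            # looks "changed"
```

Neighbor-based comparisons avoid the alignment problem. However, Hellrich and Hahn (2016) caution that the nearest neighbors themselves can be unstable between training runs, so a changed neighborhood is not automatically a real shift in meaning.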

Assignment 5 (day 4, preparation for day 5)

The assignment for preparing the last lecture of this course can be found here.

The details of Hellrich and Hahn’s observations can be found in the reading for that day:

Hellrich, Johannes, and Udo Hahn. 2016. Bad company—neighborhoods in neural embedding spaces considered harmful. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2785-2796.

Day 5: Rounding up & Discussion

This session starts by wrapping up material from the previous lecture. We then discuss what the phenomena and studies we have seen this week tell us about distributional semantic models. We focus on what they (can) mean for linguistic theory and, in particular, on what the research and properties we have seen contribute to the discussion on the weak vs. strong distributional hypothesis: did your opinion change or not? Did this (not) happen because of what you learned about distributional semantics? Or do you think the possibilities and/or limitations of distributional semantic models are not, or even cannot be, informative for answering this question?

The slides on distributional semantics for diachronic research can be found here.

Reading Materials:

(Listed alphabetically. Please contact me if you have any problems accessing the papers listed below.)

Baroni, Marco, Brian Murphy, Eduard Barbu, and Massimo Poesio. 2010. Strudel: A corpus-based semantic model based on properties and types. Cognitive Science 34, no. 2: 222-254.
Derby, S., Miller, P., Murphy, B. and Devereux, B., 2018. Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge. In Proceedings of the 22nd Conference on Computational Natural Language Learning (pp. 260-270).
Goldberg, Yoav, and Omer Levy. 2014. word2vec Explained: deriving Mikolov et al.’s negative-sampling word-embedding method.
Hamilton, W.L., Leskovec, J. and Jurafsky, D., 2016. Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1489-1501.
Hamilton, William L., Jure Leskovec, and Dan Jurafsky. 2016. Cultural shift or linguistic drift? Comparing two computational measures of semantic change. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, p. 2116.
Hellrich, Johannes, and Udo Hahn. 2016. Bad company—neighborhoods in neural embedding spaces considered harmful. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2785-2796.
Hill, Felix, Roi Reichart, and Anna Korhonen. 2015. SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics 41, no. 4: 665-695.
Jurafsky, Dan, and James Martin. Speech and Language Processing (3rd edition). Ch. 6. https://web.stanford.edu/~jurafsky/slp3/
Lenci, Alessandro. 2008. Distributional semantics in linguistic and cognitive research. Italian Journal of Linguistics.
Levy, Omer, Yoav Goldberg, and Ido Dagan. 2015. Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics 3: 211-225.
Martinez-Ortiz, C., Kenter, T., Wevers, M., Huijnen, P., Verheul, J. and van Eijnatten, J., 2016. ShiCo: A Visualization Tool for Shifting Concepts Through Time. In Proceedings of the 3rd DH Benelux Conference (DH Benelux 2016).
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space.
Sommerauer, P. and Fokkens, A., 2018. Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (pp. 276-286).

Antske Fokkens's Homepage