COLING2016 Tutorial T-1: Compositional Distributional Models of Meaning

Tutorial Slides

http://coling2016.anlp.jp/doc/tutorial/slides/T1/KartsaklisSadrzadeh.pdf

Organizers

Mehrnoosh Sadrzadeh, Dimitri Kartsaklis

Affiliation

Queen Mary University of London
School of Electronic Engineering and Computer Science
Mile End Road, London E1 4NS, United Kingdom

Website link of the tutorial: https://sites.google.com/site/coling2016cdmtutorial/

Description

Distributional models of meaning are based on the pragmatic hypothesis that the meanings of words are deducible from the contexts in which they are often used. This hypothesis is formalized using vector spaces, wherein a word is represented as a vector of co-occurrence statistics with a set of context dimensions. With the increasing availability of large corpora of text, these models constitute a well-established NLP technique for evaluating semantic similarities. These methods, however, do not scale up to larger text constituents (i.e. phrases and sentences), since the uniqueness of multi-word expressions inevitably leads to data sparsity problems and hence to unreliable vectorial representations. The problem is usually addressed by providing a compositional function, the purpose of which is to prepare a vector for a phrase or sentence by combining the vectors of the words therein. This line of research has led to the field of compositional distributional models of meaning (CDMs), where reliable semantic representations are provided for phrases, sentences, and discourse units such as dialogue utterances and even paragraphs or documents. As a result, these models have found applications in various NLP tasks, such as paraphrase detection, sentiment analysis, dialogue act tagging, machine translation, and textual entailment, in many cases presenting state-of-the-art performance.
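As a rough illustration of the vector-space formalization described above, the following sketch builds co-occurrence vectors from a toy corpus and compares them with cosine similarity. The corpus, the context words, the window size and the function names are all assumptions made for this example; they are not taken from the tutorial material.

```python
from collections import Counter
from math import sqrt

# Toy corpus and context dimensions: illustrative assumptions only.
CORPUS = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the dog ate the food",
    "the cat ate the food",
    "the student read the book",
]
CONTEXTS = ["chased", "ate", "read", "mouse", "food", "book"]


def cooccurrence_vector(target, corpus, contexts, window=2):
    """Count how often each context word occurs within `window` tokens of `target`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in contexts:
                    counts[tokens[j]] += 1
    # One coordinate per context dimension, in a fixed order.
    return [counts[c] for c in contexts]


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


cat = cooccurrence_vector("cat", CORPUS, CONTEXTS)
dog = cooccurrence_vector("dog", CORPUS, CONTEXTS)
book = cooccurrence_vector("book", CORPUS, CONTEXTS)

print(cat, dog, book)
print(cosine(cat, dog), cosine(cat, book))
```

In this toy setting "cat" and "dog" share contexts such as "chased" and "ate", so their vectors are closer to each other than either is to "book". Real models use much larger corpora and typically reweight the raw counts.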

Being the natural evolution of the traditional and well-studied distributional models at the word level, CDMs are steadily evolving into a popular and active area of NLP. The topic has inspired a number of workshops and tutorials at top CL conferences such as ACL and EMNLP, as well as special issues of high-profile journals, and it attracts a substantial number of submissions to annual NLP conferences. The approaches employed by CDMs are as diverse as statistical machine learning, linear algebra, simple category theory, and complex deep learning architectures based on neural networks and borrowing ideas from image processing. Furthermore, they create opportunities for interesting novel research, related for example to efficient methods for creating tensors for relational words such as verbs and adjectives, the treatment of logical and functional words in a distributional setting, or the role of polysemy and the way it affects composition. The purpose of this tutorial is to provide a concise introduction to this emerging field, presenting the different classes of CDMs and the various issues related to them in sufficient detail. The goal is to allow the student to understand the general philosophy of each approach, as well as its advantages and limitations with regard to the alternatives.

Some background on CDMs

The purpose of a compositional distributional model is to provide a function that produces a vectorial representation of the meaning of a phrase or a sentence from the distributional vectors of the words therein. One can broadly classify such compositional distributional models into three categories:

Vector mixture models: These are based on simple element-wise operations between vectors, such as addition and multiplication (Mitchell and Lapata, 2010). Vector mixture models constitute the simplest compositional method in distributional semantics. Despite their simplicity, though, they have proved to be a very hard-to-beat baseline for many of the more sophisticated models.
Tensor-based models: In these models, relational words such as verbs and adjectives are tensors and matrices that contract and multiply with noun (and noun-phrase) vectors (Coecke et al., 2010; Baroni and Zamparelli, 2010). Tensor-based models provide a solution to the problems of vector mixtures: they are not bag-of-words approaches and they respect the type-logical identities of special words, following an approach very much aligned with the formal semantics perspective. In fact, tensor-based composition is considered the most linguistically motivated approach in compositional distributional semantics.
Neural network-based models: Models in which the compositional operator is part of a neural network and is usually optimized against a specific objective (Socher et al., 2012; Kalchbrenner et al., 2014; Cheng and Kartsaklis, 2015). The architectures usually employed are recursive or recurrent neural networks and convolutional neural networks. The non-linearity, in combination with the layered approach on which neural networks are based, makes these models quite powerful, allowing them to simulate the behaviour of a range of functions much wider than the linear maps of tensor-based approaches.

Outline

The tutorial aims at providing an introduction to these three classes of models, covering the most important aspects. Specifically, it will have the following structure (subject to time limitations):

Introduction. The distributional hypothesis - Vector space models - The necessity for compositionality - Applications - An overview of CDMs
Vector mixture models. Additive and multiplicative models - Interpretation - Practical applications
Tensor-based models. Unifying grammar and semantics - Relational words as multi-linear maps - Extensions of the model
Deep learning models. Introduction to NNs - Recursive and Recurrent NNs for composition - Connection to image processing - Convolutional NNs
Advanced issues and conclusion. Logical and functional words - Lexical ambiguity and composition - Moving to discourse level - Concluding remarks

Prerequisites

The only prerequisite for attending the tutorial is knowledge of standard linear algebra, specifically with regard to vectors and their operations, vector spaces, matrices and linear maps. No specific knowledge of advanced topics, such as category theory or neural networks, will be necessary.