The 42nd Language Lunch

Date: 2014-03-20

Location: G.07 Informatics Forum

Relative f0-excursion as a stylistic variable in (dis)agreements

Mirjam Eiswirth; PPLS; s1322502@sms.ed.ac.uk

Pitch prominence is highly variable in the four types of (dis)agreement identified by Pomerantz (1984): agreement, same assessment, downgrading and disagreement. Even though sociolinguists and discourse analysts have studied the (pitch) prominence of disagreement and negation in some detail, they have not looked at its variation in the three other (dis)agreement types. Working with a corpus of 5 hours 30 minutes of casual conversation between several MSc students, I extracted all (dis)agreements according to Pomerantz’ criteria and analysed them in terms of pitch prominence (vowel length and f0-excursion) to see what conditions and what complicates the variation. The conditioning factor does not appear to be the type of (dis)agreement but rather discourse function, which suggests f0-excursion as a potential stylistic variable indexing expressivity (cf. Podesva, 2007, on phonation type as a stylistic variable).

One-colour “rainbows”: synaesthetic colouring of compound words

Jennifer Mankin; PPLS; mankin.jennifer@gmail.com

This study investigated what can be learned about synaesthesia from natural language processing and vice versa. In our study, we asked how synaesthetes experience colours for compound words (e.g., keyhole) and in doing this, we also tested how compound words might be processed in normal language use. Using an online colour selection task, 19 grapheme-colour synaesthetes could provide zero, one, or two colours for compound words. We varied the lexical frequency and semantic transparency of these compounds. High-frequency compounds were significantly more likely than low-frequency compounds to have only one colour, rather than two colours. This suggests that there are two different psycholinguistic strategies for processing compound words: high-frequency words are stored as wholes but low-frequency words are broken down into constituents (Kubitza, unpublished; Schreuder & Baayen, 1995). However, there was no effect of semantic transparency. Reports from participants also revealed greater complexity in synaesthetic word colouring than previous research on grapheme-colour synaesthesia has been able to capture. Our results show that synaesthetic colours vary meaningfully with linguistic measures and can be used to understand the nature of both synaesthesia itself and natural language processing in the general population.

Sparse kernel learning for image annotation

Sean Moran

We introduce a sparse kernel learning framework for the Continuous Relevance Model (CRM). State-of-the-art image annotation models linearly combine evidence from several different feature types to improve image annotation accuracy. While previous authors have focused on learning the linear combination weights for these features, there has been no work examining the optimal combination of kernels. We address this gap by formulating a sparse kernel learning framework for the CRM, dubbed the SKL-CRM, that greedily selects an optimal combination of kernels. Our kernel learning framework rapidly converges to an annotation accuracy that substantially outperforms a host of state-of-the-art annotation models. We draw two surprising conclusions: firstly, if the kernels are chosen correctly, only a very small number of features are required to achieve superior performance over models that utilise a full suite of feature types; and secondly, the standard default selection of kernels commonly used in the literature is sub-optimal, and it is much better to adapt the kernel choice based on the feature type and image dataset.
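For readers unfamiliar with greedy selection, the sketch below shows generic greedy forward selection over (feature, kernel) pairs. It is not the SKL-CRM implementation: the names `greedy_select`, `TOY_GAIN` and `toy_score` are hypothetical, the scores are made-up integers rather than results from the talk, and a real scorer would measure annotation accuracy on held-out images.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (feature type, kernel name); both purely illustrative


def greedy_select(
    candidates: List[Pair],
    score: Callable[[List[Pair]], int],
    max_size: Optional[int] = None,
) -> Tuple[List[Pair], Optional[int]]:
    """Add the single best pair at each step; stop when nothing improves the score."""
    selected: List[Pair] = []
    best: Optional[int] = None
    remaining = list(candidates)
    while remaining and (max_size is None or len(selected) < max_size):
        # Evaluate every remaining pair when added to the current selection.
        top_idx, top_score = 0, score(selected + [remaining[0]])
        for i in range(1, len(remaining)):
            s = score(selected + [remaining[i]])
            if s > top_score:
                top_idx, top_score = i, s
        # Stop as soon as the best addition fails to improve on the current score.
        if best is not None and top_score <= best:
            break
        best = top_score
        selected.append(remaining.pop(top_idx))
    return selected, best


# Hypothetical stand-in for validation accuracy: each pair contributes a fixed
# gain, and every selected pair costs a fixed penalty (made-up numbers).
TOY_GAIN = {
    ("colour", "gaussian"): 50,
    ("texture", "laplacian"): 30,
    ("colour", "laplacian"): 10,
    ("gist", "chi2"): 5,
}


def toy_score(pairs: List[Pair]) -> int:
    return sum(TOY_GAIN[p] for p in pairs) - 15 * len(pairs)


selected, best = greedy_select(list(TOY_GAIN), toy_score)
print(selected, best)
```

With these toy numbers the search keeps two of the four pairs and then stops, which mirrors the sparsity idea in the abstract: adding more pairs no longer improves the score.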

Unsupervised induction of cross-lingual semantic relations

Mike Lewis; ILCC; mike.lewis@ed.ac.uk

Creating a language-independent meaning representation would benefit many cross-lingual NLP tasks. We introduce the first unsupervised approach to this problem, learning clusters of semantically equivalent English and French relations between referring expressions, based on their named-entity arguments in large monolingual corpora. The clusters can be used as language-independent semantic relations, by mapping clustered expressions in different languages onto the same relation. Our approach needs no parallel text for training, but outperforms a baseline that uses machine translation on a cross-lingual question answering task. We also show how to use the semantics to improve the accuracy of machine translation, by using it in a simple reranker.
