The 67th Language Lunch

Date: 2019-04-11

Location: G.07 Informatics Forum

Modelling lexical interactions in diachronic corpora

Andres Karjus; a.karjus@sms.ed.ac.uk

Richard A. Blythe

Simon Kirby

Kenny Smith

Large diachronic text corpora enable a data-based approach to the study of language change dynamics. The development of unsupervised methods for inferring semantic similarity, semantic change and polysemy from such large datasets means that, in addition to measuring orthographic similarity or counting frequencies, it is also possible to measure meaning, and therefore the evolution of semantics and discourse topics over time. We present completed work on a baseline model of frequency change in corpora, the topical-cultural advection model, and show how quantifying topical fluctuations (as a proxy for changing communicative need) predicts lexical competition in semantic subspaces. Lexical competition between a trending word and its closest semantic neighbours is operationalized using diachronic word embeddings and a measure of relative change in probability mass. Using large diachronic corpora in multiple languages, we demonstrate that a model incorporating the advection measure (and a number of lexicostatistical control variables) describes a considerable amount of the variance in the competition variable: low communicative need leads to competition and possible replacement of an old word, while high communicative need facilitates the co-existence of similar words.
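The abstract does not define the advection measure itself. As a rough illustration only, the sketch below assumes a formulation in which a word's "topic" is a set of context words weighted by association strength with the target word (for example PPMI), and the word's advection is the weighted mean of the log change in those context words' frequencies between two consecutive time periods. The function name `topical_advection`, its inputs and the weighting scheme are assumptions for illustration, not the authors' implementation.

```python
import math


def topical_advection(topic_weights, freq_prev, freq_next):
    """Weighted mean log frequency change of a word's topic (context) words.

    Assumed inputs (illustrative):
      topic_weights: dict context_word -> weight (e.g. PPMI with the target word)
      freq_prev, freq_next: dict word -> normalised frequency in two consecutive periods
    """
    num = 0.0
    den = 0.0
    for word, weight in topic_weights.items():
        before = freq_prev.get(word, 0.0)
        after = freq_next.get(word, 0.0)
        # Skip context words unattested in either period (log change undefined).
        if before > 0 and after > 0:
            num += weight * math.log(after / before)
            den += weight
    # With no usable context words, report no topical change.
    return num / den if den > 0 else 0.0


# Usage: a topic whose strongly weighted context word is rising yields positive advection.
weights = {"ship": 2.0, "sea": 1.0}
print(topical_advection(weights, {"ship": 0.001, "sea": 0.002}, {"ship": 0.003, "sea": 0.002}))
```

Under this assumed formulation, a positive value means the word's topic is becoming more frequent overall, which is the kind of quantity the abstract treats as a proxy for communicative need.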

Data-to-Text Generation with Content Selection and Planning

Ratish Puduppully; r.puduppully@sms.ed.ac.uk

Li Dong

Mirella Lapata; mlap@inf.ed.ac.uk

Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models trained end-to-end, without explicitly modeling what to say and in what order. In this work, we present a neural network architecture which incorporates content selection and planning without sacrificing end-to-end training. We decompose the generation task into two stages. Given a corpus of data records (paired with descriptive documents), we first generate a content plan highlighting which information should be mentioned and in which order, and then generate the document while taking the content plan into account. Automatic and human evaluation experiments show that our model outperforms strong baselines, improving the state of the art on the recently released RotoWire dataset.

Practical Semantic Parsing for Spoken Language Understanding

Marco Damonte; s1333293@sms.ed.ac.uk

Executable semantic parsing is the task of converting natural language utterances into logical forms that can be directly used as queries to get a response. We build a transfer learning framework for executable semantic parsing. We show that the framework is effective for Question Answering (Q&A) as well as for Spoken Language Understanding (SLU). We further investigate the case where a parser on a new domain can be learned by exploiting data on other domains, either via multi-task learning between the target domain and an auxiliary domain or via pre-training on the auxiliary domain and fine-tuning on the target domain. With either flavor of transfer learning, we are able to improve performance on most domains; we experiment with public data sets such as Overnight and NLmaps as well as with commercial SLU data. The experiments carried out on data sets that are different in nature show how executable semantic parsing can unify different areas of NLP such as Q&A and SLU.

Does language script matter?

Lihua Xia; helen.xia@ed.ac.uk

Logographic Chinese and alphabetic English differ from each other in various respects. These characteristics of the languages result in differences in the cognitive processing of information and in the brain networks involved in language processing. Consequently, native Chinese speakers rely primarily on visual coding and are more sensitive to visual information, whereas English speakers rely primarily on phonological coding and are more sensitive to auditory information. At the neural level, the brain structures and areas involved in processing visual and auditory information differ between native Chinese and English speakers. Given these differences, we asked whether they extend beyond linguistic characteristics to cognitive functions. The current study compared Chinese monolinguals and English monolinguals on five well-established cognitive tasks. The results showed that Chinese speakers performed better on visual attention tasks (orienting in the Attention Network Task, and facilitation in the Number Stroop task) and mental rotation (Corsi Tapping task), while English speakers performed better on an auditory attention task (attentional switching in the Test of Everyday Attention). These results indicate that language scripts may affect individuals’ cognitive performance.

Discourse Representation Structure Parsing

Jiangming Liu; s1674022@sms.ed.ac.uk

We introduce an open-domain neural semantic parser which generates formal meaning representations in the style of Discourse Representation Theory (DRT; Kamp and Reyle, 1993). We propose a method which transforms Discourse Representation Structures (DRSs) to trees and develop a structure-aware model which decomposes the decoding process into three stages: basic DRS structure prediction, condition prediction (i.e., predicates and relations), and referent prediction (i.e., variables). Experimental results on the Groningen Meaning Bank (GMB) show that our model outperforms competitive baselines by a wide margin.

The Production Mechanism across Interpreting Types – Combining Computational Quantification and Behavioural Study Results

Qianxi LV; q.lv-1@sms.ed.ac.uk

Among the various modes of interpreting, simultaneous interpreting (SI) has been described by different authors as a ‘complex’ cognitive task performed under ‘extreme conditions’, whereas in consecutive interpreting (CI) interpreters do not have to share processing capacity between concurrent tasks. Given that SI exerts great cognitive demand, it is reasonable to posit that the linguistic features of SI output may be more compromised than those of CI output. In keeping with our interest in the quantitative linguistic factors that discriminate between SI and CI, we examine lexical and syntactic complexity, the mechanisms of sequential organization, and the amount of information retained, using an intermodal corpus of transcribed simultaneous and consecutive interpretations. While previous research on interpreting generally regards SI as an extreme form of multitasking with the highest cognitive load, our findings show that CI may impose a heavier, or differently distributed, cognitive load and hence yield more lexically and syntactically simplified output. In addition, the sequential features show that the different characteristics and demands of each task mediate the sequential organization of interpreting output so as to minimize cognitive load. We therefore propose a revised effort model based on these results, to better illustrate cognitive demand in both interpreting types. A series of experiments simulating both tasks with a priming paradigm is conducted to test these possible production preferences in SI and CI, and possible strategies for reducing production effort in both tasks are identified.
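The abstract does not list the specific complexity measures used. As a hedged illustration of what comparing lexical and syntactic complexity across SI and CI transcripts can involve, the sketch below computes two common, deliberately simple proxies: type-token ratio (lexical diversity) and mean sentence length in tokens (a crude stand-in for syntactic complexity). Both measures and the function name `lexical_profile` are assumptions for illustration, not the study's own metrics.

```python
import re


def lexical_profile(transcript: str) -> dict:
    """Two simple complexity proxies for an interpreting transcript (illustrative only)."""
    # Split into sentences on terminal punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", transcript) if s.strip()]
    # Lower-cased word tokens made of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    # Type-token ratio: distinct words divided by total words.
    ttr = len(set(tokens)) / len(tokens) if tokens else 0.0
    # Mean sentence length in tokens.
    msl = len(tokens) / len(sentences) if sentences else 0.0
    return {"type_token_ratio": ttr, "mean_sentence_length": msl}


# Usage: compare an SI and a CI rendering of the same source passage.
si = lexical_profile("The minister said the plan will start next year.")
ci = lexical_profile("The minister said the plan starts next year. It is new.")
print(si, ci)
```

Type-token ratio falls as text length grows, so comparisons of this kind are only meaningful on samples of similar size; lower values on both measures would be consistent with more simplified output.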
