The 48th Language Lunch

Date: 2015-06-11

Location: G.07 Informatics Forum

Are Comprehension-Elicited Lexical Predictions Specified at a Phonological Level within the Speech Production System?

Eleanor Drake; PPLS; e.k.e.drake@sms.ed.ac.uk

The generation of comprehension-induced predictions affects both the timing and articulatory realization of spoken output (e.g., Drake, Schaeffler, & Corley, 2014). The current study investigates whether these effects are predicated on the phonological relationship between a predicted word and a picture-name. We elicited lexical predictions by acoustically presenting sentence-stems. Pictures were named in 4 conditions: match (picture-name fully matched the lexical prediction), onset-overlap (e.g., can-CAP), rime-overlap (e.g., can-TAN), and a control condition (acontextual picture naming). Articulation was captured via ultrasound tongue imaging. Articulatory patterns during the response latency period differed according to whether the picture-name matched the lexical prediction or not, but not according to the phonological relationship between the picture-name and the lexical prediction (i.e., onset-overlap did not differ from rime-overlap). This suggests that the speech-motor consequences of comprehension-elicited predictions may reflect generalized mismatch monitoring processes rather than the activation of fully-specified predictions within the speech production system.

Priming the passive construction from Scottish Gaelic to English

Catriona Gibb

It is often the case that two constructions in Language A appear to correspond to a single construction in Language B. How do bilingual speakers of these languages represent such possibilities? Evidence from structural priming suggests that bilinguals can share syntactic representations across languages (Hartsuiker et al., 2004). To investigate this, we used a syntactic priming paradigm in which participants listened to a description of a picture in Gaelic (the prime) and stated whether it matched the picture on screen; they then described a new picture in English. Prime sentences were manipulated to be an active, a baseline (noun phrase), or one of two types of Gaelic passive construction. Our results revealed a significant effect of prime type: participants were more likely to produce a passive description following a passive prime than following a baseline. We found no significant difference in priming effects between the two passive prime types. The results therefore suggest that our participants had shared representations across the English passive and both forms of the Gaelic passive. We interpret these results in terms of the account of Bernolet et al. (2013), who claim that more proficient bilinguals are more likely to incorporate constructions into a single language-independent representation.
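To make the reported analysis concrete, here is a minimal sketch of how such a priming effect might be tested. Everything below is an assumption for illustration: the column names, the toy response counts, and the simple logistic regression (the study itself would more plausibly fit mixed-effects models with participant and item random effects).

```python
# Hypothetical sketch of a priming analysis; not the authors' code.
# Assumes a trial-level table recording, for each English description,
# the Gaelic prime type heard and whether a passive was then produced.
import pandas as pd
import statsmodels.formula.api as smf

# Toy counts standing in for real data (40 trials per prime type):
# passives are produced more often after either Gaelic passive prime.
rows = []
for prime, n_passive in [("active", 5), ("baseline", 7),
                         ("passive1", 17), ("passive2", 18)]:
    rows += [(prime, 1)] * n_passive + [(prime, 0)] * (40 - n_passive)
df = pd.DataFrame(rows, columns=["prime", "passive_produced"])

# Logistic regression of passive production on prime type, with the
# baseline (noun-phrase) prime as the reference level.
fit = smf.logit(
    "passive_produced ~ C(prime, Treatment(reference='baseline'))",
    data=df,
).fit()
print(fit.summary())
```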

Deep neural network context embeddings for model selection in rich-context HMM synthesis

Thomas Merritt

This paper introduces a novel form of parametric synthesis that uses context embeddings produced by the bottleneck layer of a deep neural network to guide the selection of models in a rich-context HMM-based synthesiser. Rich-context synthesis – in which Gaussian distributions estimated from single linguistic contexts seen in the training data are used for synthesis, rather than more conventional decision tree-tied models – was originally proposed to address over-smoothing due to averaging across contexts. Our previous investigations have confirmed experimentally that averaging across different contexts is indeed one of the largest factors contributing to the limited quality of statistical parametric speech synthesis. However, a possible weakness of the rich-context approach as previously formulated is that a conventional tied model is still used to guide selection of Gaussians at synthesis time. Our proposed approach replaces this with context embeddings derived from a neural network.
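As a rough illustration of the proposed selection step, the sketch below performs a nearest-neighbour lookup of rich-context models in a bottleneck-embedding space. The function names, array shapes, and plain Euclidean metric are all assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of embedding-guided model selection (assumed
# names and shapes; not the paper's implementation).
import numpy as np

def select_rich_context_models(target_emb, train_emb):
    """For each synthesis-time context embedding (taken from the DNN's
    bottleneck layer), return the index of the closest training-data
    context, whose single-context Gaussian would then be used."""
    # Squared Euclidean distance between every target and training context.
    dists = (
        (target_emb ** 2).sum(axis=1, keepdims=True)
        - 2.0 * target_emb @ train_emb.T
        + (train_emb ** 2).sum(axis=1)
    )
    return dists.argmin(axis=1)

# Toy usage: 10 synthesis-time contexts against 5000 seen training
# contexts, with 64-dimensional bottleneck embeddings.
rng = np.random.default_rng(0)
train_emb = rng.standard_normal((5000, 64))
target_emb = rng.standard_normal((10, 64))
print(select_rich_context_models(target_emb, train_emb))
```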

Movie Script Summarization as Graph-based Scene Extraction

Philip Gorinski

We study the task of movie script summarization, which we argue could enhance script browsing, give readers a rough idea of the script’s plotline, and speed up reading time. We formalize the process of generating a shorter version of a screenplay as the task of finding an optimal chain of scenes. We develop a graph-based model that selects a chain by jointly optimizing its logical progression, diversity, and importance. Human evaluation based on a question-answering task shows that our model produces summaries which are more informative than those of competitive baselines.
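The toy sketch below illustrates the kind of objective described: choosing a chain of scenes by trading off per-scene importance, logical progression between consecutive scenes, and diversity across the whole selection. The scoring functions, weights, and greedy search are illustrative assumptions; the paper's actual model and optimization will differ.

```python
# Toy sketch of scene-chain selection (assumed scoring and search;
# not the paper's model).
import numpy as np

def score_chain(chain, importance, sim):
    """Combine importance of selected scenes, similarity between
    consecutive scenes (progression), and a redundancy penalty over
    all selected pairs (diversity)."""
    imp = sum(importance[s] for s in chain)
    prog = sum(sim[a, b] for a, b in zip(chain, chain[1:]))
    red = sum(sim[a, b] for i, a in enumerate(chain) for b in chain[i + 1:])
    return imp + prog - 0.5 * red

def greedy_chain(importance, sim, k):
    """Greedily grow a chain of k scenes, keeping script order."""
    chain = []
    for _ in range(k):
        rest = [i for i in range(len(importance)) if i not in chain]
        best = max(rest, key=lambda i: score_chain(sorted(chain + [i]),
                                                   importance, sim))
        chain = sorted(chain + [best])
    return chain

rng = np.random.default_rng(1)
n = 30                                   # scenes in the screenplay
importance = rng.random(n)               # e.g. from character interactions
sim = rng.random((n, n))
sim = (sim + sim.T) / 2                  # symmetric scene similarity
print(greedy_chain(importance, sim, k=8))
```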

A Comparison of Neural Network Methods for Unsupervised Representation Learning on the Zero Resource Speech Challenge

Daniel Renshaw

The success of supervised deep neural networks (DNNs) in speech recognition cannot be transferred to zero-resource languages where the requisite transcriptions are unavailable. We investigate unsupervised neural-network-based methods for learning frame-level representations. Good frame representations eliminate differences in accent, gender, channel characteristics, and other factors to model subword units for within- and across-speaker phonetic discrimination. We enhance the correspondence autoencoder (cAE) and show that it can transform Mel Frequency Cepstral Coefficients (MFCCs) into more effective frame representations given a set of matched word pairs from an unsupervised term discovery (UTD) system. The cAE combines the feature extraction power of autoencoders with the weak supervision signal from UTD pairs to better approximate the extrinsic task’s objective during training. We use the Zero Resource Speech Challenge’s minimal triphone pair ABX discrimination task to evaluate our methods. Optimizing a cAE architecture on English and applying it to a zero-resource language, Xitsonga, we obtain a relative error rate reduction of 35% compared to the original MFCCs. We also show that Xitsonga frame representations extracted from the bottleneck layer of a supervised DNN trained on English can be further enhanced by the cAE, yielding a relative error rate reduction of 39%.
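To illustrate the training signal that distinguishes a cAE from a plain autoencoder, here is a minimal numpy sketch of a one-hidden-layer network trained to reconstruct the aligned frame from the other word of a discovered pair, so that the hidden layer tends to keep phonetic content while discarding speaker and channel detail. The architecture, toy data, and hyperparameters are all assumptions, not the system evaluated above.

```python
# Minimal correspondence-autoencoder sketch (assumed architecture and
# toy data; not the paper's system). In a real cAE, inputs and targets
# are DTW-aligned frames from the two words of each UTD-discovered pair.
import numpy as np

rng = np.random.default_rng(0)
d, h = 39, 100                       # MFCC dim, hidden/representation dim
W1 = 0.1 * rng.standard_normal((d, h)); b1 = np.zeros(h)
W2 = 0.1 * rng.standard_normal((h, d)); b2 = np.zeros(d)

def forward(x):
    hid = np.tanh(x @ W1 + b1)       # the learned frame representation
    return hid, hid @ W2 + b2

lr = 0.01
for step in range(1000):
    x = rng.standard_normal((32, d))             # frames from word A (toy)
    y = x + 0.1 * rng.standard_normal((32, d))   # "aligned" frames, word B
    hid, out = forward(x)
    err = out - y                                # gradient of MSE w.r.t. out
    gW2 = hid.T @ err / len(x); gb2 = err.mean(axis=0)
    dhid = (err @ W2.T) * (1 - hid ** 2)         # backprop through tanh
    gW1 = x.T @ dhid / len(x); gb1 = dhid.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

rep, _ = forward(rng.standard_normal((5, d)))    # extract representations
print(rep.shape)                                 # (5, 100)
```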
