The 64th Language Lunch

Date: 2018-10-25

Location: G.07 Informatics Forum

A Check of the Linguistic Validity of Parts of Speech Features Used in Native Language Identification Tasks

Shilin Gao; s.gao-14@sms.ed.ac.uk

My research topic is Native Language Identification (NLI), in which researchers attempt, from different perspectives, to determine an author’s first language from an anonymous English-as-a-second-language (ESL) text. In previous research, computational methods have achieved considerable accuracy in NLI tasks. However, many of the models are derived from descriptive statistical results and do not reveal the linguistic mechanisms behind them. My research aims to fill this gap by checking the linguistic validity of the data-based models. I chose one of the most commonly used feature types, part-of-speech (POS) features, and built logistic regression models based on 65 POS features using the International Corpus Network of Asian Learners of English. The models achieved over 92% accuracy in identifying whether the author is from a given language background. Comparing the POS features in the models against the World Atlas of Language Structures (WALS), I divided the model results into three kinds. First, results that are in line with the linguistic features of the author’s mother tongue, including Mandarin Chinese speakers’ use of nominalisation and Japanese speakers’ use of pronouns. Second, results that contrast with the author’s native language structure, including Mandarin Chinese and Korean speakers’ use of possessive constructions. Third, results for which I have not found a relevant linguistic feature in the author’s mother tongue, including the use of the word because and that-relative clauses in subject position. These findings provide insights into the cross-linguistic transfer of language structures in NLI tasks.

Investigating Bradford speakers’ ability to imitate a SSBE accent: Implications for forensic phonetics

Lucy A. Jackson; l.a.jackson-1@sms.ed.ac.uk

Erica Gold

This case study considers Bradford speakers’ ability to imitate a Standard Southern British English (SSBE) accent across read and spontaneous speech. Voice disguise is estimated to occur in around one in forty cases in the UK (Clark & Foulkes, 2007). However, there is a growing trend in the use of voice disguise by perpetrators to conceal and falsify their identities in such cases as threatening calls, kidnapping, extortion and even emergency services calls (Zhang & Tan, 2008). Assessing the identity of a speaker is more difficult in forensic speaker comparison and voice profiling cases that involve voice disguise. Voice disguise is often achieved through electronic means (Rodman, 2000; 2003) or accent imitation (Neuhauser, 2008). For this reason, this paper examines speakers’ ability to imitate an accent that requires them to split two phonemic mergers present in their native Bradford accent (STRUT/FOOT and TRAP/BATH). The results suggest that Bradford speakers could not successfully produce STRUT, FOOT, BATH and TRAP realisations similar to those of SSBE speakers, as they were not able to confidently split the STRUT/FOOT or TRAP/BATH mergers. The results also showed that the feature most susceptible to differences in speech style when imitating an accent was vowel height, as participants could not consistently maintain their altered vowel heights for the four lexical sets. As a result, sociophonetic splits and mergers have the potential to be a reliable tool in analysing imitated accents and to aid in forensic speaker comparison or profiling cases.

The influence of world knowledge on projectivity: listeners revise their gender stereotypes when processing presuppositions triggered by stop

Alexandra Lorson; A.Lorson@sms.ed.ac.uk

The change of state verb stop is known to trigger a pre-state presupposition of the form ‘X used to V’. However, confronted with a question such as Did David stop dancing ballet?, listeners do not necessarily understand that David used to dance ballet. Several factors seem to converge in influencing listeners’ understanding of presuppositions in these so-called embedding environments, of which questions are an example. One of these factors was found to be at-issueness, i.e. the extent to which content is judged to be under debate.

Another factor that was claimed to influence the processing of presuppositions was world knowledge in the form of prior probabilities of events. Following this argumentation, listeners who find it unlikely that David dances ballet when processing Did David stop dancing ballet? are claimed to be less likely to understand that David used to dance ballet and stopped doing so. This study investigated the influence of listeners’ prior beliefs on their processing of presuppositions, and more specifically, in what way the understanding of presuppositions triggered by stop is influenced by gender stereotypes. This reasoning is based on recent studies that found that people indeed have strong beliefs when it comes to gender stereotypes (such that they consider a ballet dancer to be a woman rather than a man), which in turn had an effect on their language processing.

Whereas the influence of at-issueness could be replicated, no evidence was found that listeners’ gender stereotypes affect their understanding of presuppositions. Instead, the results are in line with the hypothesis that listeners revise their prior beliefs when they are confronted with utterances that contradict their beliefs.

Aphasia in West Greenlandic affects syntax but not morphology

Johanne Nedergaard; j.s.k.nedergaard@sms.ed.ac.uk

This project investigated how the grammar of the polysynthetic language West Greenlandic is affected in speakers with non-fluent aphasia due to stroke. We focused on West Greenlandic because its structure differs radically from that of more commonly researched languages such as English, including uniquely complex derivational and inflectional morphology. We interviewed five participants with aphasia and compared their speech on several parameters with that of matched non-brain-damaged control participants. These parameters included standard production measures, measures of morphological complexity, and measures of syntactic complexity. Our findings indicated that non-fluent aphasia in West Greenlandic is not expressed through an impairment of morphology; instead, participants with aphasia produced speech at a slower rate with shorter utterances, and some of them displayed significant impairments on measures of syntax. While somewhat surprising from the point of view of research on aphasia in English, our findings align well with findings from other languages with complex morphology such as Finnish, Turkish, and Japanese. Our findings highlight the need for a diverse range of crosslinguistic studies to inform linguistic theories.

Learning Dynamics Of Language Models

Naomi Saphra; s1477768@sms.ed.ac.uk

Recent work has demonstrated that neural language models encode linguistic structure implicitly in a number of ways. However, existing research has not shed light on the process by which this structure is acquired during training. We present several experiments suggesting that a single recurrent layer of a language model learns linguistic structure in phases. We find that a language model naturally learns low-level local linguistic structure first, graduating from representations focused on syntactic structure to representations which contain semantic and topic information. Further experiments shed light on this process by demonstrating that the gradients of function words are sparser than the gradients of content words, implying that specialized subnetworks are being learned in the training process.
