The 73rd Language lunch, virtual poster session!

To Skim or Not to Skim: An Observational Study on Electronic Academic Reading Strategies

Pauliina T.E. Vuorinen – University of Edinburgh, Institute for Language, Cognition and Computation – p.t.e.vuorinen@sms.ed.ac.uk
Benjamin W. Tatler – University of Aberdeen

Reading strategies are crucial for academic achievement, but students rarely use them when reading electronically, even though this is the most common format for academic reading. To increase strategy usage, more information is needed about how students engage in reading. We developed an objective method for measuring the electronic reading process and observed students reading course materials. Supporting our hypotheses, students who frequently alternated their reading speed and read the document more flexibly had higher grades. Strategy usage was predicted by year of study and lower print habit, indicating that experience with academic documents and electronic environments can encourage strategy usage.

Are more complex languages easier to learn?

Camilla Lalli – University of Edinburgh Centre for Language Evolution – c.lalli@sms.ed.ac.uk
Despite being more complex than other types of languages (Lupyan & Dale, 2010; Trudgill, 2011), languages with noun classification and agreement systems are widespread across the world: they are found in genetically and geographically unrelated languages such as Cantonese, German and Sesotho (a Bantu language). Such systems are particularly interesting because they seem to provide a communicative advantage in language processing: studies on Cantonese and German (Tsang, Chambers & Mozuraitis, 2017; Dye et al., 2016) suggest that agreement systems facilitate word recognition. On the other hand, these systems appear to be more costly in terms of language acquisition, since they require learning several morphological encodings for both the noun class and the agreement.
However, recent research has shown that this morphological redundancy might facilitate noun class acquisition. For example, Demuth and Weschler (2012) investigated noun class acquisition in Sesotho, which has a large number of noun classes with corresponding agreement encodings but allows null prefixes in certain grammatical conditions. They found that children make mistakes in noun classification and agreement only with nouns that can take the null prefix. This suggests that agreement does indeed strengthen the robustness of children’s lexical representations.

However, this study does not provide a direct test of the role of morphological agreement in noun class learning. My dissertation sets out to investigate how agreement affects noun class acquisition through an artificial language learning paradigm. There are three experimental conditions: reduplicative agreement, arbitrary agreement and no agreement. The main prediction is that the no agreement condition will yield worse learning results compared to the other two, as it lacks multiple cues to noun class. A second prediction is that participants in the reduplicative agreement condition will perform better than those in the arbitrary agreement one, who have the benefit of multiple cues for the noun class, but the additional challenge of learning two suffixes.

Plural Fricative Lenisisation

Siqing Li – University of Edinburgh – s1640617@sms.ed.ac.uk
There are exceptions to the regular English pluralization rule, one of which involves the changing of word-final fricatives from fortis to lenis during pluralization:

wife ~ wives      house ~ houses       truth ~ truths
[waɪf][waɪvz]        [haʊs][haʊzɪz]        [truːθ][truːðz]

This lenisisation has been documented only for the word-final fricatives /f/, /s/ and /θ/. I call this phenomenon Plural Fricative Lenisisation (PFL). No previous studies have examined exactly which words can undergo PFL, whether the words claimed to undergo PFL still do so, or whether there is dialectal variation in these respects. Part of my PhD thesis aims to answer some of these questions. For each fricative, I collected four lists of words: words which have been claimed to undergo PFL (PFL words), words phonologically and orthographically similar to the PFL words, words less similar to them, and nonce words created based on the PFL words. All the words were tested through an online questionnaire in which participants were asked how they would pronounce the plural form of these words and what they thought of recordings of different pronunciations, and then chose from the following options: 0 - don’t know; 1 - definitely fortis; 2 - probably fortis; 4 - probably lenis; 5 - definitely lenis. Fifty usable sets of data were collected, all of them provided by native English speakers.
Almost all the /θ/-final words have their choices grouped mainly from 2 to 3, with a few 1s and 4s. This does not show the distinct characteristics we would expect from strong PFL influence (grouping from 3 to 4), but it does not rule out the possibility of lenisisation either (given the significant number of 3s). /s/-final words show a similar pattern, with the word ‘house’ as an exception: its choices group mainly from 2 to 4, showing a distinct trait of lenisisation. Among the /f/-final words, over half of those in the PFL list present a clear distribution pattern of lenisisation: choices spread mainly from 2 to 4, some even from 3 to 4. Judging by the results, PFL no longer has a dominant influence on previously documented PFL /θ/-final and /s/-final words, except for ‘house’. Even though PFL might not be as productive as it once was, it still has a significant influence on /f/-final PFL words, half of which show the distinct characteristics of lenisisation.

Fatma Elsafory – University of the West of Scotland
Cyberbullying on social media is a threat, especially as gray social media platforms, those with a loose moderation policy on cyberbullying, have been attracting more people. Recently, data collected from these types of platforms have been used to pre-train word embeddings, yet these word embeddings have not been investigated for the task of cyberbullying detection. In this paper, we carried out a series of experiments as part of a comparative study on both slang-based and classic non-slang-based word embeddings to see how they represent potentially offensive terms, and how useful they are for detecting different types of cyberbullying. Our results show that slang-based word embeddings are better than the classic word embeddings at capturing more offensive words that are related to sensitive topics like race and gender, and they also significantly outperform classic embeddings at the task of cyberbullying detection. We further show that the difference between datasets that contain various types of cyberbullying exists in the form of subtle language rather than the mere presence of offensive words that are associated with these types of cyberbullying. Our results show that some word embeddings are better than others at classifying offensive words into categories defined by Hurtlex, but we do not find strong evidence that the same word embeddings will necessarily work best when identifying cyberbullying text that contains words from the same Hurtlex category.

Context-sensitive evaluation of automatic speech recognition: considering user experience & language variation

Nina Markl – University of Edinburgh, UKRI CDT for NLP, Institute for Language, Cognition and Computation & Linguistics and English – nina.markl@ed.ac.uk
Catherine Lai – University of Edinburgh, Linguistics and English & Centre for Speech Technology Research
Abstract:
Commercial Automatic Speech Recognition (ASR) systems tend to show systemic predictive bias for marginalised speaker/user groups. We highlight the need for an interdisciplinary and context-sensitive approach to documenting this bias incorporating perspectives and methods from sociolinguistics, speech & language technology and human-computer interaction in the context of a case study. We argue evaluation of ASR systems should be disaggregated by speaker group, include qualitative error analysis, and consider user experience in a broader sociolinguistic and social context.
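To make the idea of disaggregated evaluation concrete, the following is a minimal sketch of computing word error rate (WER) separately for each speaker group rather than as a single pooled figure. It is an illustration only, not the authors' evaluation code: the function names, the group labels and the toy transcripts are all hypothetical, and WER is assumed here as the metric because it is the standard quantitative measure for ASR.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = cur
    return prev[-1]


def wer_by_group(samples):
    """Return {group: WER} for (group, reference, hypothesis) triples.

    Group labels are whatever the evaluator chooses (hypothetical here).
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in samples:
        ref_words, hyp_words = ref.split(), hyp.split()
        errors[group] += edit_distance(ref_words, hyp_words)
        words[group] += len(ref_words)
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Toy, made-up data: two hypothetical speaker groups.
samples = [
    ("group_a", "the cat sat", "the cat sat"),
    ("group_b", "the cat sat", "the bat sat down"),
]
result = wer_by_group(samples)
```

A per-group table like this only covers the quantitative side; the qualitative error analysis and user-experience considerations argued for above would sit alongside it.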

Modelling Suspense in Short Stories

David Wilmot – University of Edinburgh, Institute for Language, Cognition and Computation – david.wilmot@ed.ac.uk
Frank Keller – University of Edinburgh – keller@ed.ac.uk
Abstract:
Suspense is a crucial ingredient of narrative fiction, engaging readers and making stories compelling. While there is a vast theoretical literature on suspense, it is not well understood computationally. We compare two ways of modelling suspense: surprise, a backward-looking measure of how unexpected the current state is given the story so far; and uncertainty reduction, a forward-looking measure of how unexpected the continuation of the story is. Both can be computed either directly over story representations or over their probability distributions. We propose a hierarchical language model that encodes stories and computes surprise and uncertainty reduction. Evaluating against short stories annotated with human suspense judgements, we find that uncertainty reduction over representations is the best predictor, resulting in near-human accuracy. We also show that uncertainty reduction can be used to predict suspenseful events in movie synopses.
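As a rough illustration of the two measures computed over representations, the sketch below treats each story state as a plain vector. It rests on assumptions not stated in the abstract: surprise is taken as the squared distance between the current and previous state, and uncertainty reduction as the probability-weighted squared distance from the current state to a set of candidate continuations. The function names and toy vectors are hypothetical, and the authors' model may define these quantities differently.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def surprise(prev_state, current_state):
    """Backward-looking: how far the story state moved in the last step.

    Assumed form: squared distance between consecutive state vectors.
    """
    return squared_distance(prev_state, current_state)


def uncertainty_reduction(current_state, continuations):
    """Forward-looking: expected squared distance to possible next states.

    continuations: list of (probability, state) pairs, with the
    probabilities assumed to sum to 1.
    """
    return sum(p * squared_distance(current_state, s) for p, s in continuations)


# Toy two-dimensional states (made up for illustration).
previous = [0.0, 0.0]
current = [1.0, 0.0]
candidates = [(0.5, [1.0, 0.0]), (0.5, [1.0, 2.0])]

s = surprise(previous, current)                  # 1.0
u = uncertainty_reduction(current, candidates)   # 2.0
```

In this toy case the state has already moved by one unit (surprise 1.0), while the possible continuations diverge further from the current state on average (uncertainty reduction 2.0).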

Effects of second language proficiency and immersion on L2 English ambiguous pronoun processing

Carine Abraham – University of Edinburgh, School of Psychology, Philosophy, and Language Sciences – c.abraham@ed.ac.uk
Abstract:
When reading and writing, interlocutors often encounter and rapidly process ambiguous referents using various comprehension strategies, such as mention order or event structure. One of these strategies is to use verbal aspect (VA), specifically perfective (PA) and imperfective (IA) aspect verbs (gave and giving respectively), to comprehend to whom interlocutors are referring. A study by Rohde, Kehler, and Elman (2006) found participants were biased to co-refer ambiguous pronouns to source referents in IA utterances, while no referent bias was found for PA utterances. Although this processing strategy was found cross-linguistically, Grüter et al. (2017) could not find evidence of L2 speakers using VA as a comprehension strategy, and therefore proposed the Reduced Ability to Generate Expectations (RAGE) hypothesis to characterise L2 speakers. However, their participants were not highly proficient speakers of English (A1-B2[1]), which might indicate that they generate less native-like expectations and show more variability in L2 pronoun use (Sorace & Filiaci, 2006) and aspect processing (Bardovi-Harlig, 2000; Gabriele, 2009). To find out whether L2 expectation generation is affected by proficiency, Abraham (2017) tested more proficient L2 English speakers (average C2 English proficiency) and found a significant effect of VA in the L2 groups and a main effect of language proficiency within these groups. To further explore the effect of proficiency, and to account for the small sample size in Abraham (2017), the current study recruited monolingual English and L1 Spanish/L2 English speakers, who were tested with a Story Continuation task. For this task, participants were prompted with experimental sentences similar to those of Grüter et al. (2017), each including two characters of matching gender and a transfer-of-possession verb (e.g. handed) whose aspect was manipulated (PA or IA), and were asked to write continuations beginning with either “He”, “She”, or no prompt.
The data for this experiment are currently being analysed. It is hypothesized that if proficiency affects the ability to use VA during comprehension, highly proficient L2 participants (C2 or higher) will produce native-like continuations; otherwise, L2 participants will show no coreference preferences, like Grüter et al.’s (2017) L2 participants.
[1] Ratings based on the Common European Framework of Reference for Languages: Learning, teaching, assessment levels (CEFR; Council of Europe, 2001).

To what extent do existing theories of borrowing, as posited in borrowing hierarchies, apply to French and Arabic in the Maghreb considering its diglossic situation?

Gerard Murphy – University of Edinburgh, Linguistics and English Language – G.M.Murphy@sms.ed.ac.uk
Abstract:
This project explores the contact-induced change between French and the two main varieties of Arabic in the North African Maghreb region, namely the Low (L) variety, known as Maghrebi Arabic, and the High (H) variety, Modern Standard Arabic (MSA). The project then compares the presence of borrowing in these varieties to two borrowing hierarchies, Thomason and Kaufman (1988) and Field (2002).
We first consider Maghrebi Arabic, analysing the lexical and structural borrowings from French and applying them to the hierarchies. We then undertake the same procedure with MSA, mainly considering data from Sayahi (2014) and Mzoughi (2015).
Maghrebi Arabic proves to be consistent with the frequencies of lexical and structural borrowing expected under the hierarchies, as well as with the motivations they specify; however, MSA borrows a range of structural features from French, with very little lexical borrowing, in order to fill gaps in the language, contradicting the borrowing frequencies and motivations proposed in the hierarchies.
This ultimately highlights that these borrowing hierarchies apply well to the L variety in this diglossic speech community, but are not very applicable to the H variety, largely due to the lack of consideration for gaps as a motivation for borrowing, as well as the top-down institutional support for the H variety.
