The 30th Language Lunch

Date: 2011-12-01

Location: G.07 Informatics Forum

Annotating Attribution Relations: Towards source sensitivity within text

Silvia,Pareti; informatics; s.pareti@sms.ed.ac.uk

“Efforts to extract attribution relations have multiplied in recent years, due to their relevance in particular for Opinion Analysis and Information Extraction applications. Being able to correctly identify the source (either a specific entity, e.g. President Obama, or a class thereof, e.g. experts, official sources, rumours) of a piece of information or an opinion would be extremely beneficial. This would in fact enhance opinion-oriented applications of Language Technology and revolutionise the way we can select information, e.g. on the basis of source expertise and reliability. However, current approaches to the automatic extraction of attribution relations remain limited in scope and precision and are therefore not adequate to support the development of reliable applications. Moreover, there has been little or no attempt to identify relevant features of attribution (e.g. different types of sources, authorial stance) that affect the perception and interpretation of the attributed material. This study addresses several of the attribution strategies identified in Italian and English news corpora, employed to build a broad-coverage annotated resource, in order to develop a more comprehensive system for the extraction of attributions and their relevant features from news texts.”

The role of Foreigner-Directed Speech in the Cultural Transmission of Language & the resulting effects on Language Typology

Hannah,Little; PPLS; None

The current study presents a novel experiment which aims to bridge the gap between theoretical approaches and observed trends in language typology and evolution. Lupyan & Dale (2010) found that the bigger the population using a language, the more that language will encode functional items using lexical strategies. These correlations are hypothesised to be the result of larger language populations having more adult second language learners with different learning biases from first language learners, which may include preferring lexical over morphological strategies (Lupyan & Dale, 2010). Experimental work on the differences between adult and child learning, however, has shown contradictory results (Hudson Kam & Newport, 2005, 2009). The current study seeks to demonstrate that foreigner-directed speech should be considered when explaining the typological correlations discussed above.

The experiment investigated whether interacting with a perceived foreigner would influence an interlocutor to adopt lexical over morphological strategies. Participants were trained on an artificial language. The language offered two ways of describing the scenes used in the experiment, using either a lexical or a morphological strategy. Participants were in one of two conditions, either the esoteric or exoteric condition, where they perceived their interlocutor as either an insider or outsider respectively. The frequency of lexical or morphological strategies used in a communication task was recorded. The results show that lexical strategies are adopted more by participants in the exoteric condition, but only if the first speaker in an interaction initially uses a lexical strategy. It is concluded that foreigner-directed speech should be considered as a factor in the cultural evolution of language when seeking to explain trends in language typology.

References

Hudson Kam, C., & Newport, E. L. (2005). Regularizing unpredictable variation: The roles of adult and child learners in language formation and change. Language Learning and Development, 1, 151–195.

Hudson Kam, C., & Newport, E. L. (2009). Getting it right by getting it wrong: When learners change languages. Cognitive Psychology.

Lupyan, G., & Dale, R. (2010). Language structure is partly determined by social structure. PLoS ONE, 5(1), e8559.

Improving Pronoun Translation for Statistical Machine Translation (SMT)

Liane,Guillou; informatics; None

Machine Translation is a well-established field, yet the majority of current systems translate sentences in isolation, losing valuable contextual information from previously translated sentences in the discourse. One important type of contextual information concerns who or what it is that a coreferring pronoun corefers to (i.e., its antecedent). Languages differ significantly in how they achieve coreference, and awareness of antecedents is important in making the right choice. Disregarding a pronoun’s antecedent in translation can lead to inappropriate coreferring forms in the target text, degrading a reader’s ability to understand it. This work focusses on the translation of coreferring pronouns in English-Czech Statistical Machine Translation (SMT). I present an assessment of the effectiveness of source-language annotation for this purpose and highlight limitations with respect to currently available evaluation methods and resources.

A Treebank of Visual and Linguistic Data

Desmond,Elliott; informatics; d.elliott@sms.ed.ac.uk

Frank,Keller; informatics; keller@inf.ed.ac.uk

The treebank is a new resource for researchers working on the intersection between vision and language. It is intended to be a freely-available corpus of images and corresponding text for the development and evaluation of natural language generation, image annotation, and structure induction. It differs from existing datasets because it contains syntactic representations of the data, which makes it applicable to a wider range of tasks. The images are provided in their surface form, as a set of gold-standard object annotations, and as gold-standard visual dependency graphs derived from the annotations. The annotations are made with respect to the corresponding text, which means they cover a wide range of object classes and are directly related to the image description. The visual dependency graphs are generated using a geometric dependency grammar, which defines how relations between pairs of objects can be generated. The text is provided in its surface form and as a syntactic dependency tree, which is produced by a state-of-the-art parser. The treebank currently contains several hundred completely annotated pairs of data.

Integrating cross-domain information in predictions

Ian,Finlayson; Queen Margaret University; None

It is now widely accepted that language comprehension involves prediction. Upon hearing “eat” in the sentence “the boy will eat the cake”, listeners are more likely to look toward an edible object than upon hearing a verb that does not impose this semantic restriction upon its theme, such as “move” (Altmann & Kamide, 1999). Using the visual world paradigm, we investigated the ability of listeners to predict phonological features of themes and to subsequently combine these with the predictions made from the semantic restrictions of verbs. Participants were faster to initiate saccades towards the target when sentences contained a restrictive verb, and independently they looked toward the target more quickly when the sentence contained a phonologically restrictive determiner (“a”) than when it did not (“his”). Our findings demonstrate that listeners’ predictions can be driven by information integrated from multiple linguistic domains.
