
We’re thrilled to announce the launch of Hidden Heritages, a brand-new website for accessing Scottish and Irish traditional tales. This resource is the outcome of an international, multi-year collaboration between universities in the UK, Ireland and USA, generously supported by the Arts and Humanities Research Council (AHRC) and the Irish Research Council (IRC).

Between 2021 and 2024, our team digitised and recognised thousands of tales from the School of Scottish Studies Archives (SSSA) and the Irish National Folklore Collection (NFC). The result? An online repository featuring 5,515 folktales, with more than 3,400 tales from Scotland and over 2,000 from Ireland.

An image showing the interactive map and list of folktales available on the Hidden Heritages website.

Search page for the Hidden Heritages website.

Behind the Scenes

Many of these tales date back centuries and were collected orally from tradition bearers up to the end of the 20th century. Ethnologists at the SSSA and NFC transcribed thousands of them from fieldwork recordings as handwritten manuscripts. In other cases, the tales came from printed books and articles. Using AI-powered text recognition (OCR for printed material and HTR for handwriting), the research team converted these documents to digital text and have now made them publicly accessible online, often for the very first time. Transkribus, the handwriting recognition tool, was instrumental in this work.

While the automatic transcriptions aren’t flawless, the accuracy is impressive, with error rates below 5%. And because the original transcriptions often captured rich dialectal forms of Irish and Scottish Gaelic, these texts are a goldmine for linguistic and cultural analysis.
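For readers curious how an error rate like this can be measured, here is a minimal sketch in Python. It assumes the figure is a character error rate (CER), the metric Transkribus reports for its models; the post itself does not name the metric. CER is the edit distance between a hand-corrected reference and the automatic output, divided by the length of the reference. The function names and the sample strings are invented for illustration.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute, or free if the characters match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length, after NFC normalisation."""
    # Normalise so that accented letters (e.g. è) compare equal however they are encoded.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Invented example: one wrong character in a 20-character title.
reference = "Biast nan Naoi Ceann"
hypothesis = "Biast nan Naoi Ceaun"
cer = character_error_rate(reference, hypothesis)
print(f"CER: {cer:.1%}")  # prints CER: 5.0%
```

On this measure, an error rate below 5% means fewer than one character in twenty needs correcting.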

Image shows metadata, a PDF and AI-recognised text for the Scottish Gaelic tale 'Biast na[n] Naoi Ceann' ['Beast of the Nine Heads']

Viewing the texts and metadata for the Gaelic tale ‘Biast na[n] Naoi Ceann’ (‘Beast of the Nine Heads’)

Dall ort! Jump Right In!

Using the website couldn’t be simpler. Explore by searching for your favourite folktale themes. Perhaps you’re interested in tales of witches (buidsichean), giants (fuamhairean) or fairies (Na Daoine Beaga ‘The Wee Folk’). You can filter stories by date, tale-type, language, gender of collectors and narrators, and much more. Interactive visual maps will guide you straight to tales from particular places, letting you discover how stories travelled and evolved.

Users who don’t have Gaelic or Irish can copy and paste texts into Google Translate to get a rough English translation. For many of the Scottish tales, we also include direct links to Tobar an Dualchais / Kist o Riches, where users can access the original fieldwork recordings from the School of Scottish Studies Archives. Irish readers can even help to improve some of the transcriptions by following links to Meitheal Dúchas.

Whether you’re a researcher, folklore enthusiast or just curious about traditional tales, we invite you to explore, share and participate in this rich cultural legacy. Scottish material is provided for research purposes, and Irish stories are available under a Creative Commons licence, allowing non-commercial reuse — remember to attribute your sources carefully.

Visit www.hiddenheritages.ai and explore Gaelic storytelling now! And keep an eye out for our upcoming book, Decoding the Oral Traditions of Scotland and Ireland: From Manuscripts to Models, to be published by Edinburgh University Press in 2026.

Jun 9, 2025

Gaelic in the Digital Age: Inside the ÈIST Project

By wlamb

On 4th June 2025

In Greimeagan - Research bites, Naidheachdan - News, Uncategorised

Scottish Gaelic, spoken by roughly 60,000 people today, is poised for a technological transformation thanks to the ÈIST project, led by the University of Edinburgh. ÈIST [eːʃtʲ] (‘ayshch’) is short for Ecosystem for Interactive Speech Technologies, and means ‘listen’ in Gaelic. The project is funded by the Scottish Government and Bòrd na Gàidhlig, with key partners including BBC ALBA, NVIDIA, the University of Glasgow and Tobar an Dualchais / Kist o Riches. It aims to support the revitalisation of the language through cutting-edge interactive technologies, including speech recognition.

Since launching in 2023, ÈIST has focussed on developing accurate speech-to-text for Gaelic, but also for English — to cope with code-switching. The initial aim was to produce a system that could generate Gaelic-medium subtitles for BBC ALBA and Radio nan Gàidheal. In a forthcoming paper, the team reports achieving nearly 90% accuracy for chat shows, news and current affairs programmes. Now the team is expanding the technology, making it suitable for more diverse contexts, ranging from Gaelic-speaking classrooms to old fieldwork recordings.

In the autumn of 2025, the creation of an accessible, robust API (Application Programming Interface) will democratise these tools further. Developers and researchers worldwide will gain access to Gaelic speech recognition, embedding the technology into applications ranging from educational software to digital assistants.

Bridging Linguistic Gaps

Currently, Gaelic speech recognition systems struggle to transcribe younger speakers, whose speech patterns often differ from those of heritage speakers. ÈIST addresses this with its dedicated subproject, Recognising Children’s Speech, which will soon gather data from Gaelic Medium Education schools and units across Scotland. This new data will ensure that the speech of young learners is represented accurately.

The implications of this work are profound. Enhanced speech recognition can transform educational tools, such as TextHelp’s Read&Write, providing more effective support for literacy and learning among children. It can also improve accessibility for Gaelic speakers of all ages, especially those with literacy challenges or hearing impairment, providing critical tools for communication and inclusion.

A Community-Powered Initiative

Central to ÈIST’s mission is Opening the Well, a pioneering crowdsourcing platform currently in development and due to launch in the last quarter of 2025. This initiative will mobilise the Gaelic-speaking community worldwide to transcribe audio recordings of traditional narratives and oral history held on the Tobar an Dualchais portal, such as those originally from the School of Scottish Studies Archives (University of Edinburgh). It takes its cue from Ireland’s successful Meitheal Dúchas crowdsourcing project. These transcriptions won’t just make Scottish heritage more accessible — they will also provide vital training data to enhance the accuracy and versatility of future speech recognition models.

The editing screen in Opening the Well

Risks, Rewards and Responsibility

The team behind ÈIST is keenly aware that speech and language technologies are not neutral tools. Poorly trained models can misrepresent language, distort culture or reinforce social bias. That is why ÈIST places community involvement at the heart of its design process.

The researchers also promote best practices for ethical AI use in revitalisation: curate transparent training data, return outputs to the community (e.g., through DASG, the Digital Archive of Scottish Gaelic), and avoid relying on ‘big tech’ to solve our problems, something it tends to do poorly. It is a values-led approach as much as a technical one.

The Future of Gaelic in the Digital Age

Finally, ÈIST is developing an interactive, text-based interviewer chatbot. This promises not only to engage speakers in naturalistic conversation but also to generate essential data for future applications. When combined with speech technology, such a chatbot could assist with language learning, offering conversational practice previously difficult to achieve outside of a native-speaking community. While this will never replace human teachers, it could be a very useful adjunct to teaching and learning.

In a broader sense, ÈIST represents a powerful case study in how language technology can empower minority languages. The project’s blend of machine learning and cultural preservation illustrates how digital innovation can help to sustain diversity, rather than eroding it. For more information about ÈIST, contact w.lamb@ed.ac.uk

ÈIST Research Team

Prof William Lamb (PI, University of Edinburgh)
Dr Bea Alex (Co-I, University of Edinburgh)
Prof Peter Bell (Co-I, University of Edinburgh)
Ms Rachel Hosker (Co-I, University of Edinburgh)
Prof Roibeard Ó Maolalaigh (Co-I, University of Glasgow)
Dr Alison Diack (Transcriber, DASG, University of Glasgow)
Mr Cailean Gordon (Lead transcriber, Tobar an Dualchais / Kist o Riches)
Dr Ondřej Klejch (Speech processing specialist, Informatics, University of Edinburgh)
Dr Michal Měchura (Web designer and computational linguist)

Links

Prof Lamb’s inaugural lecture on AI and Scottish Gaelic
Full demonstration video: Subtitles for BBC ALBA’s ‘An Là’
A NotebookLM podcast unpicking the results from ÈIST’s new research paper (below)
arXiv version of Klejch et al. 2025. ‘A Practitioner’s Guide to Building ASR Models for Low-Resource Languages: A Case Study on Scottish Gaelic’.

Jun 4, 2025

Decoding Hidden Women: Feminist digitisation practices in the Tale Archive

By wlamb

On 23rd October 2023

In Uncategorised

By Catherine Banks

As the Decoding Hidden Heritages project is nearing the end of its digitisation and metadata collection stage, this is a good opportunity to share some insights from the project on the importance of archival work for the representation of women’s heritage. While the project’s main focus is on the narrative traditions of Scotland and Ireland, valuable information has also been discovered that has wider cultural implications, such as the influence of gender on narrative traditions. These discoveries have been made possible by the digitisation process because it has allowed a re-examination and re-documentation of the archive’s collection. As part of this process at the School of Scottish Studies Archives, I have been able to employ what Prof Melissa Terras terms feminist digitisation practices, which ‘are both an attitude, and an application of technology in an efficient way’.[1] She described this practice as ‘an act of owning women’s history, using digital means, to collate information and histories that the mainstream – for whatever reason – has not tackled’.[2] For this project, that has involved ensuring that women’s material in the archive is accessible to and discoverable by the public through digitisation and accurate metadata collection.

While digitising the Tale Archive I discovered several unique factors that affected women’s presence, or rather their absence, in the archive. In particular, I noticed distinct documentation issues with the archive’s material relating to women. The most significant of these issues was the erasure of women’s names in archival documents and metadata.

There are four distinctive scenarios in which women’s names have been erased:

The documents lack women’s first names.

The most common erasure of women’s names in the archive is the use of only women’s surnames, particularly their married surnames, for example Mrs. Stewart. In SSSA_TA_WT042_001 the informant is only listed as Bean Sheumais (‘Wife of James’). This is most likely because that was how these women would have given their names to the collectors, as was the social practice at the time.

Their husband’s full name is used in lieu of women’s names.

The next most common form of women’s names is their husband’s full name used as their married name, for example Mrs. John MacDonald. In some cases, married women’s first names have been discovered and their full names are included in the metadata. For example, Mrs. Hugh Milne has been recorded as Bella Milne in the project database.

The influence of gender on the documentation of names in these records is made clear in SSSA_TA_GH013_001. The metadata for this transcription records the informant as ‘Andrew Stewart and family’ but the document itself listed her as Mrs. Andy Stewart. Despite the fact that this story is told by Mrs. Stewart about her own experience with a ghost, the metadata recorded her husband as the main informant, erasing Mrs. Stewart’s ownership over her story. When her husband and son interject into her story, the transcript states ‘Carol Stewart, their son, takes over’ and ‘Andrew takes over’ but, rather than use her full name, it says that the ‘story returns home to Mrs. Stewart’. Each of the male members of the Stewart family has his full name recorded while Mrs. Stewart does not. As a result of re-examining this material, the metadata has been corrected and Mrs. Stewart’s story is now properly recognised in the archive.

The names are unrecorded.

In much of the material, women have shared their stories anonymously. This makes it impossible to document who they are. Women are often referred to as ‘girls’ such as ‘Barra Girl’ (SSSA_TA_GH002_002) or ‘a girl who was native of Glenurqhart’ (SSSA_TA_WT043_002) without their names recorded. Yet, even in these cases it is still important to document the informant’s gender in the metadata. For example, one informant was listed as a ‘Native of Lochcarron’ in SSSA_TA_WT037_015. However, by reading their story it can be ascertained this person was a woman, because she states, ‘when they sent me … I was a young girl at the time’. By documenting their gender in the metadata, at least we are able to accurately acknowledge these women’s presence in the archive.

They are not named in the archive’s metadata but are present in documents.

One of the most significant examples of a woman’s erasure from the archive is SSSA_TA_FL025. This document and its metadata record Walter Johnson as the informant of a transcription. However, the transcription is actually of Bella Higgins telling her personal experience of meeting a fairy [Ed. noted here using the dated and offensive term ‘golliwog’]. Even though it is only Bella speaking, her story had been attributed to Walter Johnson. As a consequence of this incorrect documentation, her voice had been hidden in the archive.

Similarly, in a series of transcriptions by John Stewart and his wife Maggie Stewart (SSSA_TA_GH001_022, 23, 25), John was recorded as the only informant. Even though Maggie was present in them as well, her contributions to their stories were unrecognised. As a result of the careful examination of these documents while they were digitised, these women’s contributions were uncovered and are now appropriately documented in the collection’s metadata.
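The corrections described in these four scenarios can be pictured as a small data structure. The Python sketch below is illustrative only: the field names are hypothetical and are not the project’s actual metadata schema. The idea it shows is that the wording found in the source is kept separate from the name established by research, and that gender is recorded even when no name is known. The item references and details are those cited above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InformantRecord:
    # Hypothetical fields, for illustration only.
    item_id: str                         # archive reference, as cited in the post
    informant_as_recorded: str           # wording found in the document or old metadata
    resolved_name: Optional[str] = None  # full name, only where it has been established
    gender: Optional[str] = None         # recorded even when the name is unknown
    evidence: str = ""                   # why the name or gender was assigned


records = [
    InformantRecord("SSSA_TA_WT042_001", "Bean Sheumais", gender="female",
                    evidence="name means 'Wife of James'"),
    InformantRecord("SSSA_TA_WT037_015", "Native of Lochcarron", gender="female",
                    evidence="narrator says 'I was a young girl at the time'"),
    InformantRecord("SSSA_TA_FL025", "Walter Johnson", resolved_name="Bella Higgins",
                    gender="female", evidence="transcription is of Bella Higgins speaking"),
]

# Women are now discoverable by gender, even where no name survives.
women = [r for r in records if r.gender == "female"]
still_unnamed = [r.item_id for r in women if r.resolved_name is None]
print(len(women), still_unnamed)
```

Keeping the original wording alongside the correction means the archive’s own history of documentation is not lost when a record is updated.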

While in some cases these documentation issues may seem small, they have significant consequences. Women’s names being unrecorded or partially recorded in the archives makes tracing women’s histories and family lineages extremely difficult and often impossible. For example, it is impossible to ascertain from the documentation whether a woman recorded as Mrs. MacDonald is the grandmother, mother, wife or sister-in-law of Mr. John MacDonald because all these women would have been referred to identically. Similarly, when women have no name recorded at all, their contributions to the archive are unidentifiable.

The exclusion of these women misrepresents the material within our archives, presenting the collection as more male dominated than it is. Not only is their re-inclusion into the archive’s metadata important as an act of justice for these women, but it also enriches and expands the historical research and data that can be produced from the archive. As historians Andrew Flinn, Mary Stevens, and Elizabeth Shepherd have argued, ‘the archives that are “chosen” for survival, the terms in which they are described, and the processes by which these decisions are made, do ultimately impact on the collective memory and public histories that are produced from them’.[3]

This is particularly important given the growing trend in historical research of writing women who have been hidden from historical accounts back into history. A recent example of this is a biography of George Orwell’s wife, ‘Wifedom: Mrs. Orwell’s Invisible Life’ by Anna Funder. She points out that in Homage to Catalonia, Orwell’s memoir of the time he and his wife spent in Spain, he ‘mentions “my wife” 37 times but never once names her. No character can come to life without a name’.[4] However, Funder was able to reconstruct the life of Eileen when she went ‘back to the biographers’ footnotes and sources and into the archives and found details that had been left out. Eileen began to come to life’.[5] Thus, there is immense value in archival sources which is still being discovered today, and archivists play a vital role in ensuring that women’s history in these archives does not remain hidden. It is therefore important to seize the opportunity that digitisation projects such as this present to employ feminist digitisation practices on archival collections, to uncover women’s hidden histories and ensure they are preserved for the future.

The DHH team would like to thank Catherine for her important and timely blog and her excellent contributions to the project.

Bibliography

Flinn, Andrew, Mary Stevens, and Elizabeth Shepherd. “Whose Memories, Whose Archives? Independent Community Archives, Autonomy and the Mainstream”. Archival Science 9, no. 1-2 (2009)

Funder, Anna. “Looking for Eileen: how George Orwell wrote his wife out of his story”. The Guardian, 30 July 2023. Accessed 4 October 2023. https://www.theguardian.com/books/2023/jul/30/my-hunt-for-eileen-george-orwell-erased-wife-anna-funder.

Terras, Melissa. “Interview With Professor Melissa Terras On Feminist Digitisation Practices And The Future Of Our Digital Cultural Heritage”. The University Of Edinburgh Futures Institute, 6 January 2023. Accessed 4 October 2023. https://efi.ed.ac.uk/interview-with-professor-melissa-terras-on-feminist-digitisation-practices-and-the-future-of-our-digital-cultural-heritage/.

Links to images

https://www.tobarandualchais.co.uk/person/1864?l=en

https://www.tobarandualchais.co.uk/person/639?l=en

Footnotes

[1] Melissa Terras, “Interview With Professor Melissa Terras On Feminist Digitisation Practices And The Future Of Our Digital Cultural Heritage”, The University Of Edinburgh Futures Institute, 6 January 2023, accessed 4 October 2023. https://efi.ed.ac.uk/interview-with-professor-melissa-terras-on-feminist-digitisation-practices-and-the-future-of-our-digital-cultural-heritage/.

[2] Ibid.

[3] Andrew Flinn, Mary Stevens, and Elizabeth Shepherd, “Whose Memories, Whose Archives? Independent Community Archives, Autonomy and the Mainstream”, Archival Science 9, no. 1-2 (2009): 76.

[4] Anna Funder, “Looking for Eileen: how George Orwell wrote his wife out of his story”, The Guardian, 30 July 2023, accessed 4 October 2023. https://www.theguardian.com/books/2023/jul/30/my-hunt-for-eileen-george-orwell-erased-wife-anna-funder.

[5] Ibid.

Oct 23, 2023

The Secret of Heather Ale (Fìon an Fhraoich)

By Cristina Horvath

On 24th August 2022

In Greimeagan - Research bites, Uncategorised

“Heather (Calluna vulgaris)”

The ‘secret’ of making heather ale has been a popular folktale in Scotland, with claims that the brewing of it dates back to ancient times.

I came across a few references to it while digitizing the ATU index cards in the SSSA’s Tale Archive.

Read the full Gaelic version from Calum Maclean’s collection of Fìon an Fhraoich (IFC MS 1028, pp. 103-105).

Accounts differ as to whether it was the Vikings or ‘The Pechs’ that held the secret to making the ale, but the common thread running through them is that eventually there were only two people in the world who held the secret: a father and his son. When they were forced to disclose their secret, the father claimed he would share the recipe, but only if his son was killed first. This request was carried out and the father then exclaimed that he had lied. He had never intended to share the recipe, but believed that his son, being weaker than himself, would have done so, and therefore had him killed to protect the secret forever!

You can read a full version of this story in English in Robert Chambers’ Popular Rhymes of Scotland. The story starts with this charming description of the Pech people:

“LONG ago there were people in this country called the Pechs; short wee men they were, wi’ red hair, and long arms, and feet sae braid, that when it rained they could turn them up owre their heads, and then they served for umbrellas.”

Listen to a 1961 version on the Tobar an Dualchais website by John Jamieson Irvine, recorded by School of Scottish Studies fieldworker, Elizabeth Sinclair.

Another wonderful reference to this story is the 1890 poem “Heather Ale” by Robert Louis Stevenson.

From the bonny bells of heather
They brewed a drink long-syne,
Was sweeter far than honey,
Was stronger far than wine.
They brewed it and they drank it,
And lay in a blessed swound
For days and days together
In their dwellings underground.

There rose a king in Scotland,
A fell man to his foes,
He smote the Picts in battle,
He hunted them like roes.
Over miles of the red mountain
He hunted as they fled,
And strewed the dwarfish bodies
Of the dying and the dead.

Summer came in the country,
Red was the heather bell;
But the manner of the brewing
Was none alive to tell.
In graves that were like children’s
On many a mountain head,
The Brewsters of the Heather
Lay numbered with the dead.

The king in the red moorland
Rode on a summer’s day;
And the bees hummed, and the curlews
Cried beside the way.
The king rode, and was angry,
Black was his brow and pale,
To rule in a land of heather
And lack the Heather Ale.

It fortuned that his vassals,
Riding free on the heath,
Came on a stone that was fallen
And vermin hid beneath.
Rudely plucked from their hiding,
Never a word they spoke:
A son and his aged father—
Last of the dwarfish folk.

The king sat high on his charger,
He looked on the little men;
And the dwarfish and swarthy couple
Looked at the king again.
Down by the shore he had them;
And there on the giddy brink—
“I will give you life, ye vermin,
For the secret of the drink.”

There stood the son and father
And they looked high and low;
The heather was red around them,
The sea rumbled below.
And up and spoke the father,
Shrill was his voice to hear:
“I have a word in private,
A word for the royal ear.

“Life is dear to the aged,
And honour a little thing;
I would gladly sell the secret,”
Quoth the Pict to the King.
His voice was small as a sparrow’s,
And shrill and wonderful clear:
“I would gladly sell my secret,
Only my son I fear.

“For life is a little matter,
And death is nought to the young;
And I dare not sell my honour
Under the eye of my son.
Take him, O king, and bind him,
And cast him far in the deep;
And it’s I will tell the secret
That I have sworn to keep.”

They took the son and bound him,
Neck and heels in a thong,
And a lad took him and swung him,
And flung him far and strong,
And the sea swallowed his body,
Like that of a child of ten;—
And there on the cliff stood the father,
Last of the dwarfish men.

“True was the word I told you:
Only my son I feared;
For I doubt the sapling courage
That goes without the beard.
But now in vain is the torture,
Fire shall never avail:
Here dies in my bosom
The secret of Heather Ale.”

Nowadays, in the age of the internet and search engines, the guarding of a recipe to the death seems quite ridiculous! Also, to our modern sensibilities allowing a tradition to die out and not be preserved in some form or another would be almost unthinkable. The School of Scottish Studies Archives, and others like it around the world, exist to collect, preserve and share oral and written tradition. This work is very important, and crucial to our understanding of the past, present and future. The world is a much richer place for it!

If you’d like to try making your own heather ale, check out this recipe. Let us know how it turns out!

Bibliography and Further Reading

Calum Maclean Collection (ed.ac.uk)

Tobar an Dualchais

Heather Ale by Robert Louis Stevenson (thoughtco.com)

The Legend of Heather Ale – Folklore Scotland

“The Secret of the Heather Ale” Munro, Neil. The Lost Pibroch: And Other Sheiling Stories. United Kingdom: W. Blackwood, 1896.

The Popular Rhymes of Scotland, with illustr., collected by R. Chambers. United Kingdom: n.p., 1869.

Aug 24, 2022

“Magic Flight”: A Mi’kmaq Tale

By Cristina Horvath

On 24th March 2022

In Greimeagan - Research bites, Uncategorised

There are 28 versions of Aarne–Thompson–Uther (ATU) Index tale type 313 at the School of Scottish Studies Archives, but this particular one stands out. It is a tale told by Isabel Morris Googoo from the Mi’kmaq (or Micmac) tribe in Whycocomagh, Nova Scotia, to folklorist Elsie Clews Parsons in 1923. It was originally published in the Journal of American Folklore in 1925, but the copy we have on file is from a 1986 edition of the Cape Breton Magazine, with added illustrations of Mi’kmaq petroglyphs from a publication by the Nova Scotia Museum. The article notes that “it is an example of elements of European stories and religion that have been worked into Micmac tradition.” In this part of Canada, this European influence would have come more specifically from Scottish and French settlers, although this tale type has variations that can be found across the globe. It even has ties to Ancient Greek mythology. In Scots, it is best known as Nicht, Nought, Nothing, collected by Andrew Lang from “an aged old lady in Morayshire” (in Lang’s words). Unfortunately, the lady is not named, as is too often the case with female narrators. What makes the Mi’kmaq Magic Flight story so interesting is that it was told and collected by women, and they are specifically named. In the Journal of American Folklore article, we are even given the name of the source of the story: Googoo’s grandmother, Mary Doucet Newell. The collector, Elsie Clews Parsons, was one of the earliest figures in the feminist movement and was outspoken on the negative effects of gender role expectations, publishing works on the topic in the early 20th century.

An Irish version of the Magic Flight tale, also collected by a woman, can be read on the Dúchas website here.

The Mi’kmaq Magic Flight tale from the Cape Breton Magazine is attached here in its entirety. Note the adverts, providing a wonderful glimpse into the social history of 1980s Nova Scotia!

A Micmac Tale – Magic Flight

Bibliography:

Parsons, Elsie Clews. “Micmac Folklore.” The Journal of American Folklore 38, no. 147 (1925): 55–133. https://doi.org/10.2307/534961. *Warning: this article contains some offensive language*

Peverill, L., Robertson, M. Rock Drawings of the Micmac Indians. Petroglyphs. N.p.: n.p., 1973.

Cape Breton’s magazine. 1972. Wreck Cove, N.S.: R. Caplan (Edition no. 41, 1986).

Lang, Andrew. Custom and Myth. United States: Harper & brothers, 1893.

Mar 24, 2022

Decoding Hidden Heritages project update: 14.01.22

By wlamb

On 14th January 2022

In Uncategorised

For an automatic translation into English, click here. For a version in Irish, click here.

15 January 2022

Author: Dr Andrea Palandri, Postdoctoral Researcher

Andrea Palandri

In the summer of 2021, Gaois received funding under the AHRC-IRC scheme to begin a project on the Main Manuscript Collection from the archive of the Irish Folklore Commission (Cumann Béaloideasa Éireann, University College Dublin). This project is called Decoding Hidden Heritages. The subject of this blog is the digitisation work under way, as part of the project, on the manuscripts of the Main Collection.

It is estimated that there are around 700,000 manuscript pages in the Main Manuscript Collection, making it one of the largest collections of folklore material in western Europe. This would have been a major challenge to digitise had Transkribus not developed AI technology for handwriting recognition over the past few years. Decoding Hidden Heritages relies heavily on this technology, which allows the project to build its own handwriting recognition models based on particular scribes in the collection.

Since our researchers began working with the Transkribus software in early October, we have built three handwriting recognition models that operate at an accuracy level above 95%: one for Seosamh Ó Dálaigh, one for Seán Ó hEochaidh and one for Liam Mac Coisdealbha, three of the most dedicated collectors who worked for the Commission.

Figure 1 (left) Seosamh Ó Dálaigh collecting folklore from Tomás Mac Gearailt (Paraiste Márthain, Corca Dhuibhne) and (right) a manuscript he wrote from a recording he made of Tadhg Ó Guithín (Baile na hAbha, Dún Chaoin, Corca Dhuibhne) being transcribed in Transkribus.

To train the engine, Transkribus needs an accurate transcription that follows the manuscript page, faithful to the collector’s handwriting and dialect. After transcribing about fifty pages in this way, we trained a handwriting model that performs quite effectively (90%+). The project’s approach thereafter is to transcribe a large number of pages automatically and have research assistants (Emma McGee, Kate Ní Ghallchóir and Róisín Byrne) correct them gradually. After that, we can retrain the models on a larger dataset to obtain better language models (~95%). The interim results of this work give us hope that the project will be able to achieve a higher level of accuracy in the coming months, which will allow us to transcribe much of the Main Manuscript Collection automatically, almost overnight.

Figure 2 The learning curves of the language models built with Transkribus so far: Seosamh Ó Dálaigh (left), Seán Ó hEochaidh (centre) and Liam Mac Coisdealbha (right).

The manuscripts of the Main Collection are among the texts that bear the strongest trace of the dialects in the corpus of modern Irish-language literature. It is Séamus Ó Duilearga’s own method and editorial practice that appear in Leabhar Sheáin Í Chonaill. He inspired and founded the Folklore of Ireland Society in 1927, and there is no better explanation of this method than the words Séamus Ó Duilearga himself wrote in the book’s introduction:

Ní raibh ionnam ach úirlis sgríte don tseanachaí: níor atharuíos siolla dá nduairt sé, ach gach aon ní a sgrí chô maith agus d’fhéadfainn é.

I was merely a writing instrument for the storyteller: I did not change a syllable of what he said, but wrote everything down as well as I could.

(S. Ó Duilearga, Leabhar Sheáin Í Chonaill, xxiv)

Few books published in Irish-language literature since then have stayed as faithful to the speaker’s dialect as Leabhar Sheáin Í Chonaill: it has dialectal forms such as bheadh saé for bheadh sé (‘he would be’), buaileav for buaileadh (‘was struck’) and fáilthiú for fáiltiú (‘welcoming’). In the same way, the language of the manuscripts in the Main Collection strongly reflects the dialect, or even the idiolect, of the informants. For example, there is a dialectal tendency to say do raibh instead of go raibh (‘that … was’); some people from Corca Dhuibhne in County Kerry had this feature, e.g. in the stories that Seosamh Ó Dálaigh wrote down from Tadhg Ó Guithín (Baile na hAbha, Dún Chaoin).

Figure 3: Diarmuid Ó Sé describes this dialectal phenomenon in Gaeilge Chorca Dhuibhne (§619).

The manuscripts in this collection are rather unusual because of the fine dialectal forms that the collectors recorded as they transcribed. It is because of this linguistic diversity in the corpus that the project does not aim to create one large model to transcribe the entire Collection. Moreover, we are dealing not only with different dialects but also with different collectors, whose handwriting and ways of spelling dialect forms were not the same. These difficulties make this Irish-language corpus highly heterogeneous. It must be handled with care, and with the support of dialectological works that describe the fine points of language found in it.

Figure 4: The handwriting of Seosamh Ó Dálaigh

Figure 5: The handwriting of Seán Ó hEochaidh

Figure 6: The handwriting of Liam Mac Coisdealbha

Jan 14, 2022

Agallamh le Roibeart MacThòmais / An interview with Robert Thomas

By wlamb

On 4th October 2021

In Agallamhan - Interviews, Uncategorised

In this interview series, we are looking at individuals who have significantly advanced the field of Gaelic, Irish and Manx language technology. For the fourth interview, we hear from Rob Thomas. Like Lucy Evans, whom we interviewed a few months ago, Rob has come to the world of Gaelic language technology only recently. He was chosen from a strong field and employed for five months in 2021 to work with us on a project funded by Data-Driven Innovations (DDI), in which we were developing the world’s first automatic speech recogniser for Scottish Gaelic. Rob worked on an important strand of this project: developing a brand-new piece of software called An Gocair.

When trying to develop language technology for minority languages, the most fundamental problem is data sparsity. The situation for Gaelic is not as dire as for some other minority languages, but much of the available textual data uses outdated orthography. That makes it impossible to use for training machine learning models, at least without spending a lot of money on respelling.

An Gocair re-spells texts automatically; it is basically an unsupervised spell-checker with some extra bells and whistles. It is currently only a prototype, however, and we are seeking additional support for its development. Once completed, it could be used in a wide range of contexts, including publishing, education at all levels, other computer programs and academic research. It will also make a significant contribution to a new research project now getting underway between five universities in Britain, America and Ireland: ‘Decoding Hidden Heritages in Gaelic Traditional Narrative with Text-mining and Phylogenetics’.

Interview with Rob Thomas

Agallamh le Roibeart MacThòmais

Tell us a little bit about your background. For instance, where are you from, and what got you into language technology work?

Hello! I’m from a small town in South Wales called Monmouth. I grew up mostly in the countryside, quite far from civilisation. My interest in linguistics probably stems from having a fantastic English teacher in my high school. (Shout out to Mr Jones.) I don’t know if it was the content or how he taught it, but I remember at the time really enjoying the subject and his lessons.

Rob Thomas

I went on to study English Language and Linguistics at the University of Portsmouth. After graduating, I worked for a while at Marks and Spencer as I was not yet sure what kind of career I was looking for. Still kind of directionless, I spent a year and a bit travelling and on return began working in tech support. I then managed to find a course in Language Technology at the University of Gothenburg. I had recently found a new interest in programming, and this was a great way to merge that interest with my academic foundation. After a few years living, studying and working in Sweden, I returned to the UK, began the job hunt and was lucky to find the position at the University of Edinburgh.

You mention studying language technology at the University of Gothenburg. What did you find most interesting about the course? Do you have any advice for someone who is thinking about studying language technology?

The course was fascinating and it attracted students from quite a broad background. The first meeting was like The Time Machine by H. G. Wells: we were all introduced as the linguist or the mathematician, cognitive scientist, computer scientist, philosopher etc. I think what stood out is that language technology, as a field, relies on input and experience from a multitude of academic backgrounds. This is due to the complex nature of language. I think I would advise anyone who is not from a technical or STEM background to think about how important your knowledge and perspective are for the future of language-based AIs, systems and services. But if, like me, you do come from a humanities background, be prepared to dive straight back into the maths that you thought you had managed to escape after you completed your GCSEs.

You are developing a tool for Scottish Gaelic that automatically corrects misspelled words and makes text conform to a Gaelic orthographical standard. That’s impressive for someone with Gaelic, and even more so for someone who doesn’t speak it. How did you manage to do this?

I am quite lucky to be supported by Gaelic linguists and other programmers. I found a way to integrate Am Faclair Beag, an online Gaelic dictionary developed by our resident Gaelic domain expert, Michael Bauer. Alongside the dictionary, we translated complicated linguistic rules into something a computer could understand. We have managed to develop a program that takes a text and, line by line, attempts to identify spellings that don’t belong to the modern orthography and searches for the right word in our dictionary. If it has no luck, it then attempts to resolve the issue algorithmically. From the start I knew it was important that I was able to compare the program’s output to work done by Gaelic experts so that I could see whether I was improving the tool or just breaking it.
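The flow Rob describes (go line by line, check each word against the dictionary, and fall back to rules when the lookup fails) can be sketched roughly as follows. This is a toy illustration and not An Gocair's actual code. The word lists are tiny hypothetical stand-ins for Am Faclair Beag, and the single fallback rule, swapping acute accents for the grave accents used in the modern Gaelic orthographic conventions, is an assumed example of the kind of rule involved.

```python
import re

# Toy stand-ins for a real dictionary; every entry here is illustrative.
MODERN_WORDS = {"taigh", "mòr", "seo", "tha", "an"}
KNOWN_RESPELLINGS = {"tigh": "taigh", "so": "seo"}  # older form -> modern form

# Assumed fallback rule: replace acute accents with grave accents.
ACUTE_TO_GRAVE = str.maketrans("áéíóúÁÉÍÓÚ", "àèìòùÀÈÌÒÙ")


def respell_word(word: str) -> str:
    """Return a modern spelling for one word, or a rule-based best guess."""
    key = word.lower()
    if key in MODERN_WORDS:
        return word  # already acceptable, leave untouched
    if key in KNOWN_RESPELLINGS:
        modern = KNOWN_RESPELLINGS[key]
        # Preserve an initial capital from the source text.
        return modern.capitalize() if word[0].isupper() else modern
    # No dictionary hit: fall back to the algorithmic rule.
    return word.translate(ACUTE_TO_GRAVE)


def respell_line(line: str) -> str:
    """Respell every word in a line, leaving spaces and punctuation as they are."""
    return re.sub(r"\w+", lambda match: respell_word(match.group()), line)


def respell_text(text: str) -> str:
    """Process a text line by line."""
    return "\n".join(respell_line(line) for line in text.splitlines())


print(respell_text("Tha an tigh mór an so."))
# prints: Tha an taigh mòr an seo.
```

Comparing output like this against texts respelled by Gaelic experts, as Rob mentions, is what would show whether a new rule improves the tool or breaks it.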

An Gocair

Since you were born, you’ve seen language technology change and permeate how we work and live. What’s been your own experience of the changes that it has brought?

It has been very interesting witnessing the exponential growth of language technology in the mainstream. It wasn’t until I studied it that I realised how much it was already embedded in websites and services that I’d been using for years. The more visible applications such as smart assistants are becoming much more normalised in our society. Even my grandma uses her smart assistant to turn on Classic FM and set timers, which I think is really cool. My grandma is pretty tech savvy to be fair!

With the dominance of world languages in mass media and on the internet, some would say that technology is an existential threat to minority languages like Gaelic and Welsh. What do you think about this? Are there ways for minority languages to survive or even thrive today?

I think one of the issues in language technology is that most of the work is dedicated to languages that already have huge amounts of resources, for example English. Most of the breakthroughs are being made by large companies that ultimately aim to increase the value of their services. There are a lot of companies that sell language technology as a service (e.g. machine translation) rather than serving communities per se. The latter may not have direct monetary value, but it’s essential to keep that focus in order to allow minority languages to gain access to state-of-the-art technology.

What are your predictions for language technology in the year 2050? If you had your own way, what would you like to see by that time?

I imagine smart assistants will be present in more spaces in society, perhaps even in a more official capacity. The county council in Monmouthshire already use a smart chatbot for questions about what days your bins are being collected. Imagine if they were given greater powers such as being able to make important decisions (scary thought). The more time goes on, the more I think we are going to end up with malevolent AIs like HAL from 2001: A Space Odyssey, rather than ones like C-3PO from Star Wars.

I’m not sure what I would like to see. It would be nice if there were more community-developed and open-source alternatives to what the large tech companies provide, so that consumers could be sure their data was being used in a safe and respectful way.

Oct 4, 2021
