With funding from the UoE Challenge Investment Fund (Aug 2019), a small team of us has been busy developing the first handwriting recogniser for Scottish Gaelic. To do this, we have used Transkribus, a sophisticated machine-learning-based platform and online text repository.
The work began with the Digital Imaging Unit scanning about 2500 pages of handwritten manuscripts from the School of Scottish Studies Archives, supplemented by some additional scanning at the Centre for Research Collections.
Once we received the texts, research assistant Michael Bauer manually transcribed about 18,000 words, which we used to generate our first Gaelic handwriting model. This achieved an impressive Character Error Rate (CER) of 2.53%, or about 97.5% character accuracy, but the model was trained and tested on only one writer’s hand. We used this model to help transcribe a further 18,000 words and trained a second model. Again, this involved only one hand, but it achieved a CER of 1.90%.
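For readers unfamiliar with the metric, CER is conventionally defined as the character-level edit distance (insertions, deletions and substitutions) between the recogniser's output and a reference transcription, divided by the number of characters in the reference. The sketch below illustrates that standard definition in Python. It is an assumption that Transkribus computes its figure in the same way, and the function names `edit_distance` and `cer` are our own illustrative choices, not part of any Transkribus API.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]  # distance from i reference characters to an empty hypothesis
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a recogniser that drops one character from a 13-character reference such as "madainn mhath" has a CER of 1/13, or about 7.7%. Character accuracy is simply 1 minus the CER, which is how a CER of 2.53% corresponds to roughly 97.5% accuracy.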
Using the updated model, we are moving towards our target of 500,000 transcribed words. We have recently focussed our transcription efforts on increasing the number of hands involved, so that our next model generalises better and is more widely useful. The project will finish in July 2020, when we intend to make the Gaelic handwriting recogniser available to the public through Transkribus.
Project team
Dr William Lamb (PI): Celtic and Scottish Studies, LLC
Dr Beatrice Alex (Co-I): Edinburgh Futures Institute and LLC
Prof James Loxley (Co-I): English Literature, LLC
Dr Mark Sinclair (Consultant): Centre for Speech Technology Research (CSTR)
Mag Dr Muehlberger (Advisor): Transkribus (Innsbruck University)
Mr Michael Bauer (Research Assistant): Akerbeltz