
Correcting Academic Language with AI

 

We’ve been really lucky to work with two amazing recent graduates (Martina Emmerich & Zoë Brunner) from the Natural Language Processing Unit in Informatics as part of our Captioning Service Project. They’ve been looking at how we might use GenAI to improve the accuracy of automated transcripts and captions on lecture recordings and other academic media. They’ve both, alas, moved on to shiny new jobs now, but here’s a summary of what they’ve been up to, written by Martina before she left. It’s been an absolute pleasure to work with both Zoë & Martina!

Photo of Martina

 

When I joined the project, I was wrapping up my work as a Research Assistant at the University on a different project that focused more on Natural Language Processing than Speech Recognition. Around that time, a colleague (Zoë Brunner) who was working on the Caption Correction Project was preparing to transition to a new role. This project offered an exciting opportunity to put into practice key concepts I had learned during my MSc, particularly in the areas of Automatic Speech Recognition (ASR) and Natural Language Processing (NLP). I drew on my foundational understanding of classical ASR techniques, such as alignment using edit distance, to prepare the data. Additionally, my knowledge of advanced NLP models proved invaluable when designing and applying prompting strategies for ChatGPT to improve the captions’ accuracy.
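As an illustration of that data-preparation step, here is a minimal sketch of word-level alignment via edit distance between an ASR hypothesis and a corrected reference. It is a textbook Levenshtein alignment with a backtrace, not the project’s actual code:

```python
# Minimal word-level edit-distance alignment between an ASR hypothesis
# and a human-corrected reference (a sketch, not the project's code).

def align(hyp, ref):
    """Return a list of (op, hyp_word, ref_word) edit operations."""
    m, n = len(hyp), len(ref)
    # dp[i][j] = minimum edits to turn hyp[:i] into ref[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,       # delete from hypothesis
                           dp[i][j - 1] + 1,       # insert from reference
                           dp[i - 1][j - 1] + cost)  # match / substitute
    # Backtrace to recover the operations in order.
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1]):
            op = "match" if hyp[i - 1] == ref[j - 1] else "sub"
            ops.append((op, hyp[i - 1], ref[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", hyp[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, ref[j - 1]))
            j -= 1
    return ops[::-1]

# Example: align a (made-up) ASR output against a corrected caption.
hyp = "the hidden mark of model".split()
ref = "the hidden markov model".split()
for op in align(hyp, ref):
    print(op)
```

Pairing each hypothesis word with its reference counterpart like this is what makes it possible to extract the human corrections as training or prompting examples.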

 

Building on this, I applied knowledge from my studies on neural networks to explore contextual biasing as an alternative approach to ChatGPT prompting. Overall, the project not only encompassed many of the topics I studied during my Master’s but also allowed me to deepen my expertise and explore new applications of what I had learned.

 

After navigating through three different codebases, I was relieved and excited to discover that the foundational implementation from my colleague was already capable of producing promising results on some of the lecture captions we were correcting. It became clear that this project had real potential, not just to accelerate the process of correcting automatic lecture captions but also to significantly enhance the overall accuracy of transcripts. Seeing this early promise made me eager to explore how much further we could push those improvements and optimize the system. 

 

One of the biggest challenges I faced was unravelling the logic of three distinct codebases simultaneously, one of which was a public codebase from a published paper, while also refining my understanding of the project’s primary objectives and the available data. The data itself posed its own set of hurdles, as it consisted of real-world transcripts that naturally included a few errors and inconsistencies, along with diverse patterns. Many transcripts contained quirks in their formatting or content, requiring me to adapt the code to generalize across these diverse cases while still ensuring readability and efficiency.  

 

These characteristics added to the complexity but also made the process of working with the data more engaging and rewarding, as it offered an opportunity to develop solutions that could handle the richness and variability of real-world scenarios effectively. 

 

Fortunately, collaborating with the DLAM team proved invaluable. Their deep familiarity with the nuances of the automatic captions and the variety of lectures and videos being processed helped me quickly identify and address the peculiarities in the data. Their insights and expertise significantly accelerated the process of overcoming these challenges. Working with this kind of data also gave me the opportunity to develop solutions capable of effectively managing the complexity and diversity of real-world scenarios, an experience that I am confident will prove invaluable in the future. 

 

What kept me motivated throughout the project was the tangible impact it could have on improving accessibility for students and enhancing the usability of educational resources. Knowing that the work had the potential to make a real-world difference for learners relying on captions gave me a strong sense of purpose. 

 

Additionally, the process itself was incredibly inspiring. Every time I discovered a new technique or refined a method, it felt like I was pushing the boundaries of what could be achieved. Collaborating with brilliant individuals on the team, and learning from their perspectives and expertise, fuelled my enthusiasm and drive to keep progressing. Ultimately, the blend of personal growth, technical challenges, and the goal of creating meaningful solutions made the journey deeply rewarding. 

 

Beyond collaborating with the main team, I had the pleasure of meeting the captioners and Tallulah Thompson, an intern developing a system to identify empty or faulty lecture recordings. While I thoroughly enjoyed connecting with everyone, I found Tallulah particularly inspiring. She is working on a fascinating project with practical, real-world applications, all while pursuing her undergraduate degree. Her dedication and ingenuity motivated me to aim higher in my own work and reinforced my desire to create solutions that can have a tangible impact.

 

The work environment was fantastic; it felt both encouraging and engaging. Everyone showed genuine support for the project and the work I was doing. What stood out was their curiosity about the process and outcomes of my experiments, which made me feel valued and motivated. Nelly Iacobescu was particularly supportive, always available to answer my questions or lend a hand when needed. My manager, Professor Peter Bell, also made a huge difference, carving out time from his busy schedule to help with technical challenges and guide me through the work. Beyond that, I had the chance to connect with remarkable individuals from Information Services and the wider University community. Their involvement and enthusiasm made the experience not only productive but truly unforgettable.

 

One of the most exciting “eureka!” moments during the project came when I realized the significant impact of providing extended context to ChatGPT for improving caption corrections. By allowing the model to analyze not just individual sentences but also the ten preceding sentences, its ability to accurately correct specialized terminology improved dramatically. This discovery highlighted the importance of leveraging contextual information for enhancing performance in real-world applications. 
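The idea can be sketched as a simple sliding window over the caption stream: each segment is sent to the model together with the ten sentences before it. The prompt wording below is illustrative, not the project’s exact prompt:

```python
# Sketch of extended-context prompting: each caption segment is submitted
# with the ten preceding sentences. Window size and prompt text here are
# illustrative assumptions, not the project's actual configuration.

CONTEXT_WINDOW = 10

def build_prompt(sentences, index, window=CONTEXT_WINDOW):
    """Build a correction prompt for sentences[index] with preceding context."""
    context = sentences[max(0, index - window):index]
    prompt = (
        "You are correcting automatic lecture captions.\n"
        "Fix recognition errors, especially specialised terminology, "
        "but keep the sentence structure unchanged.\n\n"
    )
    if context:
        prompt += "Preceding captions (for context only):\n"
        prompt += "\n".join(context) + "\n\n"
    prompt += f"Caption to correct:\n{sentences[index]}"
    return prompt

# Example with dummy captions: the prompt for item 11 carries items 1-10.
captions = [f"sentence {i}" for i in range(12)]
print(build_prompt(captions, 11))
```

The extra context is what lets the model resolve specialised terms that are ambiguous in a single sentence but obvious from the surrounding lecture.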

 

Another breakthrough came while experimenting with “few-shot” prompting to further enhance the model’s corrections. This technique involves providing the model with examples of how human captioners correct automatic transcriptions, enabling it to emulate their correction style. Although I only had time for a few experiments, the results were promising: this approach effectively reduced the model’s tendency to alter sentence structures and minimized unnecessary edits, such as removing repetitions and filler words from captions.
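In code, few-shot prompting just means prepending example pairs of automatic captions and their human corrections before the segment to be fixed. The pairs below are invented for illustration; note how they deliberately keep fillers and repetitions, which is the editing style the model should imitate:

```python
# Few-shot prompting sketch: example (automatic, human-corrected) pairs
# teach the model the captioners' conservative editing style. These
# example pairs are invented for illustration, not real project data.

FEW_SHOT_EXAMPLES = [
    ("so um the the gradient decent step is small",
     "so um the the gradient descent step is small"),   # fillers kept
    ("we use a hidden mark of model here",
     "we use a hidden markov model here"),              # term fixed only
]

def build_few_shot_prompt(caption, examples=FEW_SHOT_EXAMPLES):
    parts = [
        "Correct the automatic caption as a human captioner would: "
        "fix recognition errors but keep repetitions and filler words.\n"
    ]
    for auto, human in examples:
        parts.append(f"Automatic: {auto}\nCorrected: {human}\n")
    parts.append(f"Automatic: {caption}\nCorrected:")
    return "\n".join(parts)

print(build_few_shot_prompt("the lost function is minimised"))
```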

 

This project significantly expanded my knowledge of both Automatic Speech Recognition (ASR) and Natural Language Processing (NLP). I gained deeper insights into how different prompting techniques and contextual information affect the performance of Large Language Models (LLMs). Additionally, I refined my programming skills by working on large, collaborative codebases, which involved understanding and building upon code written by others. I also learned to use new libraries designed for efficient LLM prompting and explored fine-tuning models on GPU clusters. 

 

These technical skills will be incredibly valuable in my new role, where I’ll be contributing to extensive, team-driven codebases. Equally important are the research skills I honed during this project: designing experiments, conducting thorough testing, and analyzing outcomes, all foundational practices that are critical for developing robust and impactful products. Overall, this experience has left me better equipped to tackle future challenges and contribute effectively to any project.

 

I wish I had been able to complete the process of fine-tuning ChatGPT. While the model is already very effective at correcting automatic captions, especially specialized vocabulary errors, it occasionally makes stylistic changes that aren’t suitable for captions. For instance, the model performs best when provided with extended context, such as the ten sentences preceding the one it’s working on. However, this contextual understanding sometimes leads to alterations in sentence structure that aren’t desirable for maintaining the integrity of the original transcription. 

 

Fine-tuning the model using examples of human corrections could help it learn when and how to make appropriate edits. This approach could produce a tool that not only excels at addressing specialized terminology but also preserves the original structure of the transcript. Such improvements would likely result in even better Word Error Rates. I’m left genuinely curious whether this method could have achieved the desired balance and performance. 
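For reference, Word Error Rate is typically computed as the word-level edit distance (substitutions plus deletions plus insertions) divided by the number of reference words; this is the standard textbook formulation, not the project’s evaluation script:

```python
# Word Error Rate: edit distance between reference and hypothesis word
# sequences, normalised by reference length (standard formulation).

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[-1][-1] / len(ref)

# One substitution plus one insertion over a four-word reference: 2/4 = 0.5
print(wer("the hidden markov model", "the hidden mark of model"))  # 0.5
```

A lower WER after correction is what would demonstrate that fine-tuning achieved the desired balance.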

 

I believe the work on this project has the potential to make a meaningful impact in several ways. By improving the accuracy and usability of automatically generated lecture captions, it enhances accessibility for students who rely on these captions to engage with educational content, whether due to hearing impairments, language barriers, or diverse learning needs. The solution cannot yet be applied without human supervision, but even as it stands it can speed up the manual correction of captions, making the process more efficient so that more accurate captions can be produced faster.

 

Beyond accessibility, the insights gained from this project could inspire broader applications, such as improving automatic transcription tools for various professional and educational contexts. The refined techniques could contribute to advancing how AI integrates seamlessly into real-world settings to address practical challenges. Overall, I hope this work contributes to both the technical field and the user experience, creating opportunities for more inclusive and efficient learning environments. 

 

 
