
Future student online experiences

Sharing the work of the Prospective Student Web Team

Experimenting with GenAI: why we won’t be using it to help with degree finder editing for now

Since the end of 2024, we’ve held team sessions to experiment with using GenAI to edit content. We’ve found it to be unreliable and more time-consuming than manually editing content, so we’re not going to use it to help with degree finder editing in the near future.

Trying to establish GenAI working practices for editing content

We started a series of AI experimentation sessions back in October with the goal of establishing common working practices for how (or if) we use GenAI for editing content.

We wanted to come up with some use cases for it and establish scenarios for when we shouldn’t use it.

We were especially interested in doing this because our team serves a devolved community of over 100 school-based editors, who update programme entries in our degree finder. Making these updates is labour-intensive all round, so we’re mindful that colleagues might be experimenting with GenAI to see if they can use it to reduce their workload.

We want to do our own experiments and share our learnings, so that others can avoid repeating lessons we've already learned.

For these sessions, we primarily used the University’s own GenAI tool, Edinburgh (access to) Language Models (ELM). ELM is essentially a version of ChatGPT, but any data we feed it isn’t retained by OpenAI.

ELM: learn more on the Information Services site

Our collaborative GenAI experiments

Session 1: Editing degree finder content with ELM

What we did

During our first session, we explored how we could edit degree finder content using ELM. Each year, our Content Operations team edits nearly 900 programme entries as part of the annual update of our degree finder.

School-based editors liaise with academics to make sure degree programme content is up to date, and we then do a final edit to make sure the content follows our style guide and digital content best practice.

It’s a massive task to edit that many programmes, so we wanted to see if ELM could help speed up the process.

We uploaded a PDF version of the University style guide into ELM, and developed a prompt we could all use and adapt for this task:

I want you to act as an expert content designer who designs and edits website text for the University of Edinburgh. You should be knowledgeable about the University of Edinburgh style guide. You should follow generally accepted best practice in terms of writing content for websites and digital interfaces. I will provide you with text which needs copyediting to meet the requirements for how people read on screen. The content you supply needs to improve the readability, accessibility and usability for the reader.
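We ran this prompt through ELM's chat interface, but the same role-based setup could also be scripted against any OpenAI-style chat API. As a hypothetical sketch (the model name, temperature and programme text are placeholder assumptions, not ELM's real configuration, and the prompt is truncated here):

```python
# Hypothetical sketch: packaging our editing prompt as an OpenAI-style
# chat payload. The model name and temperature are illustrative only.

EDITING_PROMPT = (
    "I want you to act as an expert content designer who designs and edits "
    "website text for the University of Edinburgh. You should be knowledgeable "
    "about the University of Edinburgh style guide. ..."
)

def build_edit_request(text_to_edit: str, model: str = "gpt-4o") -> dict:
    """Assemble a chat-completion payload: the editing prompt goes in as the
    system message, and the text to edit as the user message."""
    return {
        "model": model,
        "temperature": 0.2,  # lower temperature = less randomness in output
        "messages": [
            {"role": "system", "content": EDITING_PROMPT},
            {"role": "user", "content": text_to_edit},
        ],
    }

request = build_edit_request("The programme offers world-leading teaching...")
print(request["messages"][0]["role"])  # system
```

Splitting the instructions (system role) from the text to edit (user role) keeps the prompt reusable across programme entries without retyping it each time.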

For the text to edit, we used an overview for a programme that was written in academic language and not broken into subheadings. I also threw in a few typos and errors for it to catch.

What we learned: ELM ignored our style guide

We went too big with our first experiment with ELM. The text (380 words) was too long for us to easily assess how ELM was editing it. But what we did notice was:

  • ELM wasn’t obeying our style guide, even though we uploaded a PDF of it. It only followed certain style elements when we explicitly mentioned them in the prompt.
  • ELM interpreted directions oddly. When I prompted it to write at a 12-year-old’s reading level, it instead changed the content to say the programme was for kids.
  • Dropping the temperature of ELM toned down the amount of marketing language it added in. (In AI, a lower temperature means less randomness in the output.)
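To illustrate what temperature does, here is a toy sketch (not ELM's actual sampling code) of how a language model picks its next word: scores are converted to probabilities, and dividing by a lower temperature pushes almost all the probability onto the top-scoring option, while a higher temperature spreads it out:

```python
import math

def temperature_probs(logits, temperature):
    """Convert raw scores (logits) to probabilities at a given temperature.

    Lower temperature sharpens the distribution (less randomness);
    higher temperature flattens it (more randomness).
    """
    scaled = [score / temperature for score in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy scores for three candidate next words:
# e.g. a plain word, a promotional flourish, an off-topic word
logits = [2.0, 1.0, 0.5]

cool = temperature_probs(logits, temperature=0.2)
warm = temperature_probs(logits, temperature=1.5)

print(round(cool[0], 3))  # ~0.993: low temperature, top option dominates
print(round(warm[0], 3))  # ~0.532: high temperature, choices more spread out
```

This is why dropping the temperature toned down the marketing language: the model stops reaching for lower-probability, more "creative" word choices.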

We came away from session 1 realising we needed to try a smaller piece of text for our next session.

Session 2: Getting ELM to interrogate and obey the style guide

What we did

At session 2, we tried to get ELM to interrogate and obey the style guide.

Rather than feeding through a big piece of text, we focused on getting it to tell us what the style guide said about certain things and to edit a short piece of content to match it.

What we learned: ELM could use our style guide, but not all of it

We had more success with this session, but still ran into issues.

On the positive side, we could get ELM to successfully say what our style guide said about certain elements. For example, we asked it to ‘tell me what the style guide says about capitalisation’, and it gave back a correct response.

ELM also correctly edited short sentences or paragraphs with errors in them to match the style guide.

On the negative side, we learned that ELM couldn’t access external links. This was an issue because parts of our PDF style guide refer out to our website for guidance. So ELM could only read what was in the PDF.

We also went back to the longer piece of text after seeing more success with the shorter one. This is where issues arose again.

Rather than ask ELM to edit the text, I asked it to find me the errors in it according to our style guide. Its response included hallucinations, telling me something was capitalised when it wasn’t.

When I asked why it gave me incorrect advice, ELM told me it was sorry and that it misinterpreted the text.

We were realising that ELM wasn’t a reliable editor. Plus, the inability to read external links was a major drawback.

Our colleagues in Website and Communications reached similar conclusions.

Testing ELM’s ability to return useful results with prompts about the Editorial Style Guide (Website and Communications blog)

Session 3: ELM versus ChatGPT for editing degree finder content

What we did

Feeling limited by ELM’s inability to read external links, we wanted to try out ChatGPT to do the same things we did in session 1 and 2.

What we learned: ChatGPT is better than ELM, but still unreliable

Despite being based on ChatGPT, ELM is far behind it in features. We were impressed by how much better ChatGPT could edit content, and by how it explained its reasoning unprompted.

For each task, we learned:

  • Interrogating the style guide: It could correctly tell us what was in our style guide and could read the external links in it.
  • Editing short content: It could correctly edit content to match our style guide, and also, interestingly, fixed factual errors we had inserted into the content. On the downside, it took liberties, adding to or changing the content when we didn’t ask it to (like making it sound more ‘inspirational’).
  • Editing long content: It added in subheadings that matched the guidance we gave it in the prompt, but also made a few style guide errors.

Overall, we came away feeling that ChatGPT handles editing content better than ELM, but still felt unreliable – not only because there were inaccuracies, but because of the lack of consistency in responses, even with the same prompt. (ChatGPT didn’t have a temperature setting like ELM, but I’ve since learned I could just ask it to lower the temperature in a prompt.)

Our conclusion: GenAI won’t make degree finder editing more efficient at present

Collectively, we’ve put in about a full week’s worth of content design resource towards experimenting with AI. Through this, we’ve reached the conclusion that using GenAI tools won’t make our degree finder editing process more efficient at present.

If it can’t reliably follow our style guide and takes too many liberties with the text, we end up manually editing content we already needed to edit anyway (but with the added risk of new errors being introduced into the text).

Yes, maybe we could do more experimenting with lowering the temperature in ChatGPT to see if that produces more consistent responses. Yes, maybe we could do more tweaking to our prompts to yield better responses.

Regardless, we’ve been unable to prevent GenAI from hallucinating, and that’s ultimately the biggest barrier to making us feel comfortable about using GenAI to edit at scale right now. It won’t save us time if we still need to thoroughly check through outputs each time.

Where GenAI could be helpful: smaller scale tasks

While we’re not looking to do further experiments to use GenAI to edit at scale, that doesn’t mean we won’t find other use cases for it when it comes to editing.

Since we saw greater success with it editing shorter text, I think we’re best sticking to smaller scale tasks, like seeing if it could help reword a tricky sentence or provide advice for how to approach an edit.

That said, even for a small task, I don’t think the output it provides would be ideal, but it might trigger an idea of what to do.

I think the biggest lesson coming out of these experiments is that GenAI isn’t going to help perform the day-to-day work of our team. It’s something to maybe turn to when we’re stuck and need some feedback or inspiration.

I say ‘maybe’ because I don’t think the sustainability concerns around GenAI can be stated enough. With how environmentally costly each chat is, I’m in favour of asking teammates for feedback on something before engaging with GenAI.

Next steps: return to experimenting later

For now, our team is focused on delivering our new postgraduate degree finder and study site, launching in October.

After then, we’ll likely return to experimenting with GenAI again to see how things have changed. We’ll keep an eye on what other content designers are doing in this space in the meantime.

Other ways we’ve been experimenting with GenAI

GenAI versus Excel

While I covered three experimentation sessions here, we actually had a fourth, comparing how ELM performed against Excel for an annual editorial task we need to do.

Experiments in GenAI: Excel vs ELM for constructing and merging content (Jen’s post)

How our audiences are using GenAI

Flo, Content Designer in the team, has been exploring another angle of GenAI: how our audiences may be using it to find information about us and university study in general.

Is AI changing the way prospective students prepare to apply for University? (Flo’s post)

How are you using GenAI?

I’ve shared how our team have been experimenting with GenAI. What about you? If you work with prospective student content, I’d be interested to know how you’re using GenAI to help with editorial tasks and how you’ve found it. Feel free to email me or comment on this post.

Email Lauren Tormey
