Same Image, Different Story: Why AI Needs the Right Architecture to Fix Accessibility
After Stratos’s blog post last week, I revisited Joshua Mitchell’s experiment asking an AI to simulate what using a screen reader actually feels like, not to replace proper testing, but to generate a transcript of the experience that could be shared with stakeholders who had never encountered one. The results were striking: skip links that went nowhere, navigation menus entirely unreachable by keyboard, and link text repeated identically ten times with no distinguishing context.
Stratos’s post asked whether AI is improving or impeding web accessibility, and ended with an open question to the community: are you already using AI in your accessibility workflows?
I’d just come back from DrupalCamp England, where I’d presented a talk called “Same Image, Different Story”, one I first gave at DrupalCamp Scotland and will be taking to Drupal Dev Days later this year. It’s a talk I’m deliberately treating as a work in progress, updating it with new thinking and developments each time rather than delivering the same version twice.
And I think I might have part of an answer to Stratos’s question, though it’s more of a diagnosis than a solution, at least for now.
The problem hiding in plain sight
Harvard’s Digital Accessibility Services has a useful guide on writing alt text that includes a section called “Consider the Context.” It shows the same photograph of Hollis Hall used in two different articles: one about students enjoying the spring weather, the other about the building’s famous residents. Each use case demands entirely different alt text. Same image, different story.
It’s a compelling illustration of best practice, and it’s the cornerstone of my talk. That single example (two articles, one image, two completely different appropriate descriptions) captures the problem more precisely than any technical explanation I could give. But it also quietly exposes an architectural gap: most content management systems, including Drupal, the platform that powers the University of Edinburgh’s EdWeb 2, don’t give editors anywhere to act on that guidance. The image gets a single alt text field. One description, stored once, is applied everywhere the image is used. An article about student life. A seasonal blog post. A facilities page. The same text, regardless of which detail is editorially significant in each context.
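In markup terms, the ideal looks something like the sketch below: the same image file carrying different alt text depending on the article it appears in. The filenames and descriptions here are illustrative, not Harvard’s actual guidance text.

```html
<!-- Article about students enjoying the spring weather -->
<img src="/media/hollis-hall.jpg"
     alt="Students relaxing on the lawn in front of Hollis Hall on a sunny spring day">

<!-- Article about the building's famous residents -->
<img src="/media/hollis-hall.jpg"
     alt="Hollis Hall, one of Harvard's oldest dormitory buildings">
```

A single shared alt text field makes this impossible without workarounds: whichever description gets stored is the one every page receives.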
This isn’t a quirk of how one editor set things up. It’s a fundamental constraint of how Drupal’s media architecture works. And the consequences reach further than you might expect.
The anti-pattern that reveals the problem
When Drupal introduced the Media Library, it was framed as a shared asset pool designed to encourage image reuse and reduce duplication, and the intent was good. Upload once, use everywhere. But what we’ve observed in practice is editors quietly working around it: uploading the same image multiple times under different filenames, just so they can have different alt text, or set a different focal point, for different editorial contexts.
The platform designed to reduce duplication is inadvertently encouraging it. That’s a significant signal. When users consistently work around a feature, it usually means the feature doesn’t match how they actually need to work.
Where does AI fit into this?
Stratos’s post noted that AI-generated alt text is improving, but inconsistent, and the W3C’s own work on machine learning accessibility is honest about the gap. A bar chart described as simply “a graph with coloured bars” versus one that explains the data in full is the difference between access and exclusion.
There are already Drupal community contributions that tackle this, and they’re genuinely promising. AI may be able to offer editors a better first draft of alt text to refine manually, especially under time pressure.
But here’s the thing that my talk kept circling back to: even if AI could generate better alt text, Drupal has nowhere contextual to put it.
If the media architecture only supports one alt text value per image, then it doesn’t matter how good the AI generation is. The result still gets flattened to a single description, applied in every context, whether it’s appropriate or not. You haven’t solved the accessibility problem; you’ve just automated the production of the wrong answer, faster.
There’s a further constraint worth naming, too: current AI alt text tools work from the image alone. They don’t read the surrounding page content, the article headline, the body copy, or the editorial context, so they have no way of knowing whether the focus should be the students, the architecture, or the changing seasons. The next step in making this genuinely useful is finding ways to pass that subject matter to the AI, so it can generate alt text that’s not just accurate, but relevant to the specific editorial context it’s being placed in.
A practical step we could take today
There’s something worth drawing from Joshua Mitchell’s experiment here. He didn’t ask AI to fix the screen reader experience; he asked it to describe it, making an abstract problem visible and actionable.
We could apply the same thinking to alt text validation. Rather than waiting for the architecture to catch up, AI could be pointed at an existing page and asked to interrogate it: does this alt text accurately describe the image? Does it make sense in the context of this article? Is it serving the reader, or just technically present?
That’s a use case that’s achievable right now, without any changes to how Drupal stores media. And given that EdWeb serves over 600 subsites with around 1,500 editors, the ability to audit contextual appropriateness at scale, rather than relying on individual editors to self-assess, could make a meaningful difference.
The question I’m sitting with
The more interesting challenge, and the one I’m actively exploring at the University, isn’t “can AI write good alt text?” It’s “can we build an architecture where AI can write the right alt text for a specific editorial context, and can the CMS preserve and serve that appropriately?”
That feels like a genuinely solvable problem. The Drupal community is already moving in the right direction with contributions that allow editors to override media properties per-use rather than per-asset. Pair that with AI-assisted generation, editorial context passed as subject matter, and human review, and you start to have something that could meaningfully improve accessibility at scale, across a platform serving over 160 environments and more than 600 subsites, as EdWeb does.
This is part of a broader piece of work I’m developing at the University around AI editorial assistance, focused not on generating content, but on helping editors make better decisions: style guide compliance, accessibility checking, and contextual awareness. It’s early days, but the alt text problem feels like a good place to start.
To borrow Stratos’s framing: keeping the human in the middle means making sure the human has the right tools to make contextually appropriate decisions, not just faster ones. I’ll be updating this thinking as the talk evolves, and as the work here at Edinburgh develops. If anyone else in the Higher Education community is thinking about this, I’d love to compare notes.