
Making AI useful and usable – consolidated learnings from UX research of Drupal AI Assistants

Agentic AI is a new area of Drupal development. Like all software developments, it can be made more useful and usable through UX research. Working with Drupal AI expert Jamie Abrahams, I conducted research with University staff to learn about the UX of AI Assistants so I could contribute to making them even better.

The University uses Drupal to power several important systems, including our current web publishing platform, EdWeb2 (and before that, EdWeb). As an open-source product, Drupal continues to advance and evolve led by a global community of contributors and agencies. I was welcomed into the Drupal community in 2022 and since then have enjoyed working with fellow contributors to share my expertise in UX and content design.

Most recently, I have supported the user-centred development of Drupal CMS – a new Drupal site-building and content management product aimed at a target audience of non-technical users, including content editors and marketing professionals – by contributing to the Drupal CMS UX track.

At DrupalCon Barcelona in September 2024, I met Jamie Abrahams, Drupal CMS AI track lead from specialist agency Freely Give, and I learned more about the Drupal AI Assistants. Jamie and I wanted to align the goals of our respective tracks and, in particular, ensure that the AI included in Drupal CMS offered an excellent user experience.

Read more about Drupal CMS (formerly called Drupal Starshot) in the product strategy on Drupal.org:

Introducing Drupal Starshot’s product strategy

Read more about Drupal AI developments on Drupal.org:

Drupal CMS AI Track

Read more about the UX work on Drupal CMS in my related blog post:

UX leading the newest developments in Drupal – a mindset shift for Drupal CMS

UX research – a key step towards making the AI Assistants excellent for the target audience

The development of Drupal CMS is being driven by a UX mindset, with new features being built and shaped to align with the expectations and needs of its non-technical target audience. It was important to ensure that the AI development followed the same Build-Measure-Learn cycle, and for that reason it was necessary to conduct UX research on the AI with representative users, so this knowledge could be fed into the continued build and development.

In November 2024, I worked with Jamie to design a series of tests to conduct UX research on the Drupal AI Assistants. We set out several goals for the testing:

  • To understand how people naturally expected to use the AI Assistants
  • To understand how people evaluated the success of the AI Assistants
  • To use the information gathered from testing to measure the success of the AI Assistants and to iterate and improve them.

I was keen to learn what University staff thought of the Assistants as I felt they could be a helpful addition for those using EdWeb2, so I recruited University staff as test participants and carried out seven online tests in an iterative pattern over two weeks. I adopted a standard usability test format combined with contextual enquiry, in which I presented participants with a scenario and a set of tasks to work through, allowing them to use the AI Assistant to help them in whatever way they wished. I designed the scenarios and tasks with the help of conversational design literature, and I analysed the results of the tests against a set of user-centred success indicators I had previously defined. I learned a huge amount, and have blogged separately about my work to design the tests, carry out a pilot and run two iterative testing rounds:

How do you learn about the UX of AI – Designing tests for Drupal AI Assistants

Early adventures in UX AI research: Pilot testing Drupal AI Assistants

Iterated testing for rapid, rich learning: Researching the UX of Drupal AI Assistants

In this post, I provide an overview of the research findings from the tests: my observations on the ways the test participants interacted with the Assistants and how they evaluated them, my deductions about the factors that had the most significant impact on the UX the AI Assistants provided, and the success of the Assistants measured against my pre-defined UX success indicators.

I also include my reflections on conducting this research, my notes on the improvements we made following the iterated testing and my ideas to keep improving the UX provided by the Drupal AI Assistants.

How did people naturally expect to use the AI Assistants?

Having run all seven tests myself, I was well-placed to collate my observations and recognise patterns in the ways the participants initiated exchanges with the AI Assistants and responded to them.

All participants had a positive first reaction to the AI Assistants

In order to avoid pre-conceived bias, I had not described the AI Assistants to the participants before the tests began, so when they encountered the AI Assistants in the tests it was the first time they had seen them. All of them reacted positively when shown the AI Assistants for the first time and quickly worked out how they could interact with them as chatbots and use them to help complete tasks.

As they continued to interact with the AI Assistants, participants built curiosity and trust

Once they got started in exchanges with the Assistants, participants began to learn what the Assistants could do and were naturally disposed to continue interacting with them. The clear, well-formatted instructions provided by the AI Assistants were well received by the participants; this helped them build confidence and regard the Assistants as both useful and usable. Some participants sought additional clarification from the AI Assistants to help them decide on actions to take, which suggested they liked being guided by the AI Assistants and were building trust in using them as a help tool.

Their expectations were exceeded when they realised the AI Assistants could do tasks for them

When presented with tasks to complete, most participants expected that the AI Assistant would help them by providing instructions to co-pilot them through the work, with a minority expecting the AI Assistant to know the system and take on the work for them. Once participants learned that the AI could actually undertake actions to complete tasks, they were very impressed and tended to choose this option instead of following instructions from the Assistant, since it was much faster and saved them a lot of effort.

I thought a chatbot was just a ‘help’ tool, I hadn’t thought of it as a ‘do’ tool – Participant 2

How did people evaluate the success of the AI Assistants?

All seven participants were introduced to the thumbs-up and thumbs-down evaluation mechanisms in the context of the output from the AI Assistant they were interacting with. When questioned about how they would use this functionality, most said they would use it at the end of an exchange, once a task had been completed (likely to prompt a thumbs-up) or they had decided to stop using the AI Assistant (likely to prompt a thumbs-down). In the context of specific interactions, the evaluations the participants gave varied. Some were inclined to ‘reward’ the Assistant with thumbs-up as they went; others opted only to give thumbs-down when the output hadn’t answered their question or hadn’t helped them in the way they had expected.

Factors that positively impacted the UX of the AI Assistants

Reflecting back over all seven tests, I noticed some recurring factors that affected the way individuals perceived and behaved towards the AI Assistants, and I recognised patterns in the way the exchanges played out between the users and the AI.

Assistants being able to support a range of prompts and questions

Despite participants in each testing round all being presented with the same scenario and tasks, there were variations in their approaches. The AI needed to be able to respond to these, and to continue reacting and adapting to variations throughout the exchange, so that each participant felt they were being understood by the AI on a personal level and supported through the stages to the conclusion of the task or interaction. For example, the Assistant needed to be able to provide the same offer of help whether participants asked direct questions like ‘How do I add images to book reviews?’ or indirect ones like ‘I want to add book covers to my book reviews, can you help me with that?’.

Assistants being clear about what they could do

Encountering the Assistants for the first time, participants needed to find out what they were capable of. A turning point for most participants was realising that the Assistants could undertake tasks for them; the sooner this happened in the interaction, the quicker they were able to make the most of this capability, which they regarded as impressive.

Assistants checking with users, not assuming, and always presenting work to be checked

In a few cases in the tests, the AI Assistant offered help to the participant but then completed the task on their behalf, or completed a task with a best guess of what the participant wanted (for example, with participant 4 and an author field). When this happened, it caused participants to question the AI Assistants. A better UX was provided when the Assistant checked details before doing the work and also provided evidence of what had been done (for example, links to the relevant sections of the interface) for the user to check it had met their expectations, thereby keeping a human in the loop as per responsible AI practice. This addressed the preference some participants expressed to understand what had been done in the Drupal system so they would be better informed for using it in the future.
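To illustrate this check-then-act pattern, here is a minimal sketch in TypeScript. It is not the actual Drupal AI implementation – the ProposedAction type and the confirmWithUser and execute callbacks are hypothetical – but it captures the behaviour participants preferred: propose, confirm, act, then present evidence for the user to check.

// Hypothetical sketch of a "human in the loop" agent step: the assistant
// proposes what it intends to do, waits for explicit confirmation, acts,
// and then returns evidence (links) so the user can verify the result.

interface ProposedAction {
  description: string;             // e.g. "Add an image field to the Book review content type"
  details: Record<string, string>; // assumptions the agent is making (field name, media type, ...)
}

interface ActionResult {
  summary: string;       // what was actually changed
  evidenceLinks: string[]; // links to the relevant admin pages for the user to check
}

async function runWithHumanInLoop(
  proposal: ProposedAction,
  confirmWithUser: (p: ProposedAction) => Promise<boolean>, // e.g. a yes/no prompt in the chat window
  execute: (p: ProposedAction) => Promise<ActionResult>,
): Promise<string> {
  // 1. Check details with the user instead of acting on a best guess.
  const approved = await confirmWithUser(proposal);
  if (!approved) {
    return "Okay, I won't make any changes. Tell me what you'd like to adjust.";
  }

  // 2. Do the work only after confirmation.
  const result = await execute(proposal);

  // 3. Present the work to be checked, keeping the user in control.
  return `${result.summary}\nYou can review the changes here:\n${result.evidenceLinks.join("\n")}`;
}

The key design choice is that nothing is changed until the user has explicitly approved the proposal, and the response always ends with something the user can verify.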

Provision of precise descriptions with terminology matching what was in the interface

When following instructions from the AI Assistants, participants entered a flow, referring to each stage in the chatbot output and doing the work in the interface in response. When the terms used by the Assistant didn’t exactly match what was in the interface, this was jarring for the participant, as it meant they had to try to make sense of unfamiliar words and phrases by themselves, having relied on the Assistant up until that point. To get out of this sort of situation most participants asked the Assistant for clarification; however, sometimes the explanations it provided resulted in them having to make sense of yet more unfamiliar terms, which could have been avoided if the terms had matched more precisely (for example, instead of advising participants to look to add an ‘Image’ field, advising them to find this field within ‘Media’). The ideal response contained a balance of detail – not too little, but not too much.

Answers with subtle additional details – especially context-specific knowledge

When a participant asked an AI Assistant a question, they naturally assessed the quality of its answer based on its relevance to the specific context they were in. For example, when the pilot participant asked the Assistant how to add an image to a news content type and the Assistant instructed them to first check whether the news content type was in the system, the participant felt a bit let down by this answer, as they felt the Assistant hadn’t recognised the context. In contrast, in the second round of testing, when participant 5 asked a specific question about how to complete a field in the configuration settings, the Assistant’s answer referred back to the context they were working in (the fields of a book review content type); this was more helpful and therefore representative of a better user experience.

From the participants’ perspective, how successful were the AI Assistants?

Adopting a scientific approach to the seven tests, which comprised:

  • A total of 15 tasks carried out (two by the one person in the pilot, six (two each) by the three participants in round one, and a further seven (two each for participants 4 and 5 and three for participant 6) in round two)
  • An approximate total of 110 interactions between participants and AI Assistants

I referred back to the user-centred success indicators I identified when I designed the tests, and calculated the measures as follows (a rough conversion of these counts to percentages follows the list):

  1. Participant has a positive response to being presented with the AI Assistant – 7 out of 7 participants
  2. Participant can communicate with the AI Assistant using language they would naturally use – 7 out of 7 participants
  3. AI Assistant responds to participants’ prompts with relevant answers (no hallucinations) – approximately 95 out of 110 interactions
  4. Participant can understand, respond to or act upon the response from the AI Assistant – approximately 90 out of 110 interactions
  5. Participant completes task with the help of the AI Assistant – 11 out of 15 tasks 
  6. Participant reflects on a positive experience and evaluates accordingly – 7 out of 7 participants
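For readers who prefer rates to raw counts, the short TypeScript sketch below converts the measures above into approximate percentages (the interaction totals are my approximations, so the resulting rates are approximate too).

// Convert the success-indicator counts above into approximate percentages.
const measures = [
  { indicator: "Positive response to the AI Assistant", met: 7, of: 7 },
  { indicator: "Could use natural language", met: 7, of: 7 },
  { indicator: "Relevant answers (no hallucinations)", met: 95, of: 110 }, // approximated
  { indicator: "Could understand / act on responses", met: 90, of: 110 },  // approximated
  { indicator: "Task completed with the Assistant's help", met: 11, of: 15 },
  { indicator: "Reflected on a positive experience", met: 7, of: 7 },
];

for (const m of measures) {
  const rate = Math.round((m.met / m.of) * 100);
  console.log(`${m.indicator}: ${m.met}/${m.of} (~${rate}%)`);
}
// Prints roughly 100%, 100%, 86%, 82%, 73% and 100% respectively.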

Overall, how good was the UX of the AI Assistants?

The experience provided by an AI Assistant in exchange with a participant was very different for each individual. For some participants, a good UX was the AI completing all the tasks for them; for others, a good UX was being guided through tasks by the AI so they learned how the system worked and were empowered to take on tasks confidently themselves.

Jared Spool, renowned UX expert, has described designing good user experiences as trying to achieve a balance between alleviation of frustrations and provision of delights. In the context of the Drupal AI Assistants, delightful elements of the AI UX manifested as the offer to complete tasks, or the option to clarify technical terms in the administrative interface. On the other hand, elements of the AI UX that alleviated points of frustration included any actions that simplified the use of Drupal CMS and made it easier to grasp and understand – for example, using terminology that precisely matched what was present in the site users were working on, and providing instructions with just enough information to enable users to complete tasks, without additional detail that could potentially confuse.

Read more from Jared Spool in his article:

Is ‘Delight’ the Best UX Design Intention?

My reflections on planning and conducting UX research on AI

I found it hugely enjoyable to design and plan the tests of the AI Assistants and to carry them out, and I learned much about how to approach and conduct this type of research going forwards.

The output from AI interactions is variable, so success needs to be measured in several different ways

The UX success indicators I identified in advance of testing were helpful for rating the UX that the AI Assistants provided; however, there was a lot more to assessing the UX of the AI Assistants, tied up in the nuances of individual experiences. In the tests I conducted, I found it helpful both to view and analyse the detail of each interaction separately, and to consider commonalities and differences in exchanges by looking at the bigger picture of all seven user experiences. In addition to the variability of the individual interactions, as the AI itself was developing through iterative improvement it was important to include comparisons of earlier tests with later ones in a more holistic data analysis.

An iterated approach testing with a small number of participants allowed for quick learning

When setting up the tests, I was unsure how many participants to include at a time. I opted to start with a small number (one for the pilot) and then grow this steadily through the successive rounds. In hindsight, I think this was the right approach. Only a small number of participants was needed to learn a lot, and because we were able to respond to what we learned in the tests and improve the AI quickly, we made the best use of the participants we had, testing as many changed and different aspects as possible in successive rounds over the two-week period.

Content and interface of the test environment affected participants’ perceptions

Observing participants interacting with the AI Assistants to achieve their tasks, I realised the importance of preparing the test environment mindfully – in particular, including content in the interface as a way for participants to make sense of what they were seeing. To ensure participants focused on the task in hand and their interactions with the AI Assistant, the interface they were using needed to reflect the context as clearly as possible. For example, it needed to include sample data and content (such as representative articles or news stories), and the configuration needed to be consistent and joined-up (for example, with the content types represented in the ‘Structure’ menu matching the options available to choose in the ‘Create’ section). This was improved between testing rounds 1 and 2 when the scenario changed, but the fact that participants in the second round also said they were looking for indicators of what the site was about suggested that the more realistic the testing environment could be made, the better.

I enjoyed seeing the participants push the AI and start to have fun with it

Very often usability testing is about identifying pain points, but one of the pleasures of this research was watching the participants experiment with the AI Assistants, push them to see what they would achieve and, through the language they used to communicate with them, have some fun. I liked having the flexibility to change up the testing scenario mid-way to open up more of the technology for the participants to experiment with. Seeing a participant enter this prompt was one of my favourite parts of the tests.

Please proceed Drupal Agent! – prompt entered by Participant 6

How we used what we learned to iterate and improve the AI Assistants

The iterative testing plan provided opportunities to feed information learned from the tests back into continued development of the AI Assistants. Beginning with a pilot test and moving through two successive rounds of testing, several areas were improved over the course of the two-week testing period:

  • Front-end amendments were made to ensure the Assistant window did not wipe the chat history when the user navigated to work in the interface (a sketch of one way this can be done follows this list)
  • Assistants were trained not to ask users to check existing functionality before acting
  • Responses from Assistants were changed to provide simpler, less technical information
  • Assistants’ responses were changed to ensure consistent formatting
  • Assistants were enabled to do tasks and make this option clearer to users
  • Broken links in the AI Assistants’ output were fixed
  • Assistant descriptions were altered in the back-end to ensure they offered the choice between doing tasks and providing instructions, not both at once
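On the first of these points, here is a minimal sketch of one way to keep the chat history across page navigations. It is an assumption for illustration only – not how the Drupal AI module actually implements this – using the browser’s sessionStorage to save the conversation and restore it when the assistant window is re-opened on the next page.

// Illustrative only: persist the assistant conversation across page loads
// using the browser's sessionStorage, so navigating the admin interface
// does not wipe the chat history. (Not the actual Drupal AI implementation.)

interface ChatMessage {
  role: "user" | "assistant";
  text: string;
}

const STORAGE_KEY = "ai-assistant-chat-history";

function saveHistory(messages: ChatMessage[]): void {
  sessionStorage.setItem(STORAGE_KEY, JSON.stringify(messages));
}

function loadHistory(): ChatMessage[] {
  const raw = sessionStorage.getItem(STORAGE_KEY);
  return raw ? (JSON.parse(raw) as ChatMessage[]) : [];
}

// Usage: call saveHistory() whenever a message is added, and loadHistory()
// when the assistant window is initialised after a page navigation.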

What next for improving the UX of the AI Assistants?

Agentic AI development is continuing apace, and prioritisation will be necessary to ensure AI-directed efforts deliver the greatest UX benefits. From this round of tests, I identified several next steps, some practical and short-term, others requiring longer-term commitment.

Keep ensuring the precision of the answers from AI Assistants

As participants continued to use the AI Assistants over the course of the tests, it was clear that repeated use of the AI Assistants by the non-technical target audience of Drupal CMS would contribute to improving the quality of the answers the Assistants provide, ensuring these contain just the right amount of precise detail to enable and empower the non-developer Drupal user. Proceeding with the plan to open testing out to anyone online, using a wider range of scenarios and tasks (as in the original testing plan), would potentially accelerate these precision improvements.

Read more about the plans for testing in the meta issue on Drupal.org:

[Meta] Conduct evaluations of the AI Agents

To provide the ‘delight’, ensure the Assistants keep offering to do tasks

All participants were pleasantly surprised when they realised the Assistants could complete tasks for them; however, in some cases they only realised this functionality was available once they had completed tasks themselves. Based on the results of the testing, it would be worthwhile investing effort in continued training and iteration of the Assistants to ensure they consistently offered help up-front, as this was a welcome option. Along with the offer to do the work, Assistants should offer the option to check the work, and should question and clarify users’ requirements before proceeding, to make sure the Assistant serves the user and the user stays in control.

Enable the AI Assistants to be moveable around the interface

After participants had asked the Assistants questions, they typically wanted to act on what they had learned by making amendments in the interface. To do this they needed to move the Assistant out of the way, and in the testing instance it was only possible to collapse the Assistant and open it up on the right-hand side of the screen. Some of them tried to drag the Assistant window out of the way, which suggested that enabling the Assistant chat window to be dragged anywhere on the interface would potentially improve the user experience, by enabling the user to tailor where they positioned the AI Assistant.
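As a rough illustration of what ‘draggable anywhere’ could look like, the sketch below uses standard pointer events to let a chat panel be repositioned by dragging its header. The element IDs are hypothetical and this is not tied to the actual Drupal AI front end.

// Illustrative sketch: make an assistant panel draggable by its header using
// pointer events, so the user can move it anywhere in the interface.
// The element IDs are hypothetical.

function makeDraggable(panel: HTMLElement, handle: HTMLElement): void {
  let offsetX = 0;
  let offsetY = 0;

  handle.addEventListener("pointerdown", (event: PointerEvent) => {
    const rect = panel.getBoundingClientRect();
    offsetX = event.clientX - rect.left;
    offsetY = event.clientY - rect.top;
    handle.setPointerCapture(event.pointerId); // route move/up events to the handle

    const onMove = (move: PointerEvent) => {
      panel.style.position = "fixed";
      panel.style.left = `${move.clientX - offsetX}px`;
      panel.style.top = `${move.clientY - offsetY}px`;
    };
    const onUp = () => {
      handle.removeEventListener("pointermove", onMove);
      handle.removeEventListener("pointerup", onUp);
    };
    handle.addEventListener("pointermove", onMove);
    handle.addEventListener("pointerup", onUp);
  });
}

// Usage (hypothetical element IDs):
// makeDraggable(
//   document.getElementById("ai-assistant-panel")!,
//   document.getElementById("ai-assistant-header")!,
// );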

Make use of the AI Assistants to help people learn Drupal in real-time

None of the test participants were Drupal experts, yet by the end of the tests they had all managed to complete tasks in the Drupal administrative interface without having to go through lengthy training videos or read extensive documentation. I was struck by the power of the AI Assistants to offer a helping hand through tasks and help people learn Drupal in bite-sized chunks. Aligning with the vision of Drupal CMS for non-technical users, I feel this use of AI is an opportunity to be maximised to counter the tendency for Drupal to be regarded as ‘for developers only’.

Enlist AI Assistants to help users understand complex Drupal terminology

When following instructions from the Assistants to complete their tasks, sooner or later the participants encountered words that were unfamiliar to them. They sought help from the Assistant to understand what the terms meant, which, on the whole, helped them through the tasks; however, they finished the exchange feeling less confident in their abilities than before. Several commented that the language made them feel they were in the wrong place because it felt very technical. This finding reiterated the importance of my ongoing related work with a small group in the community striving to make Drupal’s language easier to understand, which I have blogged about previously:

De-jargoning Drupal – working with the community to open up Drupal’s terminology 

I feel there could be an opportunity to align the ongoing improvement of Drupal AI using LLMs with the focus of the Drupalisms working group, towards the goal of a shared Drupal vocabulary which is clear and intuitive for all the community to use and refer to.

Continue doing UX research to inform the use cases AI Assistants should help with

The potential of AI Assistants to help users with tasks in the Drupal interface has been proven by this relatively small amount of UX research. AI may be applied in many more Drupal contexts to help and empower users to work with the system. As the build of Drupal CMS continues, many more use cases for the application of AI are likely to emerge. Making use of the UX resources already defined – including the archetypes, the Jobs To Be Done for the target audience, and information about priority tasks and user journeys – will help guide the future direction of user-centred AI Assistant development in line with the Drupal CMS UX mindset.

Access Drupal CMS (formerly Starshot) UX resources on Drupal.org:

Drupal Starshot UX resources

We’re working with Freely Give to explore options to bring AI Assistants to EdWeb2

Based on the findings of the research and the results of the tests, we concluded that it would be valuable to establish a working relationship with Jamie Abrahams and Freely Give to explore ways we can incorporate Drupal AI technology into EdWeb2, to help and empower web publishers to use the system to achieve their tasks and improve their experiences of using it. To this end, we’re planning a workshop in January 2025 to begin this work, and I am excited to continue learning about ways we can improve the UX of AI – to make it a viable, useful and usable option for our University staff.

 
