Iterated testing for rapid, rich learning: Researching the UX of Drupal AI Assistants
Following a successful pilot test, I conducted further UX research into Drupal AI Assistants with University staff. I learned what staff thought of the Assistants, what they expected from them, and how they interacted with them. Working iteratively, I was able to feed what I learned back between testing rounds to drive continued user-centred improvements.
Drupal AI is rapidly evolving, especially in the area of AI Agents and Assistants – intended to empower, support, and guide users to complete tasks. To keep their development on a user-centred track, testing with representative users has been essential, ensuring that users’ perceptions and expectations of the AI Assistants continue to inform their features, functionality and operating mechanisms.
I read about conversational interface design to help me design a test scenario and tasks, which I’ve written about separately:
How do you learn about the UX of AI? – designing tests for Drupal AI Assistants
Running a pilot test helped me prepare for these testing rounds, which I’ve also blogged about:
Early adventures in UX AI research: Pilot testing Drupal AI Assistants
This blog post builds on what we learned from the pilot test, and details the learnings and iterations from the two rounds of testing that followed the pilot.
I’ve pulled together conclusions and next steps in a separate post:
Making AI useful and usable – consolidated learnings from UX research of Drupal AI Assistants
Starting the round 1 tests, I used the same scenario and tasks as the pilot
The outputs from the pilot test gave no indication that the scenario and tasks needed to change. Since some iterations had been made to the Assistants following the pilot’s learnings, it made sense to keep the scenario and tasks the same, to see whether those alterations had made a difference to the UX provided by the AI Assistants.
Scenario: You run a community group and you’ve taken the first steps to set up a website to keep people informed about what you do.
Task 1: You can publish textual news content on your site but you now need to be able to add images to the news items you publish. How would you use the AI Assistant to make sure you can add images to future news articles you publish?
[Expected result: image field added to news content type]
Task 2: In addition to the news articles, you want to be able to publish longer-form pieces providing information about collaborative community projects you have worked on, to include things like logos and links to the partners you have worked with. How would you use the AI Assistants to help you do this?
[Expected result: new content type created for long-form articles]
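For readers curious what those expected end results correspond to in Drupal terms, the sketch below shows roughly equivalent site-building steps using Drupal core’s Field and Node Type APIs. It is purely illustrative – it is not how the AI Assistant performs the work, and the machine names (‘news’, ‘field_news_image’, ‘long_form_article’) are assumptions.

```php
<?php

// Illustrative only: a rough Drupal core equivalent of the two expected end
// states. Run in a bootstrapped Drupal context (e.g. drush php:script).
// Machine names below are assumptions, not what the AI Assistant would use.

use Drupal\field\Entity\FieldConfig;
use Drupal\field\Entity\FieldStorageConfig;
use Drupal\node\Entity\NodeType;

// Task 1 end state: an image field added to an (assumed) 'news' content type.
FieldStorageConfig::create([
  'field_name' => 'field_news_image',
  'entity_type' => 'node',
  'type' => 'image',
])->save();
FieldConfig::create([
  'field_name' => 'field_news_image',
  'entity_type' => 'node',
  'bundle' => 'news',
  'label' => 'Image',
])->save();

// Task 2 end state: a new content type for long-form articles.
NodeType::create([
  'type' => 'long_form_article',
  'name' => 'Long-form article',
])->save();
```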
I kept the same UX success indicators as I had used in the pilot
In the pilot, I had used six factors to help me gauge what a successful interaction between user and Assistant looked like. These had helped me analyse the data quickly, so I used them again in the next two testing rounds.
- Participant has a positive response to being presented with the AI Assistant
- Participant can communicate with the AI Assistant using language they would naturally use
- AI Assistant responds to participants’ prompts with relevant answers (no hallucinations)
- Participant can understand, respond to or act upon the response from the AI Assistant
- Participant completes task with the help of the AI Assistant
- Participant reflects on a positive experience and evaluates accordingly.
Round 1 tests: What participants did and how the AI Assistant responded
The round 1 tests offered greater potential for learning, as I tested with three participants instead of the single participant in the pilot.
Task 1: Add in the functionality to enable adding images to news items – actions and responses
All three participants started this task differently. Participant 1 asked the Assistant how to create a news item, participant 2 asked how to add pictures to news items, and participant 3 asked if there was functionality to add news items in the site. The Assistant responded appropriately to all three prompts: it advised participant 1 to check whether there was an existing news content type and provided instructions to create one if there wasn’t; it advised participant 2 to add an image field to a news content type; and it told participant 3 that the site did have a news content type for publishing news.
Participant 1 navigated to the interface and referred back to the instructions. They were able to follow the instructions (although the instructions lost their numbered formatting when the participant clicked into the interface) to check whether a news content type existed and, confirming that it didn’t, were able to follow the Assistant’s instructions to create one. They got stuck completing the fields in the Settings section (since the Assistant had only included ‘Configure the Field Settings’ in its instructions) and therefore called on the Assistant for help with what to put in some of the boxes (asking, for example, ‘what is a reference method?’ and ‘what does “create referenced entities if they don’t already exist” mean?’). They commented that the terminology was more complex than they were used to, but were ultimately able to complete the task with the help of the information provided by the Assistant.
Participant 2 wasn’t familiar with the term ‘content type’, so they asked the Assistant what this was, followed by a question about how to add an image field to go alongside a text field. Using the response, they looked at the content types in the interface. Opening up the basic page and noticing that the CKEditor contained the option to add images, they concluded that this could be used for publishing news items with images in them, and felt they had achieved the task.
In response to the Assistant’s information that the site did contain a news content type, participant 3 asked the Assistant how to access it. The Assistant provided instructions to go to the ‘Create’ section (but this did not show ‘news’ in the options). The participant told the Assistant they could not see a news content type and asked if they should use a basic page instead. The Assistant advised this was possible, but pointed out the standard use case for the basic page (static content) and advocated the benefits of using the news content type instead, closing with an offer to help the participant create it. The participant accepted the offer of help, and the Assistant returned a step-by-step guide. The participant responded that this looked complicated and asked if the Assistant could do it for them. The Assistant replied that it was unable to directly modify content types in the system. The participant began to follow the instructions and, in doing so, noticed that there was already a news content type in the ‘Structure’ menu – and was therefore confused that they hadn’t seen news in the ‘Create’ menu. They fed this back to the Assistant, which suggested checking permissions. The participant then asked how to find permissions and the Assistant advised the ‘People’ tab. Although the interface did contain permissions settings functionality, it did not specifically contain a ‘People’ tab, so the participant decided to give up on the task.
Task 2: Add in the functionality to publish long-form articles – actions and responses
Participants 2 and 3 approached this task in a similar way – asking the Assistant if the site already had a page type for longer articles. In the case of participant 2, the Assistant advised that Drupal did not have a built-in one but offered to help make one. The participant accepted this offer and received a response from the Assistant that this had been done and could be viewed in ‘Structure’. They were pleased with this result. In response to their initial prompt, participant 3 was advised by the Assistant that there was already a content type in the site that could be used for this purpose, and the Assistant offered to help them customise it. The participant checked and saw the relevant content type in ‘Structure’ (which the Assistant had created unprompted). To confirm it was working, they again looked for the relevant content option in the ‘Create’ section and, seeing that it was not there, reported this to the Assistant. As in task 1, the Assistant responded with instructions to check permissions to ensure the content type was enabled, and having got stuck on this in the earlier task, the participant decided not to proceed.
Participant 1 adopted a different approach to check if an existing content type would suit the publication of long-form articles. They asked the Assistant about the maximum character count permissible in the existing news content type, and assessing the answer, deduced that they did not need to create a separate content type.
Findings from the round 1 tests: Positive UX aspects
There were several successes to be noted from the three participants’ individual experiences of interacting with the AI Assistant.
Participants felt the Assistant had helped them
All three participants felt that the Assistant had been useful to a certain degree. Even participant 3, who did not complete the task, commented that the quality of the answers and the information provided by the Assistant had been better than expected.
Participants were impressed that the Assistant could do tasks for them
Two out of three participants asked the Assistant to complete tasks for them, and it achieved task completion for one of them. They were surprised and very impressed by this, and felt they would have used this functionality sooner if they had known what the Assistant was capable of.
I thought a chatbot was just a ‘help’ tool, I hadn’t thought of it as a ‘do’ tool – Participant 2
The Assistant was able to support different approaches to tasks
Despite all three participants approaching task 1 differently, the Assistant was able to interpret their prompts and provide information that supported them in making progress towards task completion, using the flexible approaches within Drupal. For example, it guided participant 2 to recognise they could use the CKEditor image functionality instead of adding a specific image field, and helped participant 1 deduce that the news content type would support publication of long-form articles.
By continuing to ask the Assistant, participants were able to learn as they progressed through the task
Participant 1 found the numbered instructions from the Assistant easy to follow and relied on them, checking back and forth between the Assistant’s output and the administrative interface as they worked. Where they felt the need for more guidance – for example when the instructions lacked precision (‘Configure Field Settings’ rather than instructions on how to complete each field) – the participant opted to question the Assistant about specific concepts. The Assistant’s responses helped them learn about different Drupal concepts in context and therefore gain a better understanding of Drupal, which they appreciated.
Adaptations after the round 1 tests: Areas tweaked and iterated for the next tests
The results of the tests revealed several things to be changed before we tested again.
Ensuring the formatting on the Assistant’s instructions wasn’t lost
Participant 1 wanted to proceed through the instructions provided by the Assistant, yet when they navigated away from the Assistant into the administrative interface, the formatting of the numbered list of instructions was lost, making it more difficult for them to follow. This could be changed in the front-end ahead of the next round of testing.

On the left-hand screen the Assistant’s instructions are presented in a numbered list but when the participant clicked into the interface the formatting was lost, as shown in the right-hand screen
Making sure the Assistant could do tasks and offering this up-front
When participant 3 asked the Assistant for help with creating a news content type, it provided a step-by-step guide instead of offering to do it for them, and similarly it wasn’t until they approached the second task that participant 2 realised the Assistant could do tasks for them as well as providing instructions. Participant 2 felt that if this information had been provided up-front (perhaps in the Assistant’s opening ‘how can I help you?’ sentence) they would have been inclined to use this functionality sooner to reap the benefits. To alter the way the Assistant informed users of the help it could provide, the back-end AI Assistant description was tweaked.
The Assistant description was updated to include this phrase:
“You REALLY want to try and do things for the end-user as much as possible directly and so if they ask for information about how to achieve things in their sites please always ask them if they would like you to just do it for them”
Ensuring the Assistant offered to do the work before it provided instructions
In its responses to participants in the first task, the Assistant provided sets of instructions ahead of the offer to complete the task itself. This meant that participants read and started to follow the instructions first, rather than realising the Assistant could complete tasks for them. To ensure that this option was clearer, the AI Assistant description was tweaked in the back-end.
The Assistant description was updated to include this phrase:
“When they [the end-users] present a question, try and think about how you would solve it for them and offer either to solve it for them or offer to tell them how to do it themselves. Please DO NOT just tell them how to do it, ask them if they want step-by-step instructions first”
Ensuring the terminology used by the Assistant matched what was in the interface
When participant 1 followed the instructions to add an image field to the news content type, there were several instances where the instructions did not precisely match the terminology or components in the interface – for example, the Assistant advising to ‘Add a field’ when the button was labelled ‘Create new field’, the Assistant referring to a drop-down when the options were displayed as tiles, and the Assistant directing the selection of an Image field when it was labelled ‘Media’.

The left-hand screen shows a mismatch between ‘Add field’ (in the Assistant’s instructions) and ‘Create a new field’ in the Drupal interface. On the right-hand side, a mismatch between ‘Choose Image’ (in the instructions) and ‘Media’ in the interface
Ensuring the terminology was as clear as possible
To address participants’ comments that the terminology used was more complex than what they were used to, the AI Assistant description was tweaked in the back-end.
The Assistant description was updated to include this phrase:
“The people who are asking you to help them will potentially be content editors, marketeers, graphic designers, digital designers, front-end designers. They will prefer plain language rather than detailed technical information”
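Each of the three tweaks above was simply extra instruction text appended to the Assistant’s description in the back-end admin UI. Purely as an illustration of how such guidance could be appended programmatically, here is a minimal sketch using Drupal core’s configuration API – the config object name (‘ai_assistant.example’) and the ‘description’ key are assumptions made for the sake of the example, not the AI module’s actual schema.

```php
<?php

// Illustrative sketch only: run in a bootstrapped Drupal context (e.g. via
// drush php:script). The config name and key are hypothetical - the tweaks
// described in this post were made through the back-end admin UI.
$extra_guidance = <<<EOT
You REALLY want to try and do things for the end-user as much as possible directly.
When they present a question, offer either to solve it for them or to tell them how to do it themselves.
They will prefer plain language rather than detailed technical information.
EOT;

$config = \Drupal::configFactory()->getEditable('ai_assistant.example');
$config
  ->set('description', $config->get('description') . "\n\n" . $extra_guidance)
  ->save();
```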
Ensuring the content in the interface represented the scenario
Participant 3 was confused when they could see a news content type in the ‘Structure’ section of the administrative interface, but could not see the option to pick a news item in the ‘Create’ menu. This highlighted the need to revisit the scenario in which the tasks were presented and to include representative content in the interface to support it, helping to ensure participants could make effective use of the interface to make sense of the tasks the Assistant was helping them complete.

Screenshot to show a mismatch in the interface – with news in the list of content types (right-hand screen) but news unavailable in the ‘Create’ menu (on the left-hand screen)
Round 2 tests: What we tested – new scenario and tasks
To maximise the learning opportunity, we wanted to investigate how the Assistant could handle slightly more complex functionality requests – for example, offering suggestions for connecting related content, often achieved in Drupal with taxonomies and tagging. To do this, it made sense to build on the existing set of tasks, adding extra complexity and designing tasks with a looser remit than previously, so that participants could be more creative in the way they approached the tasks and push the functionality of the Assistant further if they wanted to. This called for a slightly more detailed scenario, while ensuring it was still straightforward for participants to understand and credible for them to adopt.
Recognising that these tasks could be achieved in multiple ways, no expected end result was defined (unlike in round 1).
Scenario 2: You work for a publishing house and you’ve taken the first steps to set up a website to keep people informed about the work you do.
Task 1: So far you can publish textual book reviews on your site but you want to be able to incorporate visuals of the associated book covers on your site. How would you use the AI Assistant to help you do this?
Task 2: You know that the book authors you work with are a selling point for your books. Each author has their own personality and you want to bring this to life on your site, so people appreciate the authors behind the publications. You’re not sure of the best way to do this – so how would you get the AI Assistant to help you?
Task 3 (optional): You want to be able to label the book reviews on your site with the genre so you can group the reviews by categories and subject areas. How would you ask the AI Assistant to help you do this?
Follow-up task question: How successful would you say the AI Assistant was in helping you achieve each of these tasks?
Round 2 tests: What participants did and how the AI Assistant responded
Three participants took part in the second round of testing, and I was interested to see if the less-tightly defined tasks would encourage variation in the ways participants interacted with the AI Assistants.
Task 1: Incorporate visuals of book covers on reviews
Although they all started in much the same way, by asking the Assistant about adding images to book reviews, the exchanges between the three participants and the AI Assistant played out slightly differently. In response to their initial prompt, the Assistant gave participant 4 two options: to have the work done for them, or to receive step-by-step instructions so they could self-serve. When the participant asked for both, the Assistant replied that it could do one or the other, reiterating the offer to do it for them, and the participant opted for the instructions to do it themselves. By contrast, in response to their initial prompt, participant 5 received a list of instructions billed as a ‘simple guide’, with the offer of assistance afterwards. When they asked the Assistant how easy the task was to do, it still did not offer to do the task for them; instead it presented them with a shorter list of steps. Participant 6 initiated the interaction by telling the Assistant they wanted to add book cover visuals to book reviews and asking if it could help them. It provided a summary of what needed to happen and advised that it could help. The participant accepted the help and instructed the Assistant to go ahead, so it completed the task and provided links to the relevant sections of the interface so its work could be checked. Participant 6 was delighted with the outcome and gave the Assistant a thumbs-up evaluation.
Participants 4 and 5, with instructions to follow, began working through them. Both noticed mismatches between the terminology used in the Assistant’s instructions and the terms in the Drupal interface, which slowed their progress through the stages. Both were also slowed when the instructions about configuring the content type settings lacked sufficient detail to help them decide what action to take. This section of the interface included many empty boxes to complete and decisions to make about toggling different aspects of functionality on or off, and participants weren’t sure what the consequences of their choices would be. Both sought clarity from the Assistant by asking questions like ‘What does create referenced entities if they don’t already exist mean?’ and ‘What does token mean?’. In some cases the answers provided by the Assistant were too technical for them to understand; overall, however, these answers helped the participants – for example, when participant 5 asked about the ‘allow number of values’ box, the Assistant gave a context-specific reply that this related to the number of images they could add to the review.
Task 2: Show data about authors in the reviews in some way
The participants approached this differently. Participant 4, now familiar with fields, asked how to add an author field to the book review content type. Participant 5 started by asking how to publish separate biographies of authors and later asked how to connect this content with what was in the book reviews. Participant 6 explained to the Assistant that they would like to add author name and biographical information fields to the book reviews and asked for the Assistant’s help to achieve this. In all three cases, the Assistant offered to do the work for them or to provide instructions, and all three opted to have the work done by the Assistant. In the case of participant 4, in reporting the actions it had taken, the Assistant advised that an ‘Author’ text field had been created in the book review content type with a default input size of 60 characters, giving links to check the work. When they checked, however, the participant found that the maximum character count on the field was actually 255 characters. For participants 5 and 6, the Assistant created a separate ‘Author Biography’ content type and then set up entity reference fields on the ‘Book Reviews’ content type to reference relevant authors, providing links to the interface for the work to be checked. Upon checking, participant 5 wanted to preview the front-end, and the Assistant accurately reported that it was unable to access the necessary areas of the site to provide this. Participant 6 also wanted to get a sense of how the functionality would work and asked the Assistant to pull together a biography and list of publications for a named author (Arthur Conan Doyle) to see what it could look like. The Assistant did this and participant 6 was very pleased with the result.
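What the Assistant set up for participants 5 and 6 corresponds, roughly, to the following Drupal core site-building steps – shown here only as a hedged sketch of the resulting content model. The machine names (‘author_biography’, ‘book_review’, ‘field_author’) are assumptions, and the Assistant’s actual implementation may differ.

```php
<?php

// Illustrative sketch of the content model the Assistant set up for
// participants 5 and 6; machine names are assumptions. Run in a bootstrapped
// Drupal context (e.g. drush php:script).

use Drupal\field\Entity\FieldConfig;
use Drupal\field\Entity\FieldStorageConfig;
use Drupal\node\Entity\NodeType;

// A separate 'Author Biography' content type.
NodeType::create([
  'type' => 'author_biography',
  'name' => 'Author Biography',
])->save();

// An entity reference field on 'Book Reviews' pointing at Author Biography nodes.
FieldStorageConfig::create([
  'field_name' => 'field_author',
  'entity_type' => 'node',
  'type' => 'entity_reference',
  'settings' => ['target_type' => 'node'],
])->save();
FieldConfig::create([
  'field_name' => 'field_author',
  'entity_type' => 'node',
  'bundle' => 'book_review',
  'label' => 'Author',
  'settings' => [
    'handler' => 'default:node',
    'handler_settings' => [
      'target_bundles' => ['author_biography' => 'author_biography'],
    ],
  ],
])->save();
```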
Task 3: Label book reviews with genres
Only participant 6 progressed to the third task. They told the Assistant they would like to be able to group the book reviews and associated content by genre (giving the examples of romance, young people and action). The Assistant replied advising that a ‘Genre’ taxonomy could be set up and attached to the book reviews, ending with a ‘Let me do that’ message. The participant thanked the Assistant and waited for the work to be done, checking it at the links provided when it was complete. Happy with this result, they gave the Assistant a thumbs-up.
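The genre grouping the Assistant offered to set up maps, in conventional Drupal terms, onto a taxonomy vocabulary plus a term reference field on the reviews. The sketch below is only an approximation of that end state, with assumed machine names (‘genre’, ‘book_review’, ‘field_genre’); again, the Assistant’s own approach may differ.

```php
<?php

// Illustrative sketch of a 'Genre' grouping as a taxonomy vocabulary plus a
// term reference field on book reviews; machine names are assumptions.
// Run in a bootstrapped Drupal context (e.g. drush php:script).

use Drupal\field\Entity\FieldConfig;
use Drupal\field\Entity\FieldStorageConfig;
use Drupal\taxonomy\Entity\Vocabulary;

// The 'Genre' vocabulary.
Vocabulary::create([
  'vid' => 'genre',
  'name' => 'Genre',
])->save();

// A multi-value term reference field on the (assumed) 'book_review' type.
FieldStorageConfig::create([
  'field_name' => 'field_genre',
  'entity_type' => 'node',
  'type' => 'entity_reference',
  'settings' => ['target_type' => 'taxonomy_term'],
  'cardinality' => -1,
])->save();
FieldConfig::create([
  'field_name' => 'field_genre',
  'entity_type' => 'node',
  'bundle' => 'book_review',
  'label' => 'Genre',
  'settings' => [
    'handler' => 'default:taxonomy_term',
    'handler_settings' => [
      'target_bundles' => ['genre' => 'genre'],
    ],
  ],
])->save();
```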
Findings from the round 2 tests: Positive aspects of the user experience
As for round 1, there were several successes to be noted from the second round of tests.
The Assistant presented instructions clearly with no loss of formatting
With the formatting glitch from the round 1 tests identified and fixed, participants commented that they liked the step-by-step instructions, and particularly appreciated the names of key terms (for example, areas of the interface – e.g. ‘Structure’, ‘content type’) being presented in bold type to help guide them through the stages of instructions.
The choice of doing or helping was offered up-front more times and the Assistant persisted in offering to do tasks
The Assistant gave participants an up-front choice of the two types of help it could offer more often than it had done previously. Depending on how the participants responded, the Assistant then either completed the task for them or provided the instructions. When it provided instructions, it reiterated the offer to do tasks on their behalf. In the case of participant 4, who asked both for the instructions and for the Assistant to do it for them, it replied that it could only do one or the other, ending with the offer to complete the work.
The Assistant completed tasks for participants more often than previously, which they hadn’t expected but welcomed
In the first task, one participant opted for the Assistant to do the work, while the other two followed the instructions it had provided. For the second task, however, all three participants accepted the Assistant’s offer to complete the work for them and were pleased with the results.
I like that the chatbot can actually make changes to the system, I wasn’t expecting that – Participant 4
All the tasks were completed – more than in the previous round of tests
In this round of testing, all three participants were able to see the tasks through to completion, either by following the Assistant’s instructions, or by asking it to do the work for them.
The more loosely-defined tasks encouraged the participants to push the Assistant’s functionality
All three participants approached task 2 differently, but the Assistant was able to cope with what they asked for, delivering different ways of representing author content through the book reviews on the site. In response, several participants were keen for it to do more – for example, showing them a preview of what the shared author content would look like on the front-end, and creating representative data about a named author.
Participants became more acquainted with the Assistant as they went through the tests
Since the scenario allowed for more flexibility in addressing the tasks, the participants were freer to communicate with the Assistant as they wished. As they worked through, it was clear to see them getting used to conversing with the Assistant in their natural language and having some fun with it, as this participant prompt demonstrates:
Please proceed, Drupal Agent! – prompt from Participant 6
Areas for improvement identified from the round 2 tests
The Assistant performed better than it had done in the previous round of testing. The results of the tests also indicated a few additional areas where it could be made even better.
Being able to move the Assistant around to uncover the interface
When they had opted to follow instructions from the Assistant to complete tasks, participants needed to alternate between reading the instructions displayed in the chatbot interface and working in the Drupal administrative interface, and in some instances the chatbot display masked the underlying interface. Participants tried to drag the chatbot to reposition it to solve this issue but were unable to move it.

Screenshot showing the AI Assistant masking part of the administrative interface where the participant wanted to work
Ensuring the Assistant checked details with the user before doing
When participant 4 asked the Assistant to help incorporate author information into book reviews, it completed the task without checking the precise detail, and erroneously reported that it had set a 60-character limit on the field, when in fact the limit was 255 characters. This made the participant question the accuracy of its work.
Reducing inconsistencies in wording and instructions
Even though the meaning was on the whole the same, mismatches between the terms the Assistant used in its instructions and the labels in the user interface (for example, the instructions saying ‘Add field’ when the interface button was labelled ‘Create a new field’, as illustrated above from the round 1 tests) still negatively impacted participants’ experiences, interrupting their flow when completing tasks. In one instance (in participant 5’s experience), the Assistant also provided a shorter, second set of instructions that differed from the first, meaning the participant had to choose which instructions to follow.
The Assistant advising what it could do in the welcoming prompt
Despite the Assistant reiterating that it could do tasks for them, most participants only realised this once they had used the Assistant in a ‘help’ capacity – following the instructions it provided to do tasks themselves. Several participants suggested the AI Assistants should state what they can do in the chat interface, perhaps as an extension to the welcoming ‘How can I help’ message – for example saying ‘I can do tasks for you or I can give you instructions to do it yourself’.
Avoiding insufficient detail in instructions yet not reverting to being too technical
The Assistant’s instructions were presented in plainer language than in the first round of testing, indicating that the tweaks to the back-end Assistant description had taken effect. The clear presentation of the instructions instilled trust in participants, but this meant that when the instructions were incomplete, or lacking in sufficient detail, participants felt a bit lost. They were able to work through the missing information either by asking the Assistant for more help or by deciding what to do in the user interface themselves. The need for these additional prompts could potentially have been avoided if the instructions had been more precise, but that would also have made them more technical, so a balance needed to be struck.
What’s next: More testing and iteration
The two rounds of testing, following on from the pilot test, illustrated the benefit of repeated testing – not only for continued learning about user expectations and requirements, but also for continuing to stretch the capabilities of the Assistant and work out areas of strength and weakness to be improved. More testing is likely to be planned for 2025, with a view to beginning a third phase of testing, to be completed online by a larger and more diverse pool of participants.
Read more about the plans for testing in the meta issue on Drupal.org: