How well can users predict task-level usability?
Jeff Sauro • March 15, 2011
Ask a user to complete a task and they can tell you how difficult it was to complete.
But can a user tell you how difficult the task will be without even attempting it?
It turns out the task description reveals much of the task's complexity, so users can predict actual task ease and difficulty reasonably well.
The gap in expectations can be a powerful predictor of usability problems--something recently seen on eBay.
In a usability test users are asked to complete a set of representative tasks. By observing users attempting these tasks you can obtain a wealth of information on interaction strengths and weaknesses.
Watching just one user struggle painfully through a common task can make a lasting impression on developers and product managers, and it makes for a persuasive highlight video.
What is largely a qualitative activity (looking for and describing design problems) is easily quantified by collecting simple pass/fail metrics and reporting completion rates.
Asking just a single question on how difficult users thought a task was to complete is also valuable. By improving the mean rating over time you can show how designs improved the user experience. Such subjective measures are especially helpful when you already have high completion rates. You can't improve a 100% completion rate, but you can improve a task that users struggle to complete and thought was difficult.
Predicting Task Difficulty
Writing a good task scenario takes some practice: don't lead the users, don't make the task impossible, and define specific success criteria in advance (see Dumas & Redish chapter 12). When you ask a user to attempt a task, they quickly interpret what they're asked to do and have some idea about how difficult it will be.
For example, if I were to ask you to compute your Adjusted Gross Income after accounting for deductions using some IRS forms and tax-tables, you'd probably expect that to be more difficult than finding the hours of a local department store online.
Having written hundreds of task scenarios and asked thousands of users to attempt them, I wondered how much of the difficulty is baked into the task scenario. How accurate would ratings be if I just asked users how difficult they think a task is without actually testing them?
In a study reported by Albert & Dixon ("Is This What You Expected? The Use of Expectation Measures in Usability Testing," 2003), users rated how difficult they expected a task to be on a 7-point scale (just like the SEQ) and then attempted the task and rated how difficult they thought it was using the same 7-point scale.
The difference between expected difficulty and actual difficulty provides some interesting insights. Tasks that were harder than expected are good candidates for improvement. Tasks that were easier than expected could be promoted.
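The comparison between expected and actual ratings can be sketched in a few lines of Python. This is only an illustration: the task names, the ratings, and the 0.5-point threshold are hypothetical and do not come from Albert & Dixon or from this study. It assumes a 7-point scale on which higher numbers mean easier.

```python
# Hypothetical mean ratings on a 7-point ease scale (higher = easier).
# These numbers are invented for illustration only.
tasks = {
    "Task A": {"expected": 5.5, "actual": 4.0},
    "Task B": {"expected": 4.0, "actual": 5.8},
    "Task C": {"expected": 6.0, "actual": 6.1},
}


def classify(expected, actual, threshold=0.5):
    """Label a task by the gap between actual and expected ease.

    The threshold is an arbitrary assumption, not a published cutoff.
    """
    gap = actual - expected
    if gap <= -threshold:
        return "harder than expected: candidate for improvement"
    if gap >= threshold:
        return "easier than expected: candidate for promotion"
    return "about as expected"


labels = {name: classify(r["expected"], r["actual"]) for name, r in tasks.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

In practice a fixed threshold is a crude stand-in for a statistical comparison of the two sets of ratings.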
I like this approach, but I needed to modify it a bit for this research. One problem with having the same user both predict and rate the difficulty is the possible introduction of bias. For example, users who recall their earlier rating might want to be consistent when they make their second rating. To eliminate the possibility of this bias, I used different sets of users (a between-subjects approach).
I had one group of users rate how difficult they thought a set of tasks would be. I then had another set of users actually attempt the tasks and then rate how difficult they thought they were. I picked a mix of tasks with a range of difficulty and used some well-known websites (Craigslist.com, Apple.com, Amazon.com, eBay.com and CrateandBarrel.com), as shown in the table below.
| Label | Task Description |
| Amazon DVDs | Find the cost to purchase and deliver 100 DVDs using next-day on Amazon.com to customers across the US. |
| Apple Cheapest iPad | Find the lowest price iPad on Apple.com. |
| Apple Error Msg | Find the cause of the error message 'Connect to iTunes' on an iPad and find a possible solution on the Apple.com website. |
| Craigslist Post job | Find the costs to post a programming job in 3 categories for 2 months on Craigslist. |
| Craigslist Find apt | Find an apartment in San Francisco on Craigslist with 2 bedrooms for under $2000/month in rent. |
| eBay Seller Fees | Estimate how much it will cost in commissions and fees to sell your iPhone 3GS on eBay. |
| eBay Find item | Determine whether there is a copy of Camtasia Studio 7 for sale on eBay. |
| Crate & Barrel Hours | Find out if a Crate & Barrel store in Denver Colorado (zip code 80210) is open on Sunday. |
I had between 30 and 40 people rate how difficult they thought each task would be, then I had separate groups of users (between 11 and 16 per task) attempt the tasks on the website.
So how accurate were the predictions? The average absolute deviation from the actual rating was 17% across the tasks. I was surprised how close the predictions came to the actual ratings. On four of the tasks the difference was less than 10%.
Figure 1: Comparison of predicted (blue bars) versus actual ease of use ratings (red bars). Tasks with * indicate significant differences at p < .05.
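One plausible way to compute an average absolute deviation like the 17% figure is sketched below. The article does not spell out the formula, so expressing each task's gap as a percentage of the actual rating is an assumption, and the ratings are hypothetical rather than the study's data.

```python
from statistics import mean

# Hypothetical (predicted, actual) mean ease ratings per task.
# These values are invented for illustration; they are not the study's data.
ratings = {
    "Task A": (4.0, 5.0),
    "Task B": (5.5, 5.0),
    "Task C": (3.0, 6.0),
}


def abs_pct_deviation(predicted, actual):
    """Absolute gap between predicted and actual, as a percentage of actual.

    Dividing by the actual rating is an assumption about the formula.
    """
    return abs(predicted - actual) / actual * 100


deviations = {name: abs_pct_deviation(p, a) for name, (p, a) in ratings.items()}
average_deviation = mean(deviations.values())

for name, dev in deviations.items():
    print(f"{name}: {dev:.1f}% off")
print(f"Average absolute deviation: {average_deviation:.1f}%")
```

Taking the absolute value keeps over-predictions and under-predictions from cancelling each other out in the average.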
The most notable miss was where users over-predicted the difficulty of the "Craigslist Find apt" task by 50%. For some reason people thought this would be rather difficult. I wondered if it had to do with people being less familiar with the SF rental market. In looking at the data, people outside of California did rate the task as more difficult, but even California residents thought finding an apartment on Craigslist would be more difficult than it was.
Predicting eBay's Listing Fees Change
In general, users tended to over-predict how difficult tasks would be (on 7 out of 8 tasks). The one task that was more difficult than expected was the "eBay Seller Fees" task. While I think most people expect to pay fees to sell something on eBay, I think they expected the fee structure to be more straightforward. Part of the difficulty arises because the fees depend on multiple variables (such as total sale price, shipping costs and the type of merchandise).
However, these users apparently aren't unique in thinking the task was too difficult, as eBay just announced an improvement in their pricing structure. This simple question taken a few weeks before eBay's pricing change shows how powerful and predictive expectation measures can be!
To understand how much the predicted score could explain the actual score, I conducted a simple linear regression at the task level. Half of the variation in task difficulty can be explained by how difficult a different set of users thinks the task will be (adjusted R-squared = 50.8%). The scatter plot below shows this strong association. The Craigslist and eBay tasks are highlighted, and their departure from the trend line shows how they differed from expectations.
Figure 2: Relationship between predicted task ease and actual task ease (adjusted R-squared = 50.8%).
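For readers who want to run the same kind of analysis on their own data, here is a minimal sketch of a task-level simple linear regression with adjusted R-squared, written with only the Python standard library. The function name `fit_with_adjusted_r2` and the eight pairs of ratings are hypothetical; they are not the study's data and will not reproduce the 50.8% figure.

```python
from statistics import mean


def fit_with_adjusted_r2(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, adjusted R-squared).
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # With one predictor, the residual degrees of freedom are n - 2.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, intercept, adj_r2


# Hypothetical mean ease ratings for eight tasks (one pair per task).
predicted = [3.1, 4.0, 4.4, 4.9, 5.2, 5.6, 6.0, 6.3]
actual = [4.2, 4.1, 5.6, 5.0, 4.8, 6.2, 6.0, 6.6]

slope, intercept, adj_r2 = fit_with_adjusted_r2(predicted, actual)
print(f"actual = {intercept:.2f} + {slope:.2f} * predicted")
print(f"adjusted R-squared = {adj_r2:.1%}")
```

With only a handful of tasks, the adjustment matters: it shrinks R-squared to account for the small number of data points, which is presumably why the adjusted value is the one reported here.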
With more tasks (especially ones that miss expectations), users' ability to predict task difficulty would likely go down. However, these data suggest that a good portion of the perception of usability (or lack thereof) is contained in the task scenario.
As much as half of the variation in task-level ratings can be explained by the inherent difficulty of the task scenario rather than the interaction with the website. Users likely overestimate the difficulty of a task and will rate it as more difficult than expected only when they encounter usability problems.
Despite the strong association between predicted and actual ratings, we still need to have users attempt tasks to identify where misaligned expectations exist. In the absence of a good benchmark or a comparative test, expectation ratings can be useful in diagnosing potential interaction problems beyond what completion rates and UI problem counts can tell us.