
Measuring User Interface Disasters

Jeff Sauro • December 14, 2011

Completion rates are the gateway metric.

If users can't complete tasks on a website, not much else matters.

The only thing worse than users failing a task is users failing a task and thinking they've completed it successfully.  This is a disaster.

The term was popularized by Gerry McGovern, and disasters are anathema to websites and software alike.

Measuring Disasters

Have you ever bought the wrong product, or found out later that you had the wrong information, like the value of a car, the shipping cost or the delivery time? If so, then you know how frustrating and inconvenient it can be.

The most effective way of measuring disasters is to collect binary completion rates (pass and fail) and ask users how confident they were that they completed the task successfully. I use a single 7-point item to measure confidence (1 = not at all confident and 7 = extremely confident), but you could use a 5-, 9- or 11-point scale too.

A disaster is when a user fails the task yet reports being extremely confident they completed it successfully (e.g., a 7 on a 7-point scale).
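To make the metric concrete, here is a minimal sketch in Python of how a disaster rate could be computed from this kind of data. The task names and ratings are hypothetical, invented purely for illustration.

```python
# A minimal sketch of the disaster metric described above, using
# hypothetical data: each row is one user's attempt at one task, with a
# binary completion flag and a 1-7 confidence rating.
attempts = [
    # (task, completed, confidence) -- all values invented for illustration
    ("find_shipping_cost", True, 7),
    ("find_shipping_cost", False, 7),  # disaster: failed, but certain of success
    ("find_shipping_cost", False, 3),  # ordinary failure: user knew something went wrong
    ("compare_car_values", True, 6),
    ("compare_car_values", False, 7),  # another disaster
]

MAX_CONFIDENCE = 7  # the top of the 7-point confidence scale

# A disaster = failed the task AND gave the top confidence rating.
disasters = [a for a in attempts if not a[1] and a[2] == MAX_CONFIDENCE]

disaster_rate = len(disasters) / len(attempts)
print(f"Disaster rate: {disaster_rate:.0%}")  # 40% in this toy sample
```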


How Common are Disasters?

Just how often do such disasters occur? Well, it certainly depends on the task and interface. I've used this method for several years now on software and websites. Across 174 tasks, 1200 users and 14 different websites and consumer software products, the median disaster rate is 5%. I've seen disaster rates as low as 0% and as high as 30%. The distribution of disasters by task is shown in the histogram below.



Figure 1: Frequency of disasters by task. Many tasks have no disaster experiences while a few have above 25%.

Task-Level Disasters

In addition to looking at disasters by the individual, we can also look at task-level experiences. By plotting success against confidence, we can identify where users are misinterpreting cues that lead them to believe tasks were done correctly when they weren't. The graph below shows the 174 tasks graphed by success and confidence.


Figure 2: Task success and task confidence for 174 tasks across consumer software and websites.

Technical Note: The confidence score was created by taking the mean confidence score for each task and dividing it by the maximum possible score. So a mean score of 6 on a 7-point scale becomes 86%. Confidence scores were a bit inflated (as are most rating scales): the average confidence by task was 74%. I transformed these percent-confidence scores so 74% became the center point. The average task completion rate was also not 50%, but 66% (a bit lower than the 78% from the larger sample), so I likewise transformed completion rates so 66% became "average" at 50%. These transformations tend to stretch the tasks out so we can better discriminate between good, mediocre and poor experiences.
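The note doesn't spell out the exact recentering formula, so here is one plausible reading, sketched in Python: a piecewise-linear rescale that maps the observed average to the 50% midpoint. The recenter function is an assumption for illustration, not necessarily the computation used for the graph.

```python
def recenter(score, observed_mean):
    """Piecewise-linear rescale (in percent) so the observed mean maps to 50%.

    One plausible reading of the transformation described in the technical
    note; the article does not give the exact formula.
    """
    if score <= observed_mean:
        return 50.0 * score / observed_mean
    return 50.0 + 50.0 * (score - observed_mean) / (100.0 - observed_mean)

# Example: a task with a mean confidence of 6 on a 7-point scale.
raw_confidence = 6 / 7 * 100                   # ~86%, as in the note above
adj_confidence = recenter(raw_confidence, 74)  # lands above 50%: high confidence
adj_completion = recenter(66, 66)              # the average task lands exactly at 50%
print(f"confidence: {adj_confidence:.0f}%, completion: {adj_completion:.0f}%")
```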

The correlation between task completion and confidence is r = .67, and this relationship can be seen in the diagonal spread of the data. The bulk of the tasks fall within the upper-right and lower-left quadrants, which I call good and poor experiences: users are completing tasks and know it, or are failing tasks and generally know it.

Disasters fall in the upper-left quadrant, where task success is below average but confidence is above average. Task-level disasters aren't rampant, but the 23 in this sample represent about 13% of the task experiences. The lower-right quadrant shows above-average completion rates and below-average confidence; I call these tasks "unsure." They may fall below the radar because the completion rates are high, but the low confidence scores suggest problems are lurking and there's room for improvement.
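Putting the two axes together, a sketch of the quadrant classification might look like the following. The task names and scores are hypothetical, and both scores are assumed to already be recentered so that 50% marks "average" on each axis.

```python
# A sketch of the quadrant classification described above. Task names and
# scores are hypothetical; both axes are assumed to be recentered so that
# 50% marks "average".
def classify(completion, confidence):
    """Return the quadrant label for one task."""
    if completion >= 50 and confidence >= 50:
        return "good"      # completed the task and know it
    if completion < 50 and confidence < 50:
        return "poor"      # failed the task and generally know it
    if completion < 50:
        return "disaster"  # failed, but believe they succeeded
    return "unsure"        # completed, but lack confidence

tasks = {"find_shipping_cost": (30, 80), "compare_car_values": (75, 35)}
for name, (completion, confidence) in tasks.items():
    print(name, "->", classify(completion, confidence))
# find_shipping_cost -> disaster
# compare_car_values -> unsure
```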

Task failure is bad, but when users think they have the right information or did something they really didn't (like voting for the wrong candidate), then the user experience goes from bad to worse. Tracking both task success and confidence can help identify these disasters early and hopefully prevent further damage.


 

About Jeff Sauro

Jeff Sauro is the founding principal of Measuring Usability LLC, a company providing statistics and usability consulting to Fortune 1000 companies.
He is the author of over 20 journal articles and 4 books on statistics and the user experience.



Posted Comments

There are 2 Comments

December 21, 2011 | Jeff Sauro wrote:

Timo,

Thanks for the thoughtful comment.

My thought is that while there is a benefit in articulating "disasters" in an international standard, I'm also inclined to defer to the simpler, albeit somewhat vaguer, ISO definition. While the ISO 9241-11 definition says nothing about users acknowledging a long task time or a failed task, we've found through empirical analysis that the aspects of effectiveness, efficiency and satisfaction are correlated. In general the correlation between the metrics is medium (r = .3 to .5). A correlation at this level suggests you need to measure all 3 aspects to truly measure usability, because one measure cannot replace another (e.g. at r = .5, r-squared is .25, so only around 25% of the variation in satisfaction ratings is explained by failed tasks).

But as soon as you take all three measures you have the opportunity to measure disasters and other mismatches in experiences. You can also see the more common experience: users who fail tasks, take longer and generally think it is an unusable experience. So in short, I see value in pointing this out to others, but I like the simplicity the current definition offers. Perhaps there's a place for a corollary to the definition?
 


December 20, 2011 | Timo Jokela wrote:

"The only thing worse than users failing a task is users failing a task and thinking they've completed it successfully. This is a disaster."

Thanks Jeff for raising this issue. It is very interesting from the point of view of the definition of usability.

The thing that is interesting is that the standard definition of usability (ISO 9241-11: "The extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use") does not cover this "disaster" setting at all. The definition covers effectiveness, but not perceived effectiveness.

Therefore, the definition of usability should be extended to: "The extent to which a product can be used by specified users to achieve specified goals with effectiveness, PERCEIVED EFFECTIVENESS,...".

The definition of usability is under discussion in the standardization working groups. I am involved, and I have proposed this kind of modification to the definition. But things are slow to change, and there are many opinions...

BTW, the same applies to efficiency: PERCEIVED EFFICIENCY is not covered in the ISO definition of usability either.

