Measuring Errors in the User Experience
Jeff Sauro • May 15, 2012
Errors happen and unintended actions are inevitable.
They are a common occurrence in usability tests and are the result of problems in an interface and imperfect human actions.
It is valuable to have some idea about what these are, how frequently they occur, and how severe their impact is.
First, what is an error?
Slips and Mistakes: Two Types of Errors
It can be helpful to categorize errors into slips and mistakes. Don Norman has written extensively about slips and mistakes in Chapter 5 of The Design of Everyday Things.
Slips are the classic unintended actions a user makes while trying to do something on an interface even though the goal is correct (e.g., a typo). When the goal is wrong, it's a mistake, even if that goal was accomplished.
Here are some slips:
- Mistyping an email address
- Mistyping a password
- Accidentally clicking an adjacent link
- Clicking Reset instead of Submit (FYI, don't put a Reset button on a form).
- Mistyping an email address in the "Re-Enter" email address field
- Picking the wrong month when making a reservation
- Accidentally double-clicking a button (often resulting in a double-submitted form)
Here are some mistakes:
- Clicking on a heading that isn't clickable
- Intentionally double clicking a link or button
- Typing both first and last name in the first name field
- Entering today's date instead of the date of birth
- Replying to all in an email instead of just one person (an especially egregious mistake if the email is inflammatory)
- Entering hyphens in your bank account number
- Pushing the gas pedal instead of the brake in an accident. This isn't slipping (literally) but mistakenly pushing the gas when panicking. Update: This just happened in my city a few hours after I published this blog post.
Four Causes of Errors
When we observe errors in usability tests, we find it helpful to identify their causes, which generally fall into four broad categories.
- Slips: You can't eliminate all those "fat finger" errors or typos, but seeing a lot of slips is a good indication that you should reduce required fields or data entry where possible.
- Mistakes: When we see users entering the wrong format in a field, it's usually a good indication that a field hint, automatic formatting or some code that gracefully strips non-numeric characters might reduce these mistakes.
- User Interface Problems: Errors caused by the interface are the ones we're most interested in as we can usually do something about these. If users continue to click on a heading that's not clickable (mistake) or look for a product in the wrong part of the navigation then there's probably something about the design that we can improve.
- Scenario Errors: No matter how sophisticated and realistic our usability tests are, there is some degree of artificiality to them. For example, if you want to test how well users can pay a credit card bill online then you have to provide them with fake data and a test system. Inevitably we see errors related to the artificial scenario as users see balances and transactions that are foreign to them. We can't do much about these errors except note that they are unlikely to be encountered in actual use.
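The "code that gracefully strips non-numeric characters" fix for format mistakes can be a single normalization step that runs before any validation. Here is a minimal Python sketch, assuming the field (say, a bank account number) should contain only ASCII digits; the function name normalize_account_number is hypothetical:

```python
import re


def normalize_account_number(raw: str) -> str:
    """Hypothetical helper: keep only the ASCII digits the user typed.

    Hyphens, spaces and dots that users commonly add are removed silently
    instead of triggering a validation error.
    """
    return re.sub(r"[^0-9]", "", raw)


print(normalize_account_number("1234-5678 90"))  # 1234567890
```

Because the user's hyphens or spaces are removed rather than rejected, entering hyphens in a bank account number stops producing an error at all.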
How to Record Errors
When observing users in a usability test, record every time an error occurs, even if it is the same error by the same user on the same task. I've seen the same user try unsuccessfully to click on the same heading that wasn't clickable 5 times over a 2-minute period.
The user was confused about the navigation and really wanted that heading as a way to reorient themselves. Even though it was the same error, seeing 5 errors versus 1 error better describes the experience (which was poor).
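One way to capture every occurrence, repeats included, is to log each error as its own record tagged with user, task and cause, and tally afterwards. A minimal Python sketch; the record fields, the cause labels (taken from the four causes above) and the participant data are hypothetical and only illustrate the tallying:

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    SLIP = "slip"
    MISTAKE = "mistake"
    UI_PROBLEM = "ui problem"
    SCENARIO = "scenario"


@dataclass(frozen=True)
class ErrorEvent:
    user: str
    task: str
    description: str
    cause: Cause


# Every occurrence gets its own record, even repeats by the same user on the same task.
log = [ErrorEvent("P1", "T1", "clicked non-clickable heading", Cause.UI_PROBLEM) for _ in range(5)]
log.append(ErrorEvent("P2", "T1", "typo in email address", Cause.SLIP))

# Tally by (user, task) and by cause.
per_user_task = Counter((e.user, e.task) for e in log)
per_cause = Counter(e.cause for e in log)

print(per_user_task[("P1", "T1")])  # 5
print(per_cause[Cause.UI_PROBLEM])  # 5
```

Keeping the cause on each record also makes it easy to report scenario errors separately from the UI problems you can actually fix.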
Errors Provide the Why
Errors have been shown [pdf] to correlate with the other prototypical usability metrics: task time, completion rates and task-level satisfaction. Errors are often the "why" behind longer task times, failed tasks and lower satisfaction ratings.
For example, in our evaluation of the Enterprise Rent-A-Car website, users were asked to find the total price of a car with a GPS navigation system and a car seat. These "extras" weren't added to the total price, so users had to do the addition themselves (often incorrectly) or assumed the total was lower than it actually was. This mistake, caused by a UI problem, increased task times, led to task failures and lowered ease ratings.
Simply averaging the number of errors by task gives you some idea of the experience. A task averaging 0 errors means something compared to a task averaging 3.5 errors, especially if you are comparing different designs. However, not all tasks are created equal, and that needs to be accounted for when interpreting errors.
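Averaging raw error counts per task takes only a few lines. A Python sketch, using made-up counts for two hypothetical designs chosen so that one averages 0 errors and the other 3.5:

```python
from statistics import mean


def mean_errors_per_user(counts):
    """Average raw error count across the users who attempted one task."""
    if not counts:
        raise ValueError("need at least one user")
    return mean(counts)


# Hypothetical raw counts, one entry per user who attempted the task.
errors_by_design = {
    "Design A": [0, 0, 0, 0],
    "Design B": [2, 5, 3, 4],
}
for design, counts in errors_by_design.items():
    print(design, mean_errors_per_user(counts))
```

The mean says nothing about how many chances a task gave users to err, so a longer or more complex task will tend to look worse even if its design is no less usable.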
Computing an Error Rate
Errors, unlike task completion rates, can occur more than once per user per task. This complicates the analysis because you can't compute a proportion as easily as you can with task completion rates. For example, if a user committed 3 errors on 1 task, dividing 3 by 1 gives 3, which isn't a proportion.
The simplest thing to do is treat errors as binary data and code the raw error counts as either 1's (the user committed at least 1 error) or 0's (the user committed no errors). This loses some information, but for many tasks and applications that don't see many errors, it may be sufficient.
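Coding errors as binary data turns them back into a proportion of users, just like a completion rate. A minimal Python sketch with made-up counts; the function name is hypothetical:

```python
def binary_error_rate(counts):
    """Proportion of users who committed at least one error on a task.

    Each raw count is coded 1 if the user made any error, else 0,
    so errors beyond the first are discarded.
    """
    if not counts:
        raise ValueError("need at least one user")
    coded = [1 if c > 0 else 0 for c in counts]
    return sum(coded) / len(coded)


# Hypothetical raw counts for 5 users: the 3 errors by the first user count as 1.
print(binary_error_rate([3, 0, 1, 0, 0]))  # 0.4
```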
Opportunities for Errors
An alternative approach which retains all the information is to convert errors into a proportion based on the opportunity for errors. An opportunity for an error is a technique I borrowed from Six Sigma's opportunity for a defect. The idea is that some tasks, especially those that are longer or more complex, will have more opportunities for users to make mistakes. See also Human Error Probability (HEP).
For example, withdrawing money from an ATM will have fewer error opportunities than submitting an expense report with 4 receipts and mileage in an expense reporting application. You identify the places in an interface where users can make mistakes and divide the total number of errors across all users by the total number of opportunities. For example, if a task has 5 opportunities for an error and 10 users attempt the task, there are 50 opportunities. If you observe 5 errors across the users, the error rate is 5/50 = 10%.
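The opportunity-based rate is total errors divided by (opportunities per task × number of users). This Python sketch reproduces the worked example above (5 opportunities, 10 users, 5 errors); identifying the opportunities themselves is the analyst's job and isn't modeled here, and the function name is hypothetical:

```python
def error_rate(total_errors, opportunities_per_task, n_users):
    """Total errors across all users divided by total error opportunities."""
    total_opportunities = opportunities_per_task * n_users
    if total_opportunities <= 0:
        raise ValueError("need at least one opportunity")
    return total_errors / total_opportunities


print(error_rate(total_errors=5, opportunities_per_task=5, n_users=10))  # 0.1
```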
For more detail on the method of creating error opportunities, see the whole section I've written in A Practical Guide to Measuring Usability, as well as the 2005 CHI [pdf] and UPA [pdf] papers.
Combine Errors into a Single Usability Metric (SUM)
I don't always record errors, but when I do, I like to include them in a Single Usability Metric (SUM). We found [pdf] that an average of time, completion rates, task satisfaction and errors (all expressed as proportions) provides a great single measure for describing the usability of a task. It can be used on dashboards or when comparing competing products.
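Once each metric is expressed as a proportion, the combination itself is a simple average. A minimal Python sketch, assuming each input has already been converted to a 0 to 1 scale where higher is better (for errors, for example, 1 minus the error rate). How task time and satisfaction are converted into proportions isn't shown here, and the function name and input values are hypothetical:

```python
from statistics import mean


def single_usability_metric(completion, time, satisfaction, error_free):
    """Average of four task-level metrics, each already scaled 0-1 (higher is better)."""
    metrics = [completion, time, satisfaction, error_free]
    for m in metrics:
        if not 0.0 <= m <= 1.0:
            raise ValueError("each metric must be a proportion between 0 and 1")
    return mean(metrics)


# Hypothetical inputs; error_free = 1 - 0.10 error rate.
print(single_usability_metric(0.80, 0.60, 0.70, 0.90))
```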
One of the major reasons people don't collect error data is that it's time-consuming. We usually have multiple researchers counting and categorizing errors, and it can certainly be tedious.
If you're conducting a remote unmoderated test, counting errors can be difficult unless you have some record of the interaction. This is one of the reasons we use videos in addition to our remote unmoderated data. The folks at Webnographer also have a way of recording some types of errors automatically in their software.
We are all human, and to err is human. While we can't eliminate human error from task performance, we can reduce it by removing as many opportunities for error as possible. A usable interface is one that, to as great an extent as possible, prevents errors and, when errors occur, helps users recover from them with as little pain as possible. With the proliferation of mobile devices, which seem to be especially error-prone, identifying and reducing as many errors as possible will lead to both increased usability and higher adoption.