Measuring Usability
Quantitative Usability, Statistics & Six Sigma by Jeff Sauro

Will five users really find 85% of all usability problems?

Jeff Sauro • May 6, 2010

If you ask five users to take a look at a website or application you will find usability problems. If you fix those problems then ask another five users you will get another set of problems. Over time there will be fewer and fewer problems found, but a new set of users will still continue to find new problems. Why? Because each user is doing slightly different things with slightly different parts of your interface. Only certain combinations of functions and actions will reveal problems with the user experience (most problems aren't inherent to the code).

Most people understand that testing users has diminishing returns. Fewer believe you can actually quantify the percent of problems found, and so they are dubious when they hear claims such as "five users can detect 85% of problems." There's good reason for the skepticism: you can never know the total number of problems (if you did, you'd go and fix them). Instead, you can only quantify the percent of problems found among those that affect a certain percent of your users, given a specific set of tasks.

So after testing five users you have found only about 85% of the problems that affect 31% or more of your users, given those tasks. The sample size computation based on the binomial works well provided you don't switch tasks, switch users or use open-ended exploratory tasks. This is a limitation of the mathematical model, but every scientific model is an oversimplification of the real world: all models are wrong, but some are useful. The binomial model is useful because it is simple and familiar, and it works provided we don't try to overstate our results.
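
The 85% figure follows from the binomial model: if a problem affects a proportion p of users, the chance that at least one of n independent users encounters it is 1 - (1 - p)^n. Here is a minimal sketch in Python. The function name is illustrative and does not come from any library.

```python
def discovery_probability(p: float, n: int) -> float:
    """Chance that a problem affecting proportion p of users is seen
    at least once when n users attempt the same task (binomial model)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # (1 - p) ** n is the chance that all n users miss the problem.
    return 1.0 - (1.0 - p) ** n


print(round(discovery_probability(0.31, 5), 3))  # 0.844
```

With p = 0.31 and n = 5 the result is about 84%, which the five-user heuristic rounds to 85%.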

So open-ended requests like "go shopping on the website for a few minutes" or "take a look at the site and tell us what you think" mean users are likely to encounter vastly different parts of your interface. Using this unfocused strategy is like giving a survey to your users but changing some or all of the questions and answer choices while you're still collecting data (not recommended).

Even if you have specific tasks you will still uncover only some of the problems (most of the obvious ones but few of the less obvious ones). For example, if you ask five members of your subscriber base to add the same product to the shopping cart on your website, you will see most of the obvious problems. Don't be surprised if after three weeks someone complains about a problem with a field on your shopping cart. You didn't find all problems with five users; you found only the obvious ones. So the problem this user reported is likely experienced by fewer than 31% of your users. Even if it affects only 1 out of 100 users, if you can fix it you should, especially if it is a critical problem.
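
To see why less common problems slip through, the same binomial model can be applied to several problem frequencies with five users. This is a sketch under that model only. The 10% and 5% frequencies are illustrative values added here, and the helper name is hypothetical.

```python
def chance_seen(p: float, n: int = 5) -> float:
    """Probability that at least one of n users hits a problem
    affecting proportion p of all users (binomial model)."""
    return 1.0 - (1.0 - p) ** n


# Illustrative frequencies; only 0.31 and 0.01 come from the text above.
frequencies = [0.31, 0.10, 0.05, 0.01]
table = {p: chance_seen(p) for p in frequencies}

for p, chance in table.items():
    print(f"affects {p:.0%} of users -> {chance:.0%} chance of being seen by five users")
```

Under these assumptions a problem affecting 1 in 100 users has only about a 5% chance of showing up in a five-user test, compared with about 84% for a problem affecting 31% of users.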

So is the five-user heuristic even useful? It is if you:
  1. Know who your users are.
  2. Have users perform realistic closed-ended tasks with clear objectives (e.g. add a 40 inch Samsung Flat-Screen TV to the shopping cart).
  3. Know that with five users you have identified only about 85% of the more obvious problems (those affecting roughly a third or more of all users) and just a few of the less obvious problems.
  4. Start over if you change the users or tasks.

If you decided you only had time to test your shopping cart, don't be surprised if you get complaints about your registration page, contact form or search screen—you didn't test these.
If you need to be sure you've found more than the more obvious problems then you need to test more than five users.

And even if you've diligently tested 37 users on the same closed-ended task, you will still see new problems. Why? Because the problems still being discovered affect a smaller and smaller percent of your users (less than 5%). On a website that gets thousands of visitors a day, that means you'll see new problems not found in testing rather quickly, so testing with a larger sample size might be necessary.
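
The figure of 37 users can be recovered by solving 1 - (1 - p)^n >= 0.85 for n, which gives n >= ln(1 - 0.85) / ln(1 - p). The sketch below assumes the same binomial model. The function name and the default target of 0.85 are illustrative choices.

```python
import math


def users_needed(p: float, target: float = 0.85) -> int:
    """Smallest number of users n such that a problem affecting
    proportion p of users is seen at least once with probability >= target."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    # Solve 1 - (1 - p) ** n >= target for n and round up to a whole user.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


print(users_needed(0.05))  # 37
print(users_needed(0.01))  # 189
```

Note that a strict 85% target with p = 0.31 returns 6 rather than 5, because five users give about 84%, which the heuristic rounds to 85%.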

When you run out of money, time and patience for testing, know that there are still problems out there waiting to be encountered by your users. But take comfort in knowing these problems affect a smaller and smaller percent of your users, and move on to finding and fixing other parts of the application.




Posted Comments

There are 2 Comments

July 28, 2011 | Stephen Wheeler wrote:

Interesting - but why try to uncover all problems by testing with more than five participants?

Let's assume that we run a study and five participants uncover 85% of the problems - the major ones, as you point out. Let's also assume we're using specific tasks representing the major tasks supported by the product (your shopping cart is a good example).

After our study, we refine or redesign the product to address those major issues. It seems to be highly likely that a) the minor issues we didn't uncover in the first study will go away with the new design and b) new design problems are created by the new design.

So then we run another study with 5 more users, on this refined design. We'll still uncover issues (point b above), but they'll be increasingly minor. We'll also find that there's less overlap between participants as the problems become edge cases and not major flaws.

On a separate but not unrelated note, I think the debate over "how many users is enough" is partly the result of the fact that some UX researchers believe that when they are running usability studies they are "testing" a product to measure it in terms of numbers of problems, etc., whereas others (myself included) believe that running usability studies is a way to generate design insights on how to improve a product. 


May 11, 2010 | Jon Innes wrote:

Nice clarification Jeff.

With increasing numbers of teams doing Scrum I see them trying to do RITE, since the method is well suited to Agile. However, many of these folks don't realize that making big changes to protocol and/or designs after one or two users is pretty risky in most scenarios. RITE is easy to do wrong.

Unless you are certain the problem identified is going to impact many users I always advise teams to run ~5 users with stable designs/protocols. In the real world, the cost of running a few extra users is pretty low, especially when you consider all the costs and benefits.

I should add that it always amazes me that people fail to consider running studies with a longer term perspective. Why not run 5 now, 5 next week on the revised design, and so on? If you consider each of these a cell in a factorial design, you can see some interesting trends that require larger samples while still identifying major effects early on. 

