Jeff Sauro • March 7, 2012

There are many reasons why usability professionals don't use statistics, and I've heard most of them. Many of the reasons are based on misconceptions about what you can and can't do with statistics and the advantage they provide in reducing uncertainty and clarifying our recommendations.

Here are nine of the more common misconceptions.

For example, in an early test of a new homepage for a large e-commerce website, I had the same 13 users attempt 11 tasks on mockups of both the existing homepage and the new design. Four of the 11 tasks had statistically different task times (see Figure 1 below).

Figure 1: Difference in task times for 11 tasks on a new vs. existing homepage design. Error bars are 80% confidence intervals.

They were statistically different because users were able to complete three of the tasks more quickly on the old design and one task more quickly on the new design. Even the tasks that showed no significant difference provided meaning: even with a major design change, the bulk of tasks were being completed in about the same amount of time.

Technical Note: Don't assume you always need to use 95% confidence intervals, or that something is significant only when the p-value is less than .05; that is a convention many publications use. For applied research, you should more evenly balance the priority you give to Type I and Type II errors. See Chapter 9 in Quantifying the User Experience.
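To see the tradeoff concretely, the sketch below (a hypothetical illustration using only Python's standard library, with made-up task times rather than the study's data) compares interval widths at the 80%, 90%, and 95% confidence levels. Lower confidence levels produce narrower intervals, accepting more Type I error risk in exchange for more power to detect differences.

```python
from statistics import NormalDist, mean, stdev
from math import sqrt

# Hypothetical task times in seconds (illustrative only, not from this study)
times = [42, 55, 38, 61, 47, 52, 44, 58, 49, 39, 63, 50, 46]

n = len(times)
m = mean(times)
se = stdev(times) / sqrt(n)  # standard error of the mean

widths = {}
for level in (0.80, 0.90, 0.95):
    # Two-sided z critical value for this confidence level
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half = z * se
    widths[level] = 2 * half
    print(f"{level:.0%} CI: {m - half:.1f} to {m + half:.1f} (width {2 * half:.1f} s)")
```

For a sample this small a t-based interval would be slightly wider than these z-based ones; the point is only that the width, and hence the chance of declaring a difference, depends directly on the confidence level you choose.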

If a user trips on a carpet, how many users do you really need to test, much less quantify? (Quote attributed to Jared Spool).

For example, the first of the 13 users we tested on the new homepage design had no idea what "Labs" meant (it referred to upcoming products). Should we change it to something else? Is this a carpet-trip issue or just a carpet-uncomfortable one? After testing all 13 users, only one other user had a problem with the term (2 out of 13 in total). Now should we change it? Understanding the prevalence of this issue helps make a more informed decision and is the subject of the next misconception.

Reality: Even if all you collect is a list of usability problems in formative evaluations, you can still estimate the prevalence of the problems you observed by providing confidence intervals.

For example, for the two out of 13 users who had a problem associating "Labs" with new and upcoming products, we can be 90% confident that between 4% and 38% of all users would also have some problem understanding the label. There might be a better term, but given the data, at most 38% of users would have some difficulty making the association.
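An interval like this can be computed with an adjusted-Wald binomial confidence interval, which behaves well at small sample sizes. The sketch below (standard-library Python; the function name is mine) reproduces the 2-out-of-13 case at 90% confidence.

```python
from statistics import NormalDist
from math import sqrt

def adjusted_wald_ci(x, n, confidence=0.90):
    """Adjusted-Wald interval for a proportion: add z^2/2 successes
    and z^2 trials to the observed counts, then apply the Wald formula."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_adj = (x + z ** 2 / 2) / (n + z ** 2)
    se = sqrt(p_adj * (1 - p_adj) / (n + z ** 2))
    low = max(0.0, p_adj - z * se)
    high = min(1.0, p_adj + z * se)
    return low, high

# 2 of 13 users had trouble with the "Labs" label
low, high = adjusted_wald_ci(2, 13, 0.90)
print(f"90% CI: {low:.0%} to {high:.0%}")  # roughly 4% to 38%
```

With only 13 users the interval is wide, but it still bounds how common the problem could plausibly be in the whole user population.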

For example, in a test of a new homepage design, I had the same 13 users attempt the same 11 tasks while thinking aloud on mockups of the existing homepage and the new design (this is the data shown in Figure 1 above). The average task time was statistically faster on one task and statistically slower on three tasks (addressing misconceptions 1, 2, and 4). Even though the users were thinking aloud, they acted as their own controls: loquacious users were talkative on both the old and new versions, and reticent users were quiet on both.

Reality: It's always important to know your audience. Just because you use statistical calculations doesn't mean you need to bore or confuse your audience with detailed calculations and figures. Even though I advocate the use of statistics, that doesn't mean I start every conversation with z-scores.

Often, adding error bars to graphs or asterisks to means allows the audience to differentiate between sampling error and meaningful differences. If Consumer Reports and TV news can provide information about confidence intervals (usually called margins of error), so can you.

For example, I presented Figure 1 in a presentation to illustrate the difference in task times. During the presentation a Vice President quipped: "I can't believe you're showing me confidence intervals on a sample size of 13" (misconception #1).

In response I pointed out that even at this sample size we were seeing significant differences, some better and some worse, and that confidence intervals are actually more informative for small sample sizes. With large sample sizes the differences are often significant, but the size of the difference is often modest and unnoticeable to users.

The misconception is that statistics somehow replace descriptions of usability problems. It's not statistics OR qualitative problem descriptions; it's statistics AND qualitative problem descriptions.

For example, the graph below shows the difference in confidence ratings for two tasks on the homepage comparison test discussed above. The value shown is the average confidence rating for the new design minus the old design (so higher values favor the new design). Both tasks show that users were more confident on the new design than the old because the mean difference is greater than zero. But are the scores just a by-product of small sample sizes?

Figure 2: Difference in average confidence ratings for two tasks (New Design-Old Design). Higher numbers indicate more confidence on completing tasks for the new design.

The next graph shows the same tasks with 90% confidence intervals around the mean difference in confidence ratings. Only Task 1's error bars do not cross zero, showing that its higher confidence rating was statistically significant.

Figure 3: Same values shown as in Figure 2 now with 90% confidence intervals. The confidence interval for Task 1 doesn't cross zero showing that users are statistically more confident on the new design.
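A paired-difference interval like the ones in Figure 3 can be sketched as follows. The ratings below are hypothetical stand-ins for the study's data, and the t critical value for 12 degrees of freedom at 90% confidence (about 1.782) is hard-coded because Python's standard library has no t distribution.

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical 7-point confidence ratings for 13 users (not the article's data)
new = [6, 7, 5, 6, 7, 6, 5, 7, 6, 6, 7, 5, 6]
old = [5, 5, 4, 6, 5, 6, 4, 6, 5, 5, 6, 4, 5]

# Each user rated both designs, so take within-user differences
diffs = [a - b for a, b in zip(new, old)]
n = len(diffs)
m = mean(diffs)
se = stdev(diffs) / sqrt(n)

T_CRIT_90_DF12 = 1.782  # two-sided 90% critical value, t distribution, df = 12
low, high = m - T_CRIT_90_DF12 * se, m + T_CRIT_90_DF12 * se

print(f"Mean difference: {m:.2f}, 90% CI: {low:.2f} to {high:.2f}")
if low > 0:
    print("Interval doesn't cross zero: statistically more confident on the new design")
```

Because the interval is built on within-user differences, between-user variability (chatty vs. quiet users) cancels out, which is exactly why the paired design discussed earlier is so efficient.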

There are free calculators, books and tutorials to get you started improving the rigor of your usability test. The first lesson is that statistics and usability analysis are a natural fit for making a quantifiably better user experience.
