Jeff Sauro • April 21, 2010

How long does it take users to complete a task? We don't really know. Instead, we have to make our best guess from a sample of users. But if you had to pick a single number from a usability test to summarize how long typical users take to complete a task, what would you report? The mean? The median? The mode? Something else?

When we want one number to represent the most common or typical value, we often use the average, or more specifically, the arithmetic mean. Any single estimate from a sample (especially a small sample) will almost surely be wrong, so it is important to include confidence intervals around your best guess. Still, because summary data often finds its way onto dashboards and into reports for managers, we need to settle on a single best estimate of the typical task completion time.
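One common way to put a confidence interval around a "typical" task time is to work on the log scale: take the natural log of each time, build an ordinary t-based interval on the logs, then exponentiate the center and endpoints back to seconds. The center of that interval is the geometric mean. The sketch below assumes this log-transform approach (it is not necessarily how any particular calculator does it), and the function name `geometric_mean_ci` and its `t_crit` parameter are hypothetical. Because Python's standard library has no t quantile function, the critical value is passed in from a t table.

```python
import math
import statistics


def geometric_mean_ci(times, t_crit):
    """Hypothetical helper: geometric mean of task times with a confidence
    interval computed on the log scale and transformed back to seconds.

    `t_crit` is the two-sided t critical value for len(times) - 1 degrees
    of freedom, looked up from a t table (it is not computed here).
    """
    logs = [math.log(t) for t in times]          # log-transform each time
    n = len(logs)
    mean_log = statistics.fmean(logs)            # mean of the logs
    sem_log = statistics.stdev(logs) / math.sqrt(n)  # standard error on log scale
    low = math.exp(mean_log - t_crit * sem_log)  # back-transform lower bound
    high = math.exp(mean_log + t_crit * sem_log) # back-transform upper bound
    return math.exp(mean_log), low, high


# Five example task times in seconds.
times = [36, 60, 81, 92, 105]
# 2.776 is the 95% two-sided t critical value for 4 degrees of freedom.
gm, low, high = geometric_mean_ci(times, t_crit=2.776)
print(f"geometric mean {gm:.1f} s, 95% CI {low:.1f} s to {high:.1f} s")
```

Note that the back-transformed interval is not symmetric around the geometric mean; it extends farther to the right, mirroring the right skew typical of task times.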

The mean works quite well to provide the center value or most "typical" value in a set of data that is roughly symmetrical. When the data become skewed by really large values, the mean is pulled upward. This happens when summarizing financial data like average home prices or average salaries: one really expensive home or one chief executive's salary will pull the average way up. In these cases, the median is used to provide a more accurate picture of the middle or most typical value.
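To see the pull of a single large value, here is a small sketch with made-up task times (the numbers are hypothetical, chosen only to illustrate the effect): five users finish in about a minute and one user takes five minutes.

```python
import statistics

# Hypothetical task times in seconds: five typical users plus one very slow one.
times = [40, 45, 50, 55, 60, 300]

mean = statistics.mean(times)
median = statistics.median(times)

print(f"mean   = {mean:.1f} s")    # dragged upward by the 300-second time
print(f"median = {median:.1f} s")  # stays with the typical users
```

The single 300-second time moves the mean far above every other observation, while the median still sits among the five typical users.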


You can see the positive skew (tail points to the right) in the figure below. Figure 1 shows a histogram of the task times from 190 users who all completed a task on an intranet application. Notice how the mean is higher than the median.


Figure 1: A histogram of 190 completed task times showing the effect of a positive skew on the mean. The median of this task is 71 seconds and the mean is 85 seconds. The median is the point where half the users take more time and half take less.

At small sample sizes, the median tends to overstate the actual middle time by as much as 10%.

One average we didn't test was the mode (the most frequent value). The mode doesn't make a good average for task times because task time data can take on so many distinct values. The mode is often undefined (every value is unique), or there are multiple modes (several values that each appear twice), or, worse, the mode comes from two task times far from the center.
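Python's standard `statistics.multimode` returns every value that ties for most frequent, which makes both failure modes easy to see. The two samples below are illustrative: the first is a five-user sample with all unique times, and the second is a hypothetical sample in which two extreme times happen to repeat.

```python
import statistics

# All task times are unique, so every value ties as a "mode".
sample = [36, 60, 81, 92, 105]
print(statistics.multimode(sample))

# Hypothetical sample: the repeated values sit far from the center.
lumpy = [12, 12, 70, 75, 80, 240, 240]
print(statistics.multimode(lumpy))  # two modes, neither near the middle
```

In the second sample the middle value is 75 seconds, yet the two modes are 12 and 240 seconds, so the mode says little about the typical user.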

To find the best average, we ran a Monte Carlo simulation on 61 large-sample usability tasks and found that, on average, the geometric mean estimated the middle value of the population best and had the least bias (it was just as likely to overestimate as to underestimate the median). For sample sizes below 25, the geometric mean is the winner.
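The general shape of such a simulation can be sketched in a few lines. The 61 real tasks are not available here, so this sketch assumes a hypothetical stand-in population of 190 log-normal task times with a median near 71 seconds. It repeatedly draws small samples, computes the sample median and geometric mean, and compares each to the population median. The definitions used here (error as mean absolute percent difference, bias as mean signed percent difference) are assumptions for illustration, and the function name `simulate` is hypothetical. Because a log-normal population is assumed, this setup naturally favors the geometric mean; results on real task data could differ.

```python
import math
import random
import statistics


def simulate(population, n, draws=5000, seed=1):
    """Hypothetical helper: estimate the population median from many random
    samples of size n, using the sample median and the geometric mean.
    Returns {estimator: (mean absolute % error, mean signed % error)}."""
    rng = random.Random(seed)
    target = statistics.median(population)
    errors = {"median": [], "geometric mean": []}
    for _ in range(draws):
        sample = rng.sample(population, n)  # draw without replacement
        estimates = {
            "median": statistics.median(sample),
            "geometric mean": statistics.geometric_mean(sample),
        }
        for name, est in estimates.items():
            errors[name].append((est - target) / target)
    summary = {}
    for name, errs in errors.items():
        abs_err = statistics.fmean(abs(e) for e in errs)  # typical size of miss
        bias = statistics.fmean(errs)                     # net over/under estimate
        summary[name] = (100 * abs_err, 100 * bias)
    return summary


# Assumed stand-in for one large-sample task: 190 log-normal task times
# with a median near 71 seconds. Real task data would replace this list.
pop_rng = random.Random(0)
population = [pop_rng.lognormvariate(math.log(71), 0.5) for _ in range(190)]

for n in (3, 5, 10, 25):
    for name, (err, bias) in simulate(population, n).items():
        print(f"n={n:2d} {name:15s} error {err:5.1f}%  bias {bias:+5.1f}%")
```

Swapping in real task times for `population`, and looping over many tasks and every sample size from 2 to 25, would give the kind of comparison described above.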

- The simulation draws a small random sample from the large-sample task shown in Figure 1, whose median (middle value) is 71 seconds.
- For each new sample, the sample median and geometric mean are computed, and their bias and error are tracked over many draws.
- For example, a random sample of five times (36, 60, 81, 92, 105 seconds) produced a median of 81 seconds and a geometric mean of 70.1 seconds. The median was off by 10 seconds (14%), while the geometric mean was off by only 0.9 seconds (1.3%).
- Repeated several thousand times across 61 tasks from our database and all sample sizes between 2 and 25, the geometric mean has 13% less error and 23% less bias than the sample median.
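The arithmetic in the five-user example can be checked directly. The geometric mean is computed by averaging the natural logs of the times and exponentiating the result, which is equivalent to taking the nth root of the product of the n times. The sketch below should reproduce the 81-second median and the roughly 70.1-second geometric mean, along with how far each lands from the 71-second population median.

```python
import math
import statistics

population_median = 71  # median of the large-sample task in Figure 1 (seconds)
sample = [36, 60, 81, 92, 105]

# Geometric mean: average the natural logs, then exponentiate back to seconds.
gm = math.exp(statistics.fmean(math.log(t) for t in sample))
med = statistics.median(sample)

for name, est in (("median", med), ("geometric mean", gm)):
    off = est - population_median
    pct = off / population_median
    print(f"{name:15s} {est:5.1f} s  off by {off:+.1f} s ({pct:+.1%})")
```

Python 3.8 and later also provide `statistics.geometric_mean`, which gives the same value as the log-average computed by hand here.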
