Measuring Usability
Quantitative Usability, Statistics & Six Sigma by Jeff Sauro

Calculating Sample Size for Task Times (Continuous Method)

Jeff Sauro • September 17, 2004

We already saw how a manageable sample of users can provide meaningful results for discrete-binary data like task completion. With continuous data like task times, the sample size can be even smaller.

The continuous calculation is a bit more complicated and involves something of a Catch-22. Most people want to determine the sample size ahead of time and then run the test with that many users, as in the binary sample calculation. For task times, though, the calculation itself depends on how much the times vary, so we need to have some data already or at least a strong hypothesis about our user population.

As with the binary calculation for task completion, we know that experienced users (those who complete the task at least weekly) should overwhelmingly complete the task successfully. With task times we should also have a rough estimate of the mean and standard deviation (there's the Catch-22). If you're performing a benchmarking study and already have some data, then you can use that data. If you have time, run a pretest with a few users, say four, to get a sense of the range in times. Of course, when all else fails you can have some internal folks complete the tasks, perhaps some sales or service employees or whoever comes close to matching the speed and accuracy of your target users. You'll need to have an idea of the standard deviation (in seconds) for each task you're testing.

For example, let's use the sample task, "Looking up a balance on an account number" (a very common task in accounting software). You write up a scenario, try the task yourself and have three of your colleagues complete it. Chances are you're completing the task faster than your users would; nevertheless, it will still give you a range of times. Here are the times in seconds:

              Time (in seconds)
You           101
Colleague 1   132
Colleague 2   125
Colleague 3   145

Mean          125.75
St Deviation  18.46
Range         44

From this pre-test sample you want to derive as close an estimate as possible of the range in times of your actual users. To operationalize this, you would say, "I want to be 95% confident of the mean time within ten seconds." So instead of simply asking, "How many users do I need to test?", you ask, "How many users do I need to test to be 95% sure I know their mean task time within ten seconds?" Here's where the real statistics start.
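
The summary figures for the pre-test sample can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are chosen for illustration, and it assumes the standard deviation is the sample standard deviation (n - 1 in the denominator), which is what reproduces the 18.46 in the table.

```python
import statistics

# Pre-test task times in seconds: you plus three colleagues.
times = [101, 132, 125, 145]

mean = statistics.mean(times)         # arithmetic mean
sd = statistics.stdev(times)          # sample standard deviation (divides by n - 1)
time_range = max(times) - min(times)  # slowest time minus fastest time

print(f"Mean: {mean:.2f}  St Deviation: {sd:.2f}  Range: {time_range}")
# prints: Mean: 125.75  St Deviation: 18.46  Range: 44
```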

That ten-second range will become the confidence interval. The confidence interval is that + or - fudge factor seen with the polls on TV. With this confidence interval we can work backwards to arrive at our sample size. Because we don't know the standard deviation of the whole population of users (again the Catch-22), we need to estimate it from the small sample we have. For small samples (fewer than 30) where the parent standard deviation (σ) is not known, you use what's called the Student's t distribution. The Student's t distribution uses values from a t table instead of the more familiar z table of normal values.

The confidence interval is calculated by multiplying this t-statistic (t*) by the Standard Error (SE) and placing the result on either side of the sample mean. The Standard Error is just the sample standard deviation (s) divided by the square root of the sample size (n). So the confidence interval formula usually looks something like this:

$$\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}$$



To arrive at the elusive "significant" sample size, you need to try a few reasonable sample sizes and see which ones produce a confidence interval within your limit. The value of n you choose affects both the critical value for t and the Standard Error, since both depend on n. We'll try 25, 20, 15, 10 and 5, and whichever value has a confidence interval of about 10 seconds we'll use as the ideal sample. (Again, all this assumes that our internal sample did a good job of estimating the standard deviation of the larger population.)
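
The trial-and-error step can be sketched in Python. This is a minimal illustration rather than the author's own worksheet: it assumes the pre-test standard deviation of 18.46 seconds, and the critical values in `T_CRIT_95` (a name made up for this sketch) are two-tailed 95% values copied from a standard t table for n - 1 degrees of freedom.

```python
import math

s = 18.46  # sample standard deviation from the pre-test, in seconds

# Two-tailed 95% critical values of t for n - 1 degrees of freedom,
# keyed by the candidate sample size n (values from a standard t table).
T_CRIT_95 = {25: 2.064, 20: 2.093, 15: 2.145, 10: 2.262, 5: 2.776}


def margin_of_error(n, t_star, s):
    """Return t* times the standard error s / sqrt(n)."""
    standard_error = s / math.sqrt(n)
    return t_star * standard_error


# Margin (the + or - part of the interval) for each candidate sample size.
margins = {n: margin_of_error(n, t_star, s) for n, t_star in T_CRIT_95.items()}

print(f"{'n':>3} {'t*':>6} {'SE':>6} {'+/-':>6}")
for n, t_star in T_CRIT_95.items():
    print(f"{n:>3} {t_star:>6.3f} {s / math.sqrt(n):>6.2f} {margins[n]:>6.2f}")
```

Under these assumptions the margin falls to roughly ten seconds at n = 15 and keeps shrinking as n grows.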

 

At about 15 users, the confidence interval narrows close enough to ten seconds that it will probably be sufficient. I'd use 15 as the approximate number of users you'd need to sample, knowing that to get more precise you'd need to sample more than 15 users. This result is much better than thinking you need to test 100 or 1000 users in order to get "statistically significant" results. If +/- 10 seconds isn't precise enough you can:

  1. Decrease your confidence level to 90% or 85%.
  2. Sample more users.
  3. Decrease your confidence interval and increase your sample.
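
As a rough illustration of how the first two options move the interval, the sketch below compares the margin at 15 users and 95% confidence with the margin after lowering the confidence level to 90%, and after raising the sample to 20 users. It assumes the pre-test standard deviation of 18.46 seconds; the critical values are two-tailed entries from a standard t table, and the names are illustrative only.

```python
import math

s = 18.46  # pre-test sample standard deviation, in seconds

# (confidence level, n) -> two-tailed critical t for n - 1 degrees of freedom
T_CRIT = {
    (0.95, 15): 2.145,  # baseline: 95% confidence, 14 degrees of freedom
    (0.90, 15): 1.761,  # option 1: lower the confidence level
    (0.95, 20): 2.093,  # option 2: sample more users (19 degrees of freedom)
}


def margin_of_error(n, t_star, s):
    """Return t* times the standard error s / sqrt(n)."""
    return t_star * s / math.sqrt(n)


margins = {key: margin_of_error(key[1], t_star, s) for key, t_star in T_CRIT.items()}

for (confidence, n), margin in margins.items():
    print(f"{confidence:.0%} confidence, n = {n}: +/- {margin:.1f} seconds")
```

Either change tightens the interval, but lowering the confidence level does so by accepting a greater chance that the interval misses the true mean.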

 

Sample Sizes in the Real World of Usability Testing

If you've run enough usability tests, you know that in many cases your sample size is determined ahead of time. That is, you know your budget and time frame and therefore approximately how many users you'll be sampling, usually somewhere between 10 and 30. I approach sampling as getting as many users as I can within that range and computing the statistics later.

For example, let's say we followed our initial indication and sampled 15 users (assuming our budget and time fit nicely with this figure). We had them complete the same task of looking up an account balance as our small internal employee sample. Here are the results next to our initial internal sample:

[Table: Real-Users Sample next to the Internal Sample]

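With this sample we can now estimate the true mean time of our population. Using the formula for the Student's t distribution:

$$\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}$$

where n is the sample size (15 here), s is the sample standard deviation, and t* is the critical value from the t table for n - 1 degrees of freedom (2.145 for 95% confidence with 14 degrees of freedom).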

 

Plugging in the numbers, for the estimated mean of the total population of users on this task we get:

126.6 ± 9.08 seconds

So when reporting the mean time for this task we would say, "We are 95% confident the mean time is between 117.5 seconds and 135.7 seconds." In this example, our original sample turned out to be a good estimate of the mean time and standard deviation, but don't expect it to work out so well most of the time.
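
The same calculation can be wrapped in a small Python function. The individual real-user times are not listed here, so this sketch applies it to the four internal pre-test times instead; the function name is hypothetical, and the critical value 3.182 is the two-tailed 95% entry for 3 degrees of freedom from a standard t table.

```python
import math
import statistics


def mean_confidence_interval(times, t_star):
    """Return (mean, margin) for the interval mean +/- t* * s / sqrt(n)."""
    n = len(times)
    mean = statistics.mean(times)
    s = statistics.stdev(times)          # sample standard deviation
    margin = t_star * s / math.sqrt(n)   # t* times the standard error
    return mean, margin


# Internal pre-test times in seconds (n = 4, so 3 degrees of freedom).
internal_times = [101, 132, 125, 145]
T_STAR_95_DF3 = 3.182

mean, margin = mean_confidence_interval(internal_times, T_STAR_95_DF3)
low, high = mean - margin, mean + margin
print(f"{mean:.2f} +/- {margin:.2f} seconds ({low:.1f} to {high:.1f})")
```

With only four users the margin comes out near 29 seconds, roughly three times the 9.08 seconds obtained from 15 users, which shows why the pre-test is good for estimating the standard deviation but not for reporting the mean.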


About Jeff Sauro

Jeff Sauro is the founding principal of Measuring Usability LLC, a company providing statistics and usability consulting to Fortune 1000 companies.
He is the author of over 20 journal articles and 4 books on statistics and the user-experience.