Measuring Usability
Quantitative Usability, Statistics & Six Sigma by Jeff Sauro

10 Things to Know about Confidence Intervals

Jeff Sauro • March 21, 2012

You don't need a PhD in statistics to understand and use confidence intervals.

Because we almost always sample a fraction of the users from a larger population, there is uncertainty in our estimates.

Confidence intervals are an excellent way of understanding the role of sampling error in the averages and percentages that are ubiquitous in user research.

  1. Confidence intervals tell you the most likely range of the unknown population average or percentage, such as the average completion rate, the average level of satisfaction, or the percentage of users likely to recommend your product. Note: confidence intervals don't tell you the likely range of all values, just how much the average value is likely to fluctuate.

  2. Confidence intervals provide both the location and precision of a measure. For example, the graphs below show the 95% confidence intervals around the average System Usability Scale (SUS) score. The only difference between the graphs is the sample size. Even though both show the same location (a mean score of 83), the one on the right has a larger sample size (60), making it a more precise estimate of the population SUS score than the one based on only 12 users.



  3. Three things impact the width of a confidence interval:
    1. Confidence level: This is the 95% part of the 95% confidence interval; other common levels are 90%, 99%, 80% and 85%. Confidence levels are the "advertised coverage" of a confidence interval. If we were to sample from the same user population 100 times and compute an interval each time, we'd expect the intervals to contain the population average 95 (or 90, etc.) times out of 100.

    2. Variability: as measured by the standard deviation. Populations (and samples) with more variability generate wider confidence intervals.

    3. Sample size: Smaller sample sizes generate wider intervals. There is an inverse square root relationship between confidence intervals and sample sizes: if you want to cut your margin of error in half, you need to approximately quadruple your sample size, as the sketch below shows.
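A minimal R sketch of that square root relationship (the standard deviation and sample sizes here are made up for illustration):

    sd <- 20                           # assumed standard deviation
    n  <- c(15, 60)                    # quadruple the sample size...
    moe <- qnorm(0.975) * sd / sqrt(n) # 95% margins of error
    round(moe, 1)                      # ...and the margin roughly halves: 10.1 vs 5.1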

  4. The confidence interval spans two margins of error (one on each side of the mean), and a margin of error is equal to about 2 standard errors (for 95% confidence). A standard error is the standard deviation divided by the square root of the sample size. The standard error is the standard deviation of the sampling distribution of the mean, or in plain English, it's basically the amount we expect the sample mean to fluctuate for a given sample size due to random sampling error.
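To make the arithmetic concrete, here's a quick R sketch with hypothetical numbers (a mean of 80 and standard deviation of 20 from 25 users):

    m <- 80; s <- 20; n <- 25
    se  <- s / sqrt(n)      # standard error: 20/5 = 4
    moe <- 2 * se           # about 2 standard errors for 95% confidence: 8
    c(m - moe, m + moe)     # the interval spans 2 margins of error: 72 to 88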




  5. For rating scale data (like the SUS, SUPR-Q or SEQ) use the t-confidence interval method. You just need to know the mean, standard deviation, sample size and confidence level, or you can use the online calculator to compute an accurate interval at any sample size.
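If you'd rather compute it yourself, here's a sketch of the t-interval in R (the summary statistics are made up):

    m <- 72.5; s <- 19.2; n <- 12   # mean, standard deviation, sample size
    t_crit <- qt(0.975, df = n - 1) # t critical value for 95% confidence
    moe <- t_crit * s / sqrt(n)     # margin of error
    c(m - moe, m + moe)             # roughly 60.3 to 84.7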

  6. For confidence intervals using task times you should perform a log transformation on the raw values, then compute the t-interval on the log values. This method corrects for the skew in task time data to generate accurate intervals. Use the online calculator to perform the calculations.
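A sketch of that log-transform approach in R (the task times in seconds are invented for illustration):

    times <- c(40, 36, 53, 56, 110, 48, 34, 101)
    lt <- log(times)                   # work on the log scale to tame the skew
    n  <- length(lt)
    moe <- qt(0.975, df = n - 1) * sd(lt) / sqrt(n)
    exp(mean(lt) + c(-moe, moe))       # back-transform: an interval around the geometric mean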

  7. For binary data coded as 1's and 0's (Pass/Fail, Yes/No) use the Adjusted-Wald method, which provides accurate intervals at any sample size. It works by "adjusting" the proportion: adding two successes and two failures (for 95% confidence) before computing the interval.
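Here's a minimal R sketch of the Adjusted-Wald interval (the 7-of-10 completion data is hypothetical):

    x <- 7; n <- 10                    # 7 of 10 users completed the task
    z <- qnorm(0.975)                  # ~1.96 for 95% confidence
    p_adj <- (x + z^2/2) / (n + z^2)   # z^2/2 ~ 2 successes, z^2 ~ 4 trials added
    moe <- z * sqrt(p_adj * (1 - p_adj) / (n + z^2))
    c(p_adj - moe, p_adj + moe)        # roughly 0.39 to 0.90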

  8. Confidence intervals can be computed on sample sizes as small as two: The intervals will be very wide, but there's nothing in the math preventing you from computing them. With small sample sizes you can show that an interface is unusable, but it's harder to show it's usable. For example, if 0 out of 2 people can complete a task, there's only about a 5% chance that more than half of all users will.

  9. We are 95% confident in the method of computing confidence intervals, not in any given interval. Any given interval we compute from sample data may or may not contain the population average. Even though 95% of intervals constructed this way will contain the population average, we don't know whether the one interval we computed is among the 5% that don't.

  10. You can use the overlap in confidence intervals as a quick way to check for statistical significance. If the intervals do not overlap, you can be at least 95% confident there is a difference (for 95% confidence intervals). If there is a large overlap, the difference is not significant (at the p < .05 level). Intervals can actually overlap by as much as 25% and still be statistically significant, so when there is some overlap, it's best to conduct a 2-sample t-test and find the p-value. The three graphs below illustrate this.

         
     No overlap: Significantly different (p < .05)
     Lots of overlap: Not significantly different (p > .05)
     Some overlap: Can't tell; run the 2-sample t-test (turns out p = .04, go publish!)
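For the "some overlap" case, the fallback test is a one-liner in R (the two sets of SUS scores below are made up):

    a <- c(68, 75, 90, 62, 88, 71, 95, 80)
    b <- c(55, 61, 70, 48, 77, 59, 66, 73)
    t.test(a, b)$p.value   # Welch 2-sample t-test; significant if below .05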


We have a chapter dedicated to confidence intervals in our book Quantifying the User Experience (Chapter 3) and in the Companion Book (Chapter 3), which contains step-by-step instructions for computing the intervals in Excel and R.

I'll be offering a tutorial which will include confidence intervals this fall at the Denver UX Boot Camp.



About Jeff Sauro

Jeff Sauro is the founding principal of Measuring Usability LLC, a company providing statistics and usability consulting to Fortune 1000 companies.
He is the author of over 20 journal articles and 4 books on statistics and the user experience.




Posted Comments

There are 2 Comments

March 22, 2012 | Jeff Sauro wrote:

You're exactly right, Michael, thanks for articulating that. It is much more likely that the true population mean will fall in the middle of the interval (near the average) than near the ends; it's a sideways normal curve.


March 21, 2012 | Michael Zuschlag wrote:

#11 (building on #4). The further you get from the middle of the confidence interval, the less likely the real average/percentage is there. I find some people tend to think of a confidence interval as a uniform probability distribution. I try to remind them that the 95% confidence interval is the _reasonably plausible_ range of values for the average/percentage. The average/percentage is _probably_ (i.e., usually) in a range about half as wide. 



Jeff's Books

Quantifying the User Experience: Practical Statistics for User Research
The most comprehensive statistical resource for UX Professionals

Excel & R Companion to Quantifying the User Experience
Detailed Steps to Solve over 100 Examples and Exercises in the Excel Calculator and R

A Practical Guide to the System Usability Scale
Background, Benchmarks & Best Practices for the most popular usability questionnaire

A Practical Guide to Measuring Usability
72 Answers to the Most Common Questions about Quantifying the Usability of Websites and Software