Measuring Usability
Quantitative Usability, Statistics & Six Sigma by Jeff Sauro

Confidence Interval Calculator for a Completion Rate

Jeff Sauro • October 1, 2005

Use this calculator to compute a confidence interval and best point estimate for an observed completion rate. It provides the Adjusted Wald, Exact, Score and Wald intervals.

Input Table: Passed, Total Tested
Results Table: Low, High and Margin of Error* for each confidence interval (Adj. Wald, Exact, Score, Wald), plus Point Estimates

Download this calculator as an Excel file

Explanation

The Adjusted Wald method should be used almost all the time. For exceptions, see below.
For a detailed discussion of binomial confidence intervals with small samples, see the HFES paper (Sauro & Lewis, 2005). For a discussion of the best point estimate, see the JUS paper (Lewis & Sauro, 2006).

Adjusted Wald Method

The adjusted Wald interval (also called the modified Wald interval) provides the best coverage for the specified interval when samples are smaller than about 150. In other words, if you want a 95% confidence interval, this formula will produce an interval that contains the population proportion, on average, about 95 percent of the time. It uses the Wald formula but is "adjusted" in that it adds half of the squared z-critical value to the numerator and the entire squared critical value to the denominator before computing the interval, i.e., (x + z²/2)/(n + z²). For example, a 95% confidence level uses the z-critical value of 1.96, or approximately 2. If you observe 9 out of 10 users completing a task, this formula computes the proportion as (9 + (1.96²/2))/(10 + 1.96²) = approx. 11/14 and builds the interval using the Wald formula. Note: Prior to March 1st 2006, this calculator computed this interval by adding one z-value to the numerator and a squared z-value to the denominator.
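As a rough illustration of the adjustment described above, here is a minimal Python sketch. The function name adjusted_wald_interval is hypothetical, and clamping the bounds to exactly 0 and 1 is an assumption of this sketch (the calculator itself reports a substitute such as >.999 instead).

```python
from math import sqrt
from statistics import NormalDist


def adjusted_wald_interval(passed, total, confidence=0.95):
    """Adjusted Wald interval for a completion rate.

    Returns (low, high, adjusted_proportion). Bounds are clamped to [0, 1],
    which is an assumption of this sketch.
    """
    # Two-sided z-critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Add z^2/2 to the successes and z^2 to the number of trials.
    adj_n = total + z ** 2
    adj_p = (passed + z ** 2 / 2) / adj_n
    # Ordinary Wald margin, computed from the adjusted values.
    margin = z * sqrt(adj_p * (1 - adj_p) / adj_n)
    return max(0.0, adj_p - margin), min(1.0, adj_p + margin), adj_p


low, high, adj_p = adjusted_wald_interval(9, 10)
print(f"adjusted proportion = {adj_p:.3f}, interval = ({low:.3f}, {high:.3f})")
```

For 9 out of 10, the adjusted proportion comes out near 11/14 (about 0.79), and the unclamped upper bound slightly exceeds 1, which is why a substitution is needed at the top end.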

Exact Method

The Exact method was designed to guarantee at least 95% coverage, whereas the approximate methods (adjusted Wald and Score) provide an average coverage of 95% only in the long run. Use the Exact method when you need to be sure you are calculating a 95% or greater interval - erring on the conservative side. For example, at the population completion rate of 97.8% both the Score and adjusted Wald methods had actual coverage that fell to 89%. When the risk of this level of actual coverage is inappropriate for an application, then the Exact method provides the necessary precision.
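The page does not say which exact method the calculator uses. The sketch below assumes the common Clopper-Pearson construction and finds each bound by bisection on the binomial distribution, using only the standard library. The names binom_cdf, bisect_root and exact_interval are hypothetical. It always splits the error equally between the two tails and does not switch to a one-sided interval when all users pass or fail. Because it sums exact binomial terms, it is meant for small samples only.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, lo=0.0, hi=1.0, iters=100):
    """Root of a decreasing function f with f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(passed, total, confidence=0.95):
    """Clopper-Pearson interval (an assumed choice of 'exact' method)."""
    alpha = 1 - confidence
    if passed == 0:
        low = 0.0
    else:
        # Lower bound: the rate at which P(X >= passed) equals alpha/2.
        low = bisect_root(
            lambda p: alpha / 2 - (1 - binom_cdf(passed - 1, total, p))
        )
    if passed == total:
        high = 1.0
    else:
        # Upper bound: the rate at which P(X <= passed) equals alpha/2.
        high = bisect_root(lambda p: binom_cdf(passed, total, p) - alpha / 2)
    return low, high


low, high = exact_interval(9, 10)
print(f"exact interval for 9/10: ({low:.4f}, {high:.4f})")
```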

Score Method

The Score method provides better coverage than the Exact and Wald methods but falls short of the adjusted Wald method. Its drawbacks are its computational difficulty and its poor coverage for some values when the population completion rate is around 98% or 2%, regardless of sample size (Agresti and Coull, 1998). The only advantage in using the Score method is that it provides more precise endpoints when the ends of the intervals are close to 0 or 1. For some values (e.g. 9/10) the adjusted Wald's crude intervals go beyond 0 and 1 and a substitution of >.999 is used. For the score method, the upper interval is .9975.
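The page does not print the Score formula. The sketch below assumes the standard Wilson score interval, which may differ in detail from what the calculator implements. The function name score_interval is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def score_interval(passed, total, confidence=0.95):
    """Wilson score interval (assumed form of the Score method)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = passed / total
    denom = 1 + z ** 2 / total
    # The center is pulled toward 0.5; it equals (x + z^2/2)/(n + z^2).
    center = (p + z ** 2 / (2 * total)) / denom
    margin = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - margin, center + margin


low, high = score_interval(9, 10)
print(f"score interval for 9/10: ({low:.4f}, {high:.4f})")
```

Unlike the adjusted Wald interval, these bounds stay inside 0 and 1 without any clamping.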

Wald Method

The Wald method should be avoided if calculating confidence intervals for completion rates with sample sizes less than 100. Its coverage is too far from the nominal level to provide a reliable estimate of the population completion rate. As the sample size increases above 100, all four methods converge to similar intervals. Use the Wald as a point of reference or for larger sample sizes.
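For reference, a minimal sketch of the plain Wald interval follows. The function name wald_interval is hypothetical and the clamping to [0, 1] is an assumption of this sketch. The second call shows one reason the method behaves poorly with small samples: when every user passes, the margin collapses to zero.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(passed, total, confidence=0.95):
    """Plain Wald interval: p +/- z * sqrt(p * (1 - p) / n), clamped to [0, 1]."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = passed / total
    margin = z * sqrt(p * (1 - p) / total)
    return max(0.0, p - margin), min(1.0, p + margin)


print(wald_interval(9, 10))   # upper bound hits the clamp at 1.0
print(wald_interval(10, 10))  # zero-width interval at 100%
```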

* The "Margin of Error" values are half the width of the confidence intervals. The adjusted Wald and Wald intervals are symmetric about their center (the adjusted proportion for the adjusted Wald, the observed proportion for the Wald), so they can be reported as that proportion +/- the margin of error. For the exact method, the intervals become less symmetrical as the proportion complete gets further from 50% (e.g. 90% or 15%). Therefore, the margin of error should be used only as an approximation for the exact method, and the actual values above and below the proportion should be reported.
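To make the asymmetry concrete, the small sketch below compares the margin of error with the actual distances above and below the observed proportion. The helper name describe_interval is hypothetical, and the bounds (0.555, 0.9975) are illustrative values, roughly what a Clopper-Pearson calculation gives for 9 of 10.

```python
def describe_interval(point, low, high):
    """Half-width of an interval and its actual reach on each side of the point."""
    return {
        "margin_of_error": (high - low) / 2,
        "below": point - low,
        "above": high - point,
    }


# Illustrative exact-style bounds for 9 of 10 (assumed values, see above).
summary = describe_interval(0.9, 0.555, 0.9975)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Here the half-width is about 0.22, but the interval reaches about 0.35 below the observed 90% and less than 0.10 above it, so reporting 90% +/- 22% would misstate both ends.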

When All Users Pass or Fail

With small sample sizes, it is a common occurrence that all users in the sample will complete a task (100% completion rate) or all will fail the task (0% completion rate). For these scenarios, it is often unpalatable to report 100% or 0%. After all, how likely is it that the true population parameter is as extreme as 100% or 0%? The Best Estimate box provides the best point estimate under these conditions and uses the LaPlace method for calculation. While this value may seem too far from the observed 100%, its attractiveness is that it is a function of the sample size-- the greater the sample size, the closer this value will be to 100%.
Calculation Note: When the observed completion rate is 100% or 0% there cannot be a two sided confidence interval (since you cannot have more than 100% or less than 0%). In these cases it is necessary to use a z-critical value for a one-sided confidence interval. For example, a 95% two sided confidence interval uses the z-score of approximately 1.96, a one sided interval uses a z-score of approximately 1.64.
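The two critical values mentioned in the calculation note can be reproduced with the standard library. This is a small sketch, and the function name z_critical is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence=0.95, two_sided=True):
    """z-critical value for a two-sided or one-sided confidence level."""
    alpha = 1 - confidence
    # A two-sided interval splits alpha across both tails.
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


print(f"two-sided 95%: {z_critical(0.95):.3f}")
print(f"one-sided 95%: {z_critical(0.95, two_sided=False):.3f}")
```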

Likely Population Completion Rate

The drop-down has two options:

Between .5 and 1
If you conduct usability tests in which your task completion rates are roughly restricted to the range of .5 to 1.0, then select "Between .5 and 1" in the drop-down. See the Best Estimate section below for how the point estimate is calculated with this option.

Unknown
If your task completion rates typically take a wide range of values, uniformly distributed between 0 and 1, then select "Unknown" from the drop-down. If you don't know either way, leave it at "Unknown." This selection uses the LaPlace method for the best estimate of the completion rate.

Point Estimates

Whereas a confidence interval describes a likely range of values, a point estimate describes a single value, a point, as an estimate of an unknown parameter in the population. It is extremely unlikely that the sample point estimate is exactly the same as the unknown population completion rate. For that reason, you should always compute a confidence interval when reporting a completion rate. It is much more informative than a point estimate since it provides a reasonably likely boundary for the population completion rate.
Although it receives little attention in introductory statistics classes and has had little influence on measurement practices in the field of usability engineering, there is a rich history of alternative methods developed to achieve a more accurate point estimate of p than simply dividing the number of successes by the number of attempts (for example, see Chew, 1971; Laplace, 1812; Manning & Schutze, 1999). This need is most evident when there is an extreme outcome, specifically, when x=0 (0%) or x=n (100%) - especially, but not exclusively, when sample sizes are small. Four estimation methods that pertain to situations more common in usability testing are detailed below:

MLE (Maximum Likelihood Estimate) x/n

The MLE is the sample proportion or the number of users succeeding divided by the total attempting. It is the most common point estimate reported.

LaPlace (x+1)/(n+2)

A famous large-sample problem comes from the seminal work of Laplace in the early 1800s. He posed the question of how certain you can be that the sun will rise tomorrow, given that you know that it has risen every day for the past 5000 years (1,825,000 days). You can be pretty sure that it will rise, but you can't be absolutely sure. The sun might explode, or a large asteroid might smash the Earth into pieces. In response to this question, he proposed the Laplace Law of Succession, which is to add one to the numerator and two to the denominator ((x+1)/(n+2)). Applying this procedure, you'd be 99.999945% sure that the sun will rise tomorrow - close to 100%, but slightly backed away from that extreme. The magnitude of the adjustment is greater when sample sizes are small. For example, if you observe two out of two successes and apply the LaPlace procedure, then your estimate of p is 75% (x+1=3, n+2=4, p=3/4) rather than 100%. If you had observed two failures, then your estimate of p is 25% (x+1=1, n+2=4, p=1/4) rather than 0%. LaPlace in essence is saying, the next result is a toss up, so give each alternative an equally likely chance of occurring.
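The arithmetic in the examples above can be checked with a few lines of Python. The function name laplace_estimate is hypothetical.

```python
def laplace_estimate(successes, trials):
    """Laplace's Law of Succession: (x + 1) / (n + 2)."""
    return (successes + 1) / (trials + 2)


# Sunrise example: 5000 years of 365 days, every one a "success".
days = 5000 * 365
print(f"sunrise: {laplace_estimate(days, days) * 100:.6f}%")

# Small-sample examples from the text.
print(laplace_estimate(2, 2))  # two successes out of two
print(laplace_estimate(0, 2))  # two failures out of two
```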

Wilson (x + z²/2)/(n + z²)

Wilson's point estimate is the midpoint of the adjusted Wald interval. It is derived by adding half a squared critical value to the numerator and a squared critical value to the denominator. Wilson's is the more conservative approach.

Jeffreys (x+.5)/(n+1)

Jeffreys (1961) provided a compromise between the LaPlace and MLE methods. See reference for technical details.
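To compare the four point estimates side by side, here is a minimal sketch. The function name point_estimates is hypothetical, and using the two-sided z-critical value for the Wilson estimate in every case is an assumption of this sketch.

```python
from statistics import NormalDist


def point_estimates(passed, total, confidence=0.95):
    """MLE, Laplace, Wilson and Jeffreys point estimates of a completion rate."""
    # Assumption: two-sided z-critical value, about 1.96 at 95%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return {
        "mle": passed / total,
        "laplace": (passed + 1) / (total + 2),
        "wilson": (passed + z ** 2 / 2) / (total + z ** 2),
        "jeffreys": (passed + 0.5) / (total + 1),
    }


for passed, total in [(10, 10), (9, 10), (2, 2)]:
    row = point_estimates(passed, total)
    cells = ", ".join(f"{name}={value:.3f}" for name, value in row.items())
    print(f"{passed}/{total}: {cells}")
```

For 10 out of 10, the estimates move away from 100% in order of the size of the adjustment: Jeffreys the least, then Laplace, then Wilson.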

Best Estimate

The best point estimate is calculated using the following logic: If "Unknown" is selected from the Likely Population Completion Rate drop-down, the LaPlace method is used. The smaller your sample size and the farther your initial estimate of p is from .5, the greater the benefit over the MLE.

If "Between .5 and 1" is selected from the Likely Population Completion Rate drop-down and the observed completion rate is:

  1. Less than or equal to .5: the Wilson method is used.
  2. Between .5 and .9: the MLE is used.
  3. Greater than .9: the LaPlace method is used (Note: if the observed completion rate is above .9 but below 1, the Jeffreys method is also a viable alternative).

Need more information? Be sure to check out the online confidence interval tutorial.
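The selection logic in the Best Estimate section can be sketched as follows. The function name best_estimate and the option strings are hypothetical, and treating an observed rate of exactly .9 as falling in the MLE band is an assumption, since the rules above do not say which side that boundary belongs to.

```python
from statistics import NormalDist


def best_estimate(passed, total, likely_range="unknown", confidence=0.95):
    """Return (method_name, estimate) following the rules listed above.

    likely_range is "unknown" or "between_.5_and_1" (hypothetical labels).
    """
    mle = passed / total
    laplace = (passed + 1) / (total + 2)
    if likely_range == "unknown":
        return "laplace", laplace
    if likely_range != "between_.5_and_1":
        raise ValueError(f"unrecognized option: {likely_range!r}")
    if mle <= 0.5:
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
        return "wilson", (passed + z ** 2 / 2) / (total + z ** 2)
    if mle <= 0.9:  # assumption: exactly .9 is treated as "between .5 and .9"
        return "mle", mle
    return "laplace", laplace


for passed in (4, 8, 10):
    print(passed, best_estimate(passed, 10, "between_.5_and_1"))
print(8, best_estimate(8, 10))  # "Unknown" always uses Laplace
```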

References

  1. Agresti, A., & Coull, B. (1998). Approximate is better than 'exact' for interval estimation of binomial proportions. The American Statistician, 52, 119-126.

  2. Chew, V. (1971). Point estimation of the parameter of the binomial distribution. The American Statistician, 25, 47-50.

  3. Jeffreys, H. (1961). Theory of probability (3rd ed., pp. 179-192). Oxford: Clarendon Press.

  4. Laplace, P. S. (1812). Théorie analytique des probabilités. Paris, France: Courcier.

  5. Lewis, J. R., & Sauro, J. (2006). When 100% really isn't 100%: Improving the accuracy of small-sample estimates of completion rates. Journal of Usability Studies, 1(3), 136-150.

  6. Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.

  7. Sauro, J., & Lewis, J. R. (2005). Estimating completion rates from small samples using binomial confidence intervals: Comparisons and recommendations. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (HFES 2005). Orlando, FL.


About Jeff Sauro

Jeff Sauro is the founding principal of Measuring Usability LLC, a company providing statistics and usability consulting to Fortune 1000 companies.
He is the author of over 20 journal articles and 4 books on statistics and the user experience.



    Posted Comments


    July 1, 2013 | Milton wrote:

    I ran a calculation in the Confidence Interval Calculator for a Completion Rate with the following values: passed=780, total=5662. For the Exact method, the low value is very different from the other methods.


    June 2, 2013 | Marina Treneva wrote:

    Thank you for the explanation of CI variations. Your calculator works with denominators greater than 2000. The QuickCalcs tool in GraphPad is not valid for denominators greater than 2000.


    February 4, 2013 | Katie wrote:

    Can you use this calculator for finding confidence intervals for multiple choice (radio or check box) survey questions?

    For example, how many people said they would vote for Candidate A, B, or C? Or how many people in the past month have recently visited sites A, B, or C?

    Thanks!  


    January 23, 2013 | Chandrasekhar wrote:

    A sound mathematical reasoning with simple examples and description. 


    May 11, 2012 | Martin Raic wrote:

    As to the case when none or all of the trials succeed, I agree with Charles Bedard. In particular, the exact interval then fails to guarantee the 95% coverage. For example, in the case of 5 trials, the calculator yields [0, 0.4507) for no success and (0.5493, 1] for 5 successes. Therefore, if the actual success probability equals 1/2, the coverage probability equals Bin(5, 1/2){1,2,3,4} = 0.9375.

    The point: when considering one or two sides, the actual and the estimated success probability should not be confused. 


    July 7, 2011 | Michelle wrote:

    I like this 


    June 22, 2011 | Mikael Goldstein wrote:

    When 8 passed (out of 10, p=0.8), LaPlace turned out to be the best point estimate, when in fact it should be MLE!
    When 4 out of 10 passed, LaPlace turned out to be the best point estimate, when in fact it should be Wilson!
    For 9 out of 10, LaPlace is OK.
    For 10 out of 10, LaPlace is OK.

    In your paper the Wilson method is displayed as (x+2)/(n+4), but the computations are done with the x + z-squared estimator. Which one is the correct one to use?

    You write that a 95% "confidence level uses the Z-critical value of 1.96 or approximately 2." It feels odd to use the z value when dealing with small samples?

    Regards,
    Mikael Goldstein
     


    January 3, 2011 | MaN wrote:

    Great stuff, easy and handy 


    December 10, 2010 | williamkinney wrote:

    helpful 


    December 3, 2010 | Monica wrote:

    Thank you very much, this is helpful! Could you also provide the formulas used for each? 


    October 11, 2010 | Nestor Garcia wrote:

    Excellent summary of information related to confidence interval.
    I have used widely the exact method to support risk analysis 


    February 22, 2010 | anonymous wrote:

    very easy to use. 


    November 6, 2009 | Jim Hodges wrote:

    Which exact method is your exact method? I can't find it here now, but I recall being able to find it on a previous visit to this page. Your link to the confidence interval tutorial is dead.


    August 10, 2009 | Greg wrote:

    would you use the laplace interval for fast-time modeling results that yield 0 "successes" out of 5 million runs (treating the 5 million runs as a sample)? 


    June 3, 2009 | B Joseph wrote:

    In 1992, the FAA conducted 86,991 pre-employment drug tests on job applicants who were to be engaged in safety and security-related jobs, and found that 1,143 were positive. (a) Construct a 95 percent confidence interval for the population proportion of positive drug tests. (b) Why is the normality assumption not a problem, despite the very small value of p?


    May 25, 2009 | Sujan Karki wrote:

    I want to calculate the confidence interval of a cluster sample. How do I use this calculator for a CI with a cluster effect? Does it need any modification, or can this calculator not be used?
    thanks
    sujan 


    February 8, 2009 | Alexandre miranda wrote:

    I can't find how to calculate the exercise on page 66, the confidence interval based on the binomial distribution (figure 4.1).


    May 20, 2008 | Charles Bedard wrote:

    Not sure if my comment went through. Instead of 2 as an answer to the question "What is 1+1", I entered 1.999999..., which is mathematically equivalent. My joke. To repeat my comments:
    ----------------------------------------------
    I find that all the estimators have one fatal flaw. A two-sided confidence interval is specified with the presumption that the error in each tail is alpha/2. When the number of successes is equal to zero or to the number of trials, all the stated CIs take either 0 or 1 as one end of the CI and put ALL the error in the inside tail, making the CI a one-sided confidence interval with alpha (not alpha/2) in the tail. I prefer a modified Agresti CI (using an unassumed prior to keep frequentists happy, or a simple uniform over .5 to 1 (or .5 to 0)). The modified Agresti CI is based on the Beta distribution, since the distribution of the proportion is a continuous distribution. More honest, especially in one-shot (non-production) situations.


    May 14, 2008 | Pieter Johnson wrote:

    This website is excellent! Very helpful. 

