
6 Proportions to Compare when Improving the User Experience

Jeff Sauro • April 17, 2012

One of the simplest ways to measure any event is a binary metric coded as a 1 or 0.

Such a metric represents the presence or absence of just about anything of interest: Yes/No, Pass/Fail, Purchase/No Purchase, On/Off.

Fundamentally, the binary system is at the heart of computing as we know it. It also plays a critical role in user research.

Binary measures can even apply to traditionally qualitative research, such as noting whether a user mentions distrust of salespeople or marketing materials in an interview.

By adding up all the 1's and dividing by the total number observed, you get a proportion. So if 5 out of 10 users mention distrusting marketing materials in an interview, you get a proportion of 5/10 = .5, often expressed as a percentage: 50%.
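
As a quick illustration in Python (the individual responses below are invented for the example):

    # 1 = interviewee mentioned distrusting marketing materials, 0 = did not
    mentions = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

    proportion = sum(mentions) / len(mentions)
    print(f"{proportion:.0%}")  # 50%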

You can use proportions to help make data-driven decisions just about anywhere: Which design converts more? Which product is preferred? Does the new interface have a higher completion rate? What proportion of users had a problem registering?

When it comes to comparing independent proportions, you'll need to conduct a statistical test called the two-proportion test (which is equivalent to the chi-square test).

We talk a lot about comparing two proportions in our book Quantifying the User Experience (Chapter 5) and recommend a slight adjustment to the typical formulas you might be familiar with so they work for both small and large sample sizes.

To make it easier to compare proportions, I've created a simple online calculator that does the statistical calculations for you. Just enter two proportions to see whether the difference between them is more likely due to chance or more likely a legitimate difference.
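
If you'd rather script the calculation than use the calculator, here is a minimal sketch in Python (3.8+ for NormalDist). It assumes the adjustment mentioned above is the N-1 two-proportion test (the z form of the N-1 chi-square), which reproduces the results below; the function name is illustrative, not from the book.

    from math import sqrt
    from statistics import NormalDist

    def two_proportion_test(x1, n1, x2, n2):
        """z statistic and two-tailed p-value comparing x1/n1 with x2/n2,
        with the N-1 adjustment so the test behaves at small sample sizes."""
        p1, p2 = x1 / n1, x2 / n2
        pooled = (x1 + x2) / (n1 + n2)            # pooled proportion under H0
        n = n1 + n2
        se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        z = (p1 - p2) / se * sqrt((n - 1) / n)    # sqrt((N-1)/N) adjustment
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        return z, p_value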

Here are 6 examples of proportions and the statistical results to get you thinking:
  1. Completion Rates: If 11 out of 12 users complete a task on Design A and only 5 out of 10 can complete the same task on Design B, then we can be 97% confident more users can complete the task on Design A.

  2. Conversion Rates: A large blue button was shown to 455 users and 37 (8%) purchased a product. A large red button was shown to 438 different users and 22 (5%) purchased the product. There is a 94% chance the blue button will sell more products.

  3. Problem Occurrence: 4 out of 7 users received at least one error message when entering alerts and notifications into their profile on a credit card website. After a redesign, 1 out of 7 had at least one error. There is an 89% chance the number of errors has been reduced when setting account alerts.

  4. Proportion Recommending: 89 out of 100 (89%) customers said they recommended Smart Phone A to a friend in the last year compared to 67 out of 93 (72%) for Smart Phone B. There is a 99.7% chance this retroactive recommend rate is different.

  5. Proportion Detracting: Prior to the change in the return policy, 49 out of 100 (49%) customers surveyed were detractors. After the change in policy, 40 out of 96 (42%) were. There is about a 69% chance the difference is not due to chance (good, but not overwhelming evidence).

  6. Proportion that Completed a Task in Less than 30 Seconds: 4 out of 9 users could add a new contact in CRM application A in less than 30 seconds. 11 out of 12 could on CRM application B. There is a 97% chance that if we tested all users, more would complete the task on App B.
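
To check the arithmetic, running the sketch above on the first example reproduces the reported result:

    # Example 1: 11 of 12 completed on Design A vs. 5 of 10 on Design B
    z, p = two_proportion_test(11, 12, 5, 10)
    print(f"z = {z:.2f}, two-tailed p = {p:.3f}")  # z = 2.13, p = 0.033
    # (1 - p) * 100% is about 97%, the confidence quoted in example 1

The other five examples can be checked the same way with their respective counts.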




About Jeff Sauro

Jeff Sauro is the founding principal of Measuring Usability LLC, a company providing statistics and usability consulting to Fortune 1000 companies.
He is the author of over 20 journal articles and 4 books on statistics and the user experience.



Posted Comments


May 29, 2012 | Aniko Sandor wrote:

Jeff,

I meant p=0.4 in the example.
Aniko 


May 29, 2012 | Aniko Sandor wrote:

Jeff,
I understand your point that sometimes the audience is not trained in statistics and you still need to explain your results to them. However, misinterpreting the p value may not be the best way to go. Perhaps p values need not be presented in that case. Maybe the results could be explained the way you put it at the end of your comment: the difference found is not due to chance, therefore one design is better than the other. Think about a situation when the p value is .04. Would you still say you are 60% confident when the chance of a real difference is really small?
I admit I don't have a better approach to this question; perhaps it is audience-dependent and sometimes you have to compromise.
Aniko 


May 10, 2012 | Jeff Sauro wrote:

Aniko

Thanks for the comment. Strictly speaking, you are correct. The p-value speaks to the probability of obtaining a difference that large or larger if there really was no difference. However, I'm afraid this subtlety gets lost on a less technical audience, or worse, gets ignored entirely. Using confidence in this sense, while less precise, is, I find, both more interpretable and actionable. It also happens to be the lexicon adopted by the A/B testing and SEO communities (e.g., Google Analytics).

Saying you can be (1-p)*100% confident in a difference is something like shorthand for saying "now that I know the observed significance level, I can say that if I had selected (1-p)*100% for the confidence level, the interval would have excluded 0." For the more statistically inclined, we might wince a little, but the gist of the exercise has been communicated and digested: it's unlikely that the difference is due to chance.
 


May 10, 2012 | Aniko Sandor wrote:

Jeff,
On example 1 the interpretation of the p value seems inappropriate. A p value (or 1 - p) cannot be interpreted as confidence (97% in this case) in results. It only means that a difference like the one you found between your samples is likely to happen only in 3% of the cases if they are from the same population. Therefore, it is more likely that they are from different populations, thus your results are significant. 


April 24, 2012 | Jeff Sauro wrote:

John,

No, you're absolutely right! Thanks for the catch, I just fixed it.
 


April 23, 2012 | John wrote:

Jeff - Please correct me if I'm wrong here, but your statement in example #1 seems to be mis-worded. "If 11 out 12 users complete a task on Design A and only 5 out of 10 can complete the same task on Design B, then we can be 97% confident more users can complete the task on Design B." Shouldn't it be, "more users can complete the task on Design A."  


