Measuring Usability
Quantitative Usability, Statistics & Six Sigma by Jeff Sauro

How to interpret survey responses: 5 techniques

Jeff Sauro • May 10, 2011

Closed-ended rating scale data is easy to summarize but hard to interpret.

Ideally you can compare the responses to an industry benchmark, a competitor, or a similar question from a prior survey. In most cases, though, that data doesn't exist or is too expensive or difficult to obtain.

This leaves product managers and researchers to do their best in interpreting the raw responses.

For example, a recent survey I worked on asked users what they thought of the visual appeal of the software. Users were given a five-point rating scale (from strongly disagree to strongly agree).



Here are the responses from 18 users:

5, 5, 5, 5, 4, 5, 3, 4, 5, 5, 5, 5, 4, 5, 1, 2, 3, 4

Because the question was just written for the survey, there's no historical or comparative data.

To find more meaning in this jumble of numbers, the first thing you need to do is compute the mean and standard deviation. While you won't necessarily report them, you'll need them for some of the subsequent steps.
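
If you want to check the arithmetic, here is a minimal Python sketch (standard library only) that computes the mean and sample standard deviation for the 18 responses:

```python
from statistics import mean, stdev

responses = [5, 5, 5, 5, 4, 5, 3, 4, 5, 5, 5, 5, 4, 5, 1, 2, 3, 4]

m = mean(responses)    # 4.167
sd = stdev(responses)  # ~1.20 (sample standard deviation, n - 1 in the denominator)

print(f"n = {len(responses)}, mean = {m:.3f}, sd = {sd:.2f}")
```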

There were 18 responses; the mean was 4.167 and the sample standard deviation 1.20. Here are five ways of making the raw responses more interpretable (a code sketch that pulls all five together follows the list).

  1. Percent Agree (78%): An old marketing trick is to summarize the percent of respondents who agreed with the item. Here, 14 of the 18 respondents chose a 4 or 5 (the "agree" responses): 14/18 = 78%.

  2. Top-Box (56%) or Top-Two-Box (78%) scoring: For 5-point scales the top box is strongly agree (a 5), chosen by 10 of the 18 respondents, for a score of 56%. The top-two-box score counts the 4s and 5s, so on a 5-point agreement scale it matches the percent-agree score.

  3. Net Top Box (50%): Count the respondents who select the top choice (strongly agree), subtract the number who select the bottom choice (strongly disagree), and divide by the total: (10 - 1)/18 = 50%. The popular Net Promoter Score uses a variation on this idea: it subtracts the percentage of detractors (scores of 0 to 6 on its 0-to-10 scale) from the percentage of promoters (the top two boxes, 9 and 10). Forrester's annual Customer Experience Index (CxPi) subtracts the bottom-two responses from the top-two responses.

  4. Z-Score to Percentile Rank (56%): This is a Six Sigma technique. It converts the raw score into a normal score, because rating scale means often follow a normal or close-to-normal distribution. You just need a reasonable benchmark to compare the mean against. I've found that 80% of the number of points in the scale is a good place to start (a meta-analysis by Nielsen & Levy also found this). For a 5-point scale use 4 (5*.80 = 4), for a 7-point scale use 5.6, and for an 11-point scale use 8.8. Then follow these three steps.

    1. Subtract the benchmark from the mean: 4.167-4 = .167

    2. Divide the difference by the standard deviation: .167/1.20 = .139. This is called a z-score (or normal score); it tells us how many standard deviations the mean of 4.167 falls above or below the benchmark.

    3. Convert the z-score to a percentile rank: Using the properties of the normal curve, find the proportion of the area that falls below a z of .139 (with a calculator or lookup table). That works out to .555, or about 56%.

  5. Coefficient of Variation (29%): The standard deviation is the most common way to express variability, but it's hard to interpret, especially when you mix scales with different numbers of points (e.g. 5 and 7). The CV makes interpretation a bit easier by dividing the standard deviation by the mean (1.20/4.167 = .29). Higher values indicate more variability. I've seen questions with similar means but noticeably different coefficients of variation, indicating respondents had less consistent attitudes. The CV is a measure of variability, unlike the first four, which are measures of central tendency, so it can be used alongside any of the other approaches.
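
To make the five summaries concrete, here is a short Python sketch that computes all of them for the 18 responses above. It's a sketch under a few assumptions of mine that the article doesn't spell out: responses are coded 1 through the number of scale points, "agree" means any response above the scale midpoint (which matches both examples here), and the benchmark defaults to 80% of the top scale point. The function name summarize is just for illustration; statistics.NormalDist (Python 3.8+) handles the normal-curve lookup.

```python
from statistics import NormalDist, mean, stdev

def summarize(responses, scale_points, benchmark=None):
    """Five ways to summarize rating-scale responses coded 1..scale_points."""
    n = len(responses)
    m, sd = mean(responses), stdev(responses)
    if benchmark is None:
        benchmark = 0.8 * scale_points               # rule of thumb: 80% of the top point
    midpoint = (scale_points + 1) / 2                # the neutral response
    agree  = sum(r > midpoint for r in responses)    # responses above neutral
    top    = sum(r == scale_points for r in responses)
    top2   = sum(r >= scale_points - 1 for r in responses)
    bottom = sum(r == 1 for r in responses)
    z = (m - benchmark) / sd
    return {
        "percent_agree":   agree / n,
        "top_box":         top / n,
        "top_two_box":     top2 / n,
        "net_top_box":     (top - bottom) / n,
        "z_to_percentile": NormalDist().cdf(z),      # area below z under the normal curve
        "cv":              sd / m,
    }

five_point = [5, 5, 5, 5, 4, 5, 3, 4, 5, 5, 5, 5, 4, 5, 1, 2, 3, 4]
print(summarize(five_point, scale_points=5))
# roughly: 78% agree, 56% top box, 78% top-two box,
#          50% net top box, 56% z-to-percentile, 29% CV
```

Note that the net-top-box line here uses the simple top-minus-bottom definition; the Net Promoter Score's 0-to-10 scale and detractor grouping would need their own handling.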

As you can see, many of the methods generate reassuringly similar results. Here's another example, using 15 responses to a 7-point scale on perceived ease of use:

7, 5, 2, 3, 6, 1, 5, 7, 7, 6, 6, 6, 7, 7, 6

This generates a mean of 5.4 and a standard deviation of 1.92.
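
As a quick check, and keeping the same conventions as the sketch above, here is a standalone snippet for the 7-point data that recomputes the mean, standard deviation, and the z-score-to-percentile conversion with its 5.6 benchmark:

```python
from statistics import NormalDist, mean, stdev

seven_point = [7, 5, 2, 3, 6, 1, 5, 7, 7, 6, 6, 6, 7, 7, 6]

m, sd = mean(seven_point), stdev(seven_point)   # 5.4 and ~1.92
z = (m - 0.8 * 7) / sd                          # benchmark is 80% of 7 = 5.6
print(f"z = {z:.2f}, percentile rank = {NormalDist().cdf(z):.0%}")   # about 46%

# The summarize() sketch above gives the rest: ~80% agree, 67% top-two box,
# 33% top box, 27% net top box, and a CV of ~36%.
```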

I've summarized the results for both examples in the table below.


Metric            5-Point Example    7-Point Example
Percent Agree     78%                80%
Top-2-Box         78%                67%
Top-Box           56%                33%
Net Top Box       50%                27%
Z-Score to %      56%                46%
CV                29%                36%

Which is the best approach?

The "best" approach depends on the context and your situation. I've used all these at some point but I prefer the z-score approach for three reasons.
  • It's the only metric that includes variability in the score.
  • It offers the most precision because it uses the mean.
  • It tends to generate results in the middle of the others.

However, there are times when executive comprehension is more important than statistical precision. If you find it hard to explain the z-score approach and are unsure whether others will be comfortable with it, one of the other approaches will generate similar results (albeit less precisely).

The metrics are even more meaningful with confidence intervals, but that's a topic for another blog post. To help you get started, you can download an Excel file with the appropriate calculations for 5- and 7-point scales.


About Jeff Sauro

Jeff Sauro is the founding principal of Measuring Usability LLC, a company providing statistics and usability consulting to Fortune 1000 companies.
He is the author of over 20 journal articles and 4 books on statistics and the user experience.



Posted Comments

There are 15 Comments

April 22, 2014 | Kushi wrote:

Great job Sir, I am thankful for these posts... I too would like to know how to interpret the z-score. 


January 1, 2014 | Hasan Ali Khan wrote:

Thank you for your help. I've done a 10-point-scale survey asking patients in different hospital departments what kind of services they want. When I want to analyse department-only results, where n < 30, do I still use the z-score? 


December 2, 2013 | billy wrote:

My study is to measure the awareness and acceptability of VMGO using this scaled rating: awareness 4 = very aware (3.25-4.0), 3 = moderately aware (2.50-3.24), 2 = aware (1.75-2.49), 1 = not aware (1.0-1.75); likewise the acceptability has the same rating. How do I use this scaled rating to come up with the awareness and acceptability scores? Please give an example. 


June 14, 2013 | anonymous wrote:

Got the help I needed. Good Job. Keep it up. 


March 15, 2013 | Keith wrote:

Would like to talk more about these approaches offline if possible.

Please contact me if possible 


January 22, 2013 | Dave Woodbury wrote:

So, how do confidence intervals apply to ordinal data? If my calculated confidence interval is +/-20%, then what does that mean if the mean result is 4.5? (Or any mean value for that matter) 


October 23, 2012 | Caleb wrote:

Hi Jeff! This is terrific, and I am putting it to use in a heuristic-comparison exercise I am conducting. I'm employing a 5-point scale and I am getting clean results EXCEPT when variation is very low, e.g. 3,3,3,3,2,3,3,3,3,3,3. That seems to skew the score toward the (very) low end (I get a z-score to % of 0.0%, using your downloaded example spreadsheet calculator and one I built). Is that to be expected? Is there any way to control for that?
Thanks so much, I've learned a ton hanging out on your site. 


September 7, 2012 | Katie wrote:

Once again, awesome post! I'd love to learn more about confidence intervals for Likert scale data and how to report it. Thanks! 


August 18, 2012 | christy wrote:

I really like how you broke this into two ways of coming to a score point. Thanks for your input. 


August 16, 2012 | peter wrote:

well done 


May 2, 2012 | john wrote:

Many thanks. About to attempt interpretation of my first scholastic survey. This gives me a bit more confidence to tackle it! Will credit you in my thesis! 


May 14, 2011 | Jeff Sauro wrote:

Mateo,

Good question. You would interpret the percentage that comes from the z-score by saying the average response is in the 56th percentile. In other words, the satisfaction with the visual appeal is just above average. If you were able to rate hundreds of software products, you would expect this product to have a higher visual appeal than 56% of the products. 


May 12, 2011 | jana wrote:

Great article. Would love to see a follow up about when to use which technique. 


May 11, 2011 | Mateo wrote:

Great post, thanks!
But how can the z-score be interpreted in this example? The software will probably be visually appealing to 56% of all users? 


May 11, 2011 | Tomás Ibáñez wrote:

Congratulations, this is a very comprehensive approach to an important question that is not always well addressed in professional practice. 


