Measuring Usability
Quantitative Usability, Statistics & Six Sigma by Jeff Sauro

Are both positive and negative items necessary in questionnaires?

Jeff Sauro • April 26, 2011

There is a long tradition of including items in questionnaires that are phrased both positively and negatively.
  • This website was easy to use.
  • It was difficult to find what I needed on this website.
The major reason for alternating item wording is to minimize extreme response bias and acquiescent bias.

However, recent research [pdf] that Jim Lewis and I conducted found little evidence for these biases. We found that response bias effects are at best small and are outweighed by the real effects of miscoding by researchers and misinterpretation by users.

Usability Questionnaires Mostly Alternate

The popular System Usability Scale (SUS) has items that alternate between positive and negative wording. In fact, of the most frequently used questionnaires for measuring attitudes about usability, all but one use a mix of positive and negative items.
  • System Usability Scale (SUS): 10 Items (half positive & half negative)
  • Post-Study System Usability Questionnaire (PSSUQ [pdf]): 19 Positive items
  • Software Usability Measurement Inventory (SUMI): 50 Items with a mix of positive and negative
  • Questionnaire for User Interaction Satisfaction (QUIS): 27 items with a mix of positive and negative

Advantages to Alternating

There are two major reasons for alternating item wording:
  1. Reducing Acquiescent Bias: This occurs when users go on auto-pilot and agree with every statement. On a 5-point scale these responses would be all 4's and 5's.

  2. Reducing Extreme Response Bias: This occurs when participants provide all high or all low ratings (all 5's or all 1's on a 5-point scale). It is related to acquiescent bias, except that respondents pick the most extreme rating and apply it to many or all items.
By including a mix of positive and negative items, respondents are forced to consider each question and (hopefully) provide a more meaningful response, which should reduce these biases.
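These two biases can be made concrete as simple per-respondent counts. Here is a minimal sketch in Python; the respondent data and the function names are illustrative, not from the paper:

```python
# Count agreement and extreme responses in 5-point Likert data.
# Data and thresholds are illustrative only.

def count_agreements(responses):
    """Number of items rated 4 or 5 (agreement on a 5-point scale)."""
    return sum(1 for r in responses if r >= 4)

def count_extremes(responses):
    """Number of items given the most extreme rating (1 or 5)."""
    return sum(1 for r in responses if r in (1, 5))

respondent = [5, 4, 5, 5, 4, 5, 4, 5, 5, 4]  # agrees with all 10 items
print(count_agreements(respondent))  # 10 -> looks like acquiescent bias
print(count_extremes(respondent))    # 6
```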

Despite published concerns about acquiescence bias, there is little evidence that the "common wisdom" of including both positively and negatively worded items solves the problem. To our knowledge there is no research documenting the magnitude of acquiescence bias in general, or whether it specifically affects the measurement of attitudes toward usability.

Disadvantages to Alternating

There is a dark side to alternating items. We are aware of at least three problems.
  1. Misinterpret: Users may respond differently to negatively worded items, such that simply reversing the responses doesn't account for the difference. There is evidence that this lowers internal reliability, distorts the factor structure, and is more problematic in cross-cultural settings.

  2. Mistake: Users might not intend to respond differently, but may forget to reverse their score, accidentally agreeing with a negative statement when they meant to disagree. We have sat with participants who acknowledged forgetting to reverse a score, or who commented that they had to go back and correct scores they forgot to adjust.

  3. Miscode: Researchers might forget to reverse the scales when scoring, and would consequently report incorrect data. Even with software that easily records user input, researchers still have to remember to reverse the scales, and forgetting to do so is not an obvious error: the improperly scaled scores are still acceptable values, especially when the system being tested is of moderate usability (in which case many responses will be neutral or close to neutral).

    While this may seem like an easily avoidable problem, we found 3 of 27 SUS datasets (11%) were miscoded, suggesting that in the harried life of a researcher, marketer or product manager, miscoding can affect between 3% and 28% of all datasets (the 95% confidence interval).
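The 3% to 28% interval can be approximately reproduced with an adjusted-Wald (Agresti-Coull) binomial confidence interval. The exact method used in the paper isn't stated here, so treat this as one plausible reconstruction:

```python
import math

def adjusted_wald_ci(successes, n, z=1.959964):
    """Adjusted-Wald (Agresti-Coull) 95% CI for a binomial proportion."""
    p_adj = (successes + z**2 / 2) / (n + z**2)
    se = math.sqrt(p_adj * (1 - p_adj) / (n + z**2))
    return max(0.0, p_adj - z * se), min(1.0, p_adj + z * se)

low, high = adjusted_wald_ci(3, 27)   # 3 of 27 datasets miscoded
print(f"{low:.1%} to {high:.1%}")     # roughly 3% to 29%
```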

Is it worth the trouble?

Does alternating item wording outweigh the real negatives of misinterpreting, mistaking and miscoding? To find out, we created an all positively worded version of the SUS and tested it against the original alternating SUS in a series of remote unmoderated usability studies.

All Positive SUS vs. Original SUS

 1. All positive: I think that I would like to use the website frequently.
    Original: I think that I would like to use this system frequently.
 2. All positive: I found the website to be simple.
    Original: I found the system unnecessarily complex.
 3. All positive: I thought the website was easy to use.
    Original: I thought the system was easy to use.
 4. All positive: I think that I could use the website without the support of a technical person.
    Original: I think that I would need the support of a technical person to be able to use this system.
 5. All positive: I found the various functions in the website were well integrated.
    Original: I found the various functions in this system were well integrated.
 6. All positive: I thought there was a lot of consistency in the website.
    Original: I thought there was too much inconsistency in this system.
 7. All positive: I would imagine that most people would learn to use the website very quickly.
    Original: I would imagine that most people would learn to use this system very quickly.
 8. All positive: I found the website very intuitive.
    Original: I found the system very cumbersome to use.
 9. All positive: I felt very confident using the website.
    Original: I felt very confident using the system.
10. All positive: I could use the website without having to learn anything new.
    Original: I needed to learn a lot of things before I could get going with this system.
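Scoring the two versions differs only in whether the even-numbered items are reverse-coded. A minimal sketch of both scoring rules; the standard SUS rule is well documented, while the all-positive rule assumes every item simply scores (response - 1):

```python
def score_standard_sus(responses):
    """Standard SUS: 10 responses on a 1-5 scale. Odd-numbered items are
    positively worded and score (response - 1); even-numbered items are
    negatively worded and must be reverse-coded as (5 - response).
    The sum is multiplied by 2.5 to give a 0-100 score."""
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

def score_all_positive_sus(responses):
    """All-positive variant: every item scores (response - 1)."""
    return sum(r - 1 for r in responses) * 2.5

# A user who loves the site: agrees with all positive items,
# disagrees with all negative items.
print(score_standard_sus([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
print(score_all_positive_sus([5] * 10))                     # 100.0
```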

We had 213 users in the US attempt two representative tasks on one of seven websites (third party automotive or primary financial services websites: Cars.com, Autotrader.com, Edmunds.com, KBB.com, Vanguard.com, Fidelity.com and TDAmeritrade.com). 

The tasks included finding the best price for a new car, estimating the trade-in value of a used car, and finding information about mutual funds and minimum required investments. At the end of the study, users were randomly assigned to complete either the standard or the positively worded SUS. There were between 15 and 17 users for each website and questionnaire type. The mix of gender, age and education levels was not statistically different between groups.

Results

We found little evidence that the purported advantages of the alternating items outweighed the disadvantages.
  • Differences in scores were negligible: The mean SUS scores, the means of the even items and the means of the odd items were statistically indistinguishable (see Figures 1 and 2 below).
 
Figure 1: Mean SUS scores for both versions (p > .39).
 
Figure 2: Mean scores (scaled from 0 to 4) for odd (p > .54) and even (p > .2) items.
 
  • No difference in acquiescent bias: The mean number of agreement responses was nearly identical on both questionnaires: 1.64 for the standard and 1.66 for the all positive (p > .95).

  • No difference in extreme response bias: The mean number of extreme responses was 1.68 for the standard SUS and 1.36 for the positive version (SD = 2.23, n = 106), a nonsignificant difference (t(210) = 1.03, p > .30).

  • No difference in reliability: The internal reliability of both questionnaires was high (Cronbach's alpha of .92 for the original and .96 for the positive version).
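Cronbach's alpha can be computed directly from the item variances and the variance of the total scores. A small sketch; the data below is a toy example, not the study's data:

```python
from statistics import pvariance  # population variance, per the usual alpha formula

def cronbach_alpha(data):
    """data: one list of item scores per respondent (equal lengths).
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)"""
    k = len(data[0])
    item_vars = [pvariance([row[i] for row in data]) for i in range(k)]
    total_var = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Toy data: 4 respondents x 3 items, answering consistently across items
data = [[5, 4, 5], [4, 4, 4], [2, 1, 2], [1, 2, 1]]
print(round(cronbach_alpha(data), 2))  # high internal consistency (~0.96)
```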

Takeaways

Negatives Outweigh the Positives: There is little evidence that the purported advantages of including negative and positive items in usability questionnaires outweigh the disadvantages. This finding certainly applies to the SUS when evaluating websites in remote unmoderated tests. It also likely applies to usability questionnaires with similar designs in unmoderated testing of any application. Future research with a similar experimental setup should be conducted in a moderated setting to confirm whether these findings also apply when users are more closely monitored.

New Usability Questionnaires Shouldn't Alternate Wording: Researchers interested in designing new questionnaires for use in usability evaluations should avoid the inclusion of negative items.

No Reason to Stop Using the Original SUS (just watch your coding!): Researchers who use the standard SUS have no need to change to the all positive version, provided they verify the proper coding of scores (for example, by using the error-checking spreadsheet included in the SUS Package).
  • In moderated testing, researchers should include procedural steps to ensure error-free completion of the SUS (such as when debriefing the user).

  • In unmoderated testing, it is more difficult to correct the mistakes respondents make, although it is reassuring that despite these inevitable errors, the effect is unlikely to have a major impact on overall SUS scores.
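One way to catch miscoding after the fact is a consistency check over the dataset. The sketch below is an illustrative heuristic, not the SUS Package's actual error check: after correct reverse-coding, respondents' odd-item and even-item means should correlate positively, so a strong negative correlation suggests the even (negative) items were never reversed.

```python
# Heuristic sketch for spotting a miscoded SUS dataset; NOT the SUS
# Package's actual check. Scores are assumed to be on the 0-4 scale and
# (supposedly) already reverse-coded for the even items.

def mean(xs):
    return sum(xs) / len(xs)

def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def looks_miscoded(scored, threshold=-0.5):
    """If odd-item and even-item means are strongly negatively correlated
    across respondents, the even items were likely never reversed."""
    odd = [mean(r[0::2]) for r in scored]    # items 1, 3, 5, 7, 9
    even = [mean(r[1::2]) for r in scored]   # items 2, 4, 6, 8, 10
    return correlation(odd, even) < threshold

properly_coded = [[4] * 10, [3] * 10, [1] * 10, [0] * 10]
forgot_to_reverse = [[4, 0] * 5, [3, 1] * 5, [1, 3] * 5, [0, 4] * 5]
print(looks_miscoded(properly_coded))     # False
print(looks_miscoded(forgot_to_reverse))  # True
```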

All Positive SUS generates similar results: Researchers who do not have a current investment in the standard SUS can use the all positive version with confidence because respondents are less likely to make mistakes when responding, researchers are less likely to make errors in coding, and the scores will be similar to the standard SUS.

Is there ever a good reason to alternate items?

We only examined questionnaires that measure usability or system satisfaction. Usability is an analysis at the group level (we're not testing users, but rather applications that users use) so we care about differences between groups. It could be that in other areas of behavioral research where the emphasis is on the individual (e.g. clinical or counseling psychology) alternating item wording provides benefits that outweigh the problems. Until other research identifies net benefits to alternating item wording, it's best to stay positive.

For more detail on the experiments and related research into this topic see the full paper[pdf] (to be presented at CHI in May 2011).



About Jeff Sauro

Jeff Sauro is the founding principal of Measuring Usability LLC, a company providing statistics and usability consulting to Fortune 1000 companies.
He is the author of over 20 journal articles and 4 books on statistics and the user experience.
More about Jeff...



Posted Comments


April 3, 2012 | M. wrote:

Hello. I really like this article and would like to refer to it in my paper. Could you please help me with the citation of this article in your website using APA style? thanks 



May 18, 2011 | Elizabeth Buie wrote:

This is a great article, Jeff, and you and Jim did a great job of presenting this work at CHI 2011. I'll add here the fourth "M" that I have identified and that I described to you at CHI: "manual". When you use mixed positive/negative wording, you have to do the work yourself to reverse the scores on the negative ones so that you can compute averages, etc. But if you use all positive wording, you can let online survey tools do that for you. I've already used your results in a post-usability-test survey, and found it a great relief not to have to download the spreadsheet and write the formula to convert half the scores. (Not that it's a complicated formula, of course, but the more we can take advantage of automated tools, the better!)

