
Books Faster than Tablets…or not?

p < .05 is a Convention not a Commandment

Jeff Sauro • July 27, 2010

Recently Nielsen conducted a study comparing reading speeds on a printed book, a Kindle and an iPad. From 24 users, the study concluded that reading the same story took about 6.2% longer on the iPad (p = .06) and about 10% longer on the Kindle (p < .01) than on the printed book.

From this data Nielsen concluded "Books Faster than Tablets": while tablets have improved dramatically over the years, they are still modestly slower than the printed book.

Put another way, the data tells us we can be 94% sure a difference as large as 6.2% between the iPad and book isn't due to chance alone.

Would you think differently about the difference in reading speeds if the result said 96% instead of 94%? John Grohol, PsyD, did, and he wrote an article using this study as an example of "bad research." One of Grohol's main criticisms is that Nielsen's statistics don't back up his claim that books are faster than tablets.

In statistics classes students are taught to conclude something is "statistically significant" if the p-value is less than .05. Under this criterion the Kindle difference is statistically significant (p < .01) and the iPad difference is not (p = .06).
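To see how little the convention itself involves, here's a minimal sketch of that decision rule in Python, applied to the two p-values reported above (the .05 cutoff and the "1 minus p" reading are the only ingredients; no raw reading times are used):

    # A minimal sketch of the conventional p < .05 decision rule, applied to
    # the two p-values reported in Nielsen's summary (no raw data involved).
    ALPHA = 0.05  # the conventional cutoff

    reported = {"Kindle vs. printed book": 0.01, "iPad vs. printed book": 0.06}

    for comparison, p in reported.items():
        verdict = "statistically significant" if p < ALPHA else "not significant"
        confidence = 1 - p  # the "94% sure it isn't chance alone" reading from above
        print(f"{comparison}: p = {p:.2f} ({confidence:.0%} confident) -> {verdict}")

One line of arithmetic separates "significant" from "not significant," which is exactly why the cutoff deserves some scrutiny.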

P < .05 is a Convention not a Commandment

Establishing a difference as "statistically significant" when the p-value is less than .05 is a convention. It is a convention that has been taught for decades and used as a rejection criterion in most peer-reviewed journals. Interestingly enough, there is no mathematical reason why we use .05 instead of, say, .06. There are some good reasons, which include:
  • It is a nice round number
  • It accounts for around two standard errors in a normal distribution (see the quick check below)
  • It intuitively seems about right
  • It comes from a time when we relied on tables of values instead of software  
Oh and in case you wondered, there is no mathematical connection between a p of .05 and testing with five users… except that Magic Number 5.
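As a quick check on the "two standard errors" bullet above, here's a sketch using SciPy's normal distribution (nothing below comes from the reading-speed study itself):

    # A sketch of why p < .05 lines up with "about two standard errors"
    # on a normal distribution (assumes scipy is installed).
    from scipy.stats import norm

    # Critical value for a two-sided test at alpha = .05
    z_crit = norm.ppf(1 - 0.05 / 2)
    print(f"alpha = .05 (two-sided) -> cutoff at {z_crit:.2f} standard errors")  # ~1.96

    # Going the other way: a result exactly two standard errors out
    p_two_se = 2 * (1 - norm.cdf(2.0))
    print(f"2 standard errors -> p = {p_two_se:.3f}")  # ~0.046, just under .05

In other words, .05 is roughly where "about two standard errors" lands on a normal curve: convenient, but not magical.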

Conventions are helpful because they remove some of the subjectivity that can "stack the data" in a way that favors the author's hypothesis. Conventions are bad when used as commandments without thought for context. Numbers are objective, but interpretation always involves judgment about the context and the consequences of being fooled by chance.

Imagine three findings with the same p-value: one where mortality is at stake, one where money is at stake, and this one about reading speeds. All three have the same p-value, but the context certainly matters. We require more evidence when money and mortality are involved (the first two are hypothetical p-values based on actual scenarios).

Peer-reviewed Journals

Peer-reviewed journals have a special place in contributing to our scientific knowledge, and rightfully so. But they don't have a monopoly on ideas, research or inspiration. A problem with the emphasis on p < .05 is that it doesn't account for the magnitude of the difference.

With a large enough sample size, almost any difference is statistically significant. When comparing reading speeds, a difference of 6% at p = .06 is more interesting than a difference of 1% at p = .01, yet only the latter would make it into a peer-reviewed journal.
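As a rough sketch of that trade-off (every mean, standard deviation and group size below is invented for illustration; these are not Nielsen's numbers), a larger difference on a small sample can miss the .05 cutoff while a much smaller difference on a huge sample clears it easily:

    # Sketch: with a large enough sample, a tiny difference reaches p < .05,
    # while a larger difference on a small sample may not. All numbers below
    # (means in seconds, SDs, group sizes) are invented for illustration.
    from scipy.stats import ttest_ind_from_stats

    # Small study: a 6% slowdown, 24 readers per condition
    t_small, p_small = ttest_ind_from_stats(
        mean1=636, std1=90, nobs1=24,    # tablet, 6% slower
        mean2=600, std2=90, nobs2=24)    # printed book

    # Large study: only a 1% slowdown, but 5,000 readers per condition
    t_large, p_large = ttest_ind_from_stats(
        mean1=606, std1=90, nobs1=5000,  # device, 1% slower
        mean2=600, std2=90, nobs2=5000)  # printed book

    print(f"6% difference, n = 24 per group:    p = {p_small:.3f}")   # not significant
    print(f"1% difference, n = 5,000 per group: p = {p_large:.5f}")   # well under .05

The p-value rewards sample size as much as it rewards the size of the effect, which is exactly why reporting the magnitude of the difference matters.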

Applied Research

Being 94% confident that chance alone can't explain the difference in reading speeds, I'm convinced that people read Ernest Hemingway a little faster on books than on tablets. The limitations of this study have more to do with the type of users and material tested than with the p-values, a point also raised by Grohol. Perhaps this finding doesn't apply to James Joyce, Jim Collins or Japanese readers.

In applied research you'll never be able to test enough people, explore every possibility, or address all the limitations of your data. Applied research is about making better decisions with data and with limited time and money. Fortunately life and death are rarely consequences of making wrong decisions in applied research.

By all means, conduct your research, summarize it on the web and report your p-values. Tell us what conclusion you drew from the data and see if your readers are convinced. If there is a compelling story, it will be replicated or refuted in another web article. If your p-value is less than .05, it might even make it into a peer-reviewed journal ... although fewer people will read it.

For the interested reader, see: Statistics as Principled Argument and Beyond ANOVA: Basics of Applied Statistics.


About Jeff Sauro

Jeff Sauro is the founding principal of Measuring Usability LLC, a company providing statistics and usability consulting to Fortune 1000 companies.
He is the author of over 20 journal articles and 4 books on statistics and the user experience.


Posted Comments

There are 2 Comments

July 27, 2010 | Jeff Sauro wrote:

Thanks for the comment and reference, and great point about meta-analysis. Actually I thought Grohol was a bit too tough on Nielsen in this case. It's hard not to agree that there are limitations in this and any applied study, but that's how they work. I thought his criticism about the p-value was the weakest, for the reasons I outlined in the post.


July 27, 2010 | Dave Mulder wrote:

While I usually appreciate Nielsen's work, you and Grohol are spot-on for catching him on this 'study'. Unfortunately, folks are going to run and preach it as gospel. A much more carefully controlled series of studies is needed here.

The best published critique I've seen on NHST is "A Critical Assessment of Null Hypothesis Significance Testing in Quantitative Communication Research," written by Levine, Weber, Hullett, Park, and Lindsey in Human Communication Research. Definitely worth checking out.

Perhaps the most interesting outcome of p<=.05 being a convention is its influence on meta-analysis. Meta-analysis only covers studies that are published, and studies only get published when they show statistical significance. To get that significance, you either need to have big effect sizes or a large sample; so meta-analysis (depending on the field) tends to report an exaggerated average effect size. 

