
10 Things to Know About Unmoderated Usability Testing

Jeff Sauro • November 13, 2012

One of the biggest barriers to usability testing is the cost and time involved.

Moderators have to bring users to a dedicated location, test them one at a time, and in the end get results from only a handful of users.

Unmoderated usability testing is a technique that uses software such as Userzoom or Loop11 to administer tasks and questions without the need for a facilitator.

Here are 10 things to know about this essential usability testing method, which is reducing the cost and increasing the frequency of usability testing.

  1. It's growing: According to the latest User Experience Professionals Association survey in 2011, around 23% of respondents reported using unmoderated testing (compared to 52% using lab-based testing). That's a relative increase of about 28% since 2009, when 18% of respondents used it. The method wasn't even listed as an option in 2007!

  2. Recruiting is a lot easier: Jakob Nielsen calls recruiting "unglamorous," and Steve Krug says in "Rocket Surgery Made Easy" that he's not very fond of it. Finding qualified participants is hard but necessary. Fortunately, for unmoderated tests it's a bit easier to find both more users and more specialized users, through a variety of approaches.

    You can use panel companies like OP4G and Toluna, which recruit and send users to your study, or you can pull users right off a website. When we use intercepts to recruit off websites, we typically see a much higher attrition rate than when we use panel companies. In general, we like to include both types in our unmoderated studies, as together they provide a good mix of data from current and prospective users.

  3. Survey + Usability Study: We often start a project with vague business questions: Do customers understand our unique selling points? What should we change in our checkout form? Is our new homepage design better? We operationalize these questions into testable hypotheses and use a mix of tasks and traditional survey questions. This allows us to examine both attitudes and actions. Sometimes the key metric is the percentage of users who click on a navigation element. Other times, the most insightful finding is a survey answer revealing how few users understood a concept.
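
    To make this concrete, here is a minimal sketch of what such a mixed study plan might look like. The business question, tasks, and survey questions are hypothetical, and the structure is purely illustrative; it is not the configuration format of Userzoom, Loop11, or any other product.

      # Hypothetical mixed study plan: tasks plus survey questions that
      # operationalize a vague business question. Illustrative only.
      study_plan = {
          "business_question": "Do customers understand our unique selling points?",
          "tasks": [
              {
                  "id": "find_value",
                  "prompt": "Find the fair market value of a 2010 Honda Accord in zip 80202.",
                  "success_check": "validating question",  # see point 9 below
              },
          ],
          "survey_questions": [
              {
                  "id": "usp_recall",
                  "type": "multiple_choice",
                  "prompt": "Which of these best describes what sets this company apart?",
              },
              {
                  "id": "task_ease",
                  "type": "rating_1_to_7",
                  "prompt": "Overall, how difficult or easy was the task to complete?",
              },
          ],
      }

      for task in study_plan["tasks"]:
          print(task["id"], "->", task["prompt"])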

  4. Metrics Fiesta: In unmoderated studies it's easy, and usually fairly automatic, to collect user experience metrics like completion rates, task times, task difficulty, overall perceptions of usability, the Net Promoter Score, and task-level confidence. Products like Userzoom will even record every click and click path, generating compelling heat maps and click-path visualizations that help you understand where users are going.

  5. With video it's almost like the lab: In almost every unmoderated study, we have a subset of users whom we record on video thinking aloud while they complete the same tasks as the larger sample. We use Usertesting.com, which has a large panel of participants from the U.S., Canada and the UK. We are also able to recruit on more specific criteria, like having a department store credit card, having researched or purchased a laptop online, or having purchased an item at Target.com in the last six months. When we see a low task completion rate and wonder what's causing it, it usually takes just a couple of videos to see why users are struggling. Sometimes it's the complexity of the task, terminology problems, navigation issues or even a poorly placed pop-up.

  6. Setup takes about as long as lab studies: Setting up the study, and carefully designing and pre-testing tasks and questions, takes about as long as for moderated testing. There's a fixed cost associated with any usability test: metrics, tasks, user profiles and research questions. The real savings come from the time invested per user.

  7. It's more efficient than lab-based studies: The logistics involved in having people come to a physical lab in one or even a few locations aren't trivial. It usually takes weeks to recruit and facilitate, and the study demands at least one person's full attention (it's hard to multitask in a lab!). In the Comparative Usability Evaluation 9 (CUE-9), the average time spent on unmoderated sessions was 37 hours versus 60 hours for moderated studies, and for teams that tested more users, the payoff was significant. For example, Teams G and L had similar tasks, data collection and methods. Team G tested 12 users in a lab and Team L tested 314 unmoderated users. It took over three hours per participant for the lab-based study but only about 3.5 minutes for the unmoderated study. In just over half the overall testing time, the unmoderated test collected similar data on 26 times more users (see the table below)!

     Team              Total Hours   Users   Hours/User
     G (lab-based)     40            12      3.33
     L (unmoderated)   21            314     0.06

     Table 1: Hours spent on testing by Teams G and L (by type of testing) in CUE-9.

  8. Mostly Comparable to Lab Data: While there isn't a lot of data comparing the results that come from the different methods, we found that measures of overall ease (using the System Usability Scale), task completion, and task-level difficulty were reasonably similar. Task time, however, differed by a substantial 30%. This raises the question: which task time is the "correct" one? Is the one from the artificial lab environment, with people watching from behind a one-way mirror, more accurate, or the one where users are on their own computers and might get "interrupted" by Facebook, Twitter or the toilet? In short, both are probably wrong, but when comparing task times, sticking with the same method ensures a fair comparison. Synchronous (and face-to-face) interactions do, of course, allow you to follow up and engage in a dialogue with users, so unmoderated testing will never be a full replacement for moderated testing.

  9. You need a way to verify task completion: In a typical moderated usability study, the facilitator can determine whether a user has successfully completed the task. Because no one is watching the user in an unmoderated study, you need some way to determine success. This is done by using a validating question or a validating URL.

    • Validation by Question: If users are asked to look for a specific product, you can ask for the price, model number or some other piece of information that can only be found if the task was completed successfully. For example, if the task is to look up the fair market value of a 2010 Honda Accord in a specific zip code, you can provide a few plausible values at the end of the task and have users select the correct one. We always include an "other" option because, despite detailed planning, there always seem to be exceptions or product variations we never counted on. The "other" responses allow us to go back and give credit where it's warranted.

    • Validation by URL: If there is a specific page on a website that a user can only reach by locating the correct item or piece of information, you can use the software to check the final URL(s). For example, in a recent test of findability, we knew only three pages contained the correct piece of information, so we could verify task completion from the final URL.
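
    As a rough illustration of URL validation, here is a minimal sketch in Python. The validating paths and session records are hypothetical, not the output or API of any particular testing tool.

      # Minimal sketch: scoring task success by validating URL.
      # The paths and session logs below are hypothetical.
      from urllib.parse import urlparse

      # Pages a participant can only reach by finding the correct information
      VALIDATING_PATHS = {
          "/help/shipping-rates",
          "/support/faq/shipping",
          "/checkout/shipping-options",
      }

      def task_completed(final_url: str) -> bool:
          """Return True if the session ended on one of the validating pages."""
          return urlparse(final_url).path.rstrip("/") in VALIDATING_PATHS

      sessions = [
          {"user": "p01", "final_url": "https://example.com/help/shipping-rates"},
          {"user": "p02", "final_url": "https://example.com/search?q=shipping"},
          {"user": "p03", "final_url": "https://example.com/support/faq/shipping/"},
      ]

      completed = sum(task_completed(s["final_url"]) for s in sessions)
      print(f"Completion rate: {completed / len(sessions):.0%}")  # 67%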

  10. Statistical Precision: It is a common misconception that you need a large sample size to use statistics. However, with smaller sample sizes you can detect only large differences between designs, and your confidence intervals will be rather wide. With a larger sample size you can detect smaller differences between designs. This is especially important when you're designing a new homepage or improving your navigation, where differences of 5%-15% translate into a meaningful difference.

    Because it's easier to recruit and faster to test more users with unmoderated testing, you can detect smaller differences and obtain more precise metrics. For example, the table below shows that the typical margin of error around your metrics will be approximately 18% for a sample size of 20, compared to approximately 6% for a sample size of 200. To compare, say, completion rates between two designs, the difference would have to be at least 60 percentage points to be detectable if you tested 20 users (10 in each group). For a sample size of 200 (100 in each group), differences as small as 17 percentage points would be statistically significant. That is, at a sample size of 200, if one group had a completion rate of 50% and the other 67%, a difference this large or larger would not be explainable by chance alone.

     Sample Size   Typical Margin of Error   Smallest Difference to Detect
                   (90% Confidence)          (90% Confidence & 80% Power)
     20            +/- 18%                   60 percentage points (e.g. 20% vs. 80%)
     200           +/- 6%                    17 percentage points (e.g. 50% vs. 67%)

     Table 2: Typical margin of error for two sample sizes (20 and 200) and the smallest difference detectable when comparing designs.
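
    For the margin-of-error column, here is a minimal sketch of the underlying arithmetic, using the normal (Wald) approximation for a proportion at the most conservative value p = 0.5. The adjusted-Wald interval, often recommended for small usability samples, gives slightly different values at small n.

      # Margin of error for a completion rate at 90% confidence, using the
      # normal (Wald) approximation at the most conservative p = 0.5.
      # Reproduces the second column of Table 2.
      import math

      Z_90 = 1.645  # two-sided z-value for 90% confidence

      def margin_of_error(n: int, p: float = 0.5) -> float:
          """Half-width of the 90% confidence interval for a proportion."""
          return Z_90 * math.sqrt(p * (1 - p) / n)

      for n in (20, 200):
          print(f"n = {n:>3}: +/- {margin_of_error(n):.0%}")
      # n =  20: +/- 18%
      # n = 200: +/- 6%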



About Jeff Sauro

Jeff Sauro is the founding principal of Measuring Usability LLC, a company providing statistics and usability consulting to Fortune 1000 companies.
He is the author of over 20 journal articles and four books on statistics and the user experience.



Posted Comments

There are 2 Comments

March 19, 2013 | Anna Kitowska wrote:

Hi Jeff! Very interesting entry, I agree with your points, especially with the one about the importance of mixing survey questions with other tests. In order to facilitate this process we've created a toolbox for UX experts. You can find it on http://usabilitytools.com/ and check our UX utility belt. Cheers! 


November 25, 2012 | Liat wrote:

Excellent blog. I am a UX expert. Up till now I have been using Morae for usability testing, which gave me both "dry" data, such as number of clicks and task durations, and recordings. To broaden my testing coverage I would like to try unmoderated usability testing, and I am looking for a tool that will give me similar results. So far I have only found tools that give me either the "dry" data or the recordings, but not both. Can you recommend a tool that covers it all?

Thanks a lot 



