What Goes into a Usability Test Plan?
Jeff Sauro • October 1, 2013
Failing to plan is planning to fail.
It's both good practice and often necessary to have a test plan before beginning a usability test.
Like any plan, it should not only lay out the framework of the study, but also help identify problems with the methodology, metrics or tasks while something can still be done to fix things.
Test plans, like many documents, can take on a life of their own and become bloated and unnecessarily complex.
Many organizations are "document factories," and the test plan, reports and process documents can draw as much focus as the results themselves, or more. This is, of course, anathema to Lean UX thinking.
Nevertheless, it's a good idea to spell out a few sections in each test plan. The folks at Usability.gov have a comprehensive template (which you'll need to trim down to suit your needs), and Chapter 5 of the Handbook of Usability Testing has a detailed discussion about building a test plan.
While each plan will vary based on the organization and test situation, here are the sections we use most often.
Objectives and Business Questions
This can be one or several sections. The aim is to clearly let everyone know, from the facilitator to the project sponsor and the stakeholders, what is being tested and why a usability test is being conducted in the first place.
What You're Testing
State the name (external and/or internal) and version of the interface you are testing, plus the functional area if necessary (e.g. Payables Version V9 in the Financials Module).
• Which of three possible checkout options leads to higher comprehension of in-store pickup?
• Are users able to rent a car in less than 30 seconds?
• Does the proposed navigation structure generate higher findability scores than the existing one?
• What functions in the online portal are users not understanding?
Participant Profile: In addition to your target demographics (age, gender, geography), be sure to identify how much experience users should have with the website or product. Product and domain experience generally have much more impact on usability metrics than demographics in a usability test.
Sample Size: Spell out your target sample sizes for the entire study and by participant profile if necessary. While there is a lot of controversy around sample sizes in usability studies, the best way to determine your sample sizes is to balance the math (what the equations say) and the politics (what the stakeholders will expect).
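As one illustration of "what the equations say," here is a minimal sketch of a commonly used problem-discovery calculation: the chance of seeing a problem at least once among n users is 1 - (1 - p)^n, where p is the proportion of users the problem affects. Treating this as the relevant equation is an assumption, as are the function names and the example values of p and the discovery goal. Other study goals (such as comparing designs or estimating a completion rate) call for different calculations.

```python
import math


def discovery_likelihood(p: float, n: int) -> float:
    """Chance that a problem affecting a proportion p of users
    is observed at least once in a sample of n users."""
    return 1 - (1 - p) ** n


def users_needed(p: float, goal: float) -> int:
    """Smallest sample size whose discovery likelihood reaches `goal`.

    Both p and goal must be strictly between 0 and 1.
    """
    if not (0 < p < 1 and 0 < goal < 1):
        raise ValueError("p and goal must be between 0 and 1 (exclusive)")
    return math.ceil(math.log(1 - goal) / math.log(1 - p))


# Hypothetical planning values: problems that affect 31% of users,
# and a goal of seeing each such problem at least once 85% of the time.
print(users_needed(0.31, 0.85))          # 6
print(discovery_likelihood(0.31, 5))     # likelihood with only 5 users
```

The result is only the "math" half of the balance; the target a stakeholder will accept may still be larger or smaller.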
Methods
Identify the approach you'll use to collect the data.
Will it be moderated remote, moderated in-person, mobile or a mix of in-person and unmoderated studies? Many usability studies we conduct use a mixed-method approach (usually 5-10 moderated users and between 50 and 300 unmoderated users). The moderated portion gives us a rich set of qualitative insights and an opportunity to probe users on their interactions, while the unmoderated portion allows us to understand the prevalence of issues and identify ones we missed with the smaller sample size.
Stand-Alone vs. Comparative: If the study is a comparison study, you'll need to determine whether it should be between-subjects (different users in each group) or within-subjects (same users in each group). The latter approach will allow you to detect smaller differences with the same sample size, but you'll need to ensure tasks and products are presented in counterbalanced order to minimize carry-over effects.
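Counterbalancing for a within-subjects study can be sketched as follows. This is one simple scheme, assuming a small number of conditions: cycle participants through every possible presentation order so each order is used about equally often. The function name and the "Design A" / "Design B" labels are hypothetical.

```python
from itertools import permutations


def counterbalanced_orders(conditions, n_participants):
    """Give each participant a presentation order, cycling through all
    permutations of the conditions so every order is used about equally."""
    orders = list(permutations(conditions))
    return [orders[i % len(orders)] for i in range(n_participants)]


# Hypothetical two-product comparison with six participants:
# half see Design A first, half see Design B first.
assignments = counterbalanced_orders(["Design A", "Design B"], 6)
for participant, order in enumerate(assignments, start=1):
    print(participant, order)
```

Full permutations grow quickly (four conditions already give 24 orders), so with more conditions a Latin square or randomized order per participant is a common alternative.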
For each task scenario you'll want to define at a minimum what information you'll give the user (the scenario) and the starting URL or screen. Ideally, you'll have a goal or objective for each task (e.g. can users calculate the price of their rental car?) and these sub-goals can tie back to the overall broader study goals identified in the Objectives and Business Questions section.
You should differentiate between metrics collected after each task scenario and at the end of the session.
Here are some of the more common metrics we collect with every usability test and some things to consider ahead of time.
Task Completion: Determine objective success criteria for each task.
Task Time: For moderated studies, you'll often need to know when to start and stop timing. For unmoderated studies, the task-time data is collected automatically, but you should identify how you'll handle times from failed tasks. We usually report only the task-completion rate, but total task duration can be an interesting measure of engagement.
Task Difficulty: Some of the most salient times during a usability test occur immediately after each task. You'll therefore want to include at least one question on the perception of task difficulty. We use the Single Ease Question (SEQ).
Confidence: Users tend to be overconfident when completing tasks, meaning they think they completed the task successfully more often than they actually did (men more so than women). When a lot of users are extremely confident and fail a task, we call that a disaster and use it as another measure to diagnose problems.
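A disaster rate for a task could be tallied along these lines. This is a rough sketch under stated assumptions: confidence is collected on a 7-point scale, "extremely confident" means the top rating, and the rate is taken over all attempts at the task. The data layout and the function name are hypothetical.

```python
def disaster_rate(results, extreme=7):
    """Share of task attempts that failed despite extreme confidence.

    `results` is a list of (completed, confidence) pairs, where
    `completed` is a bool and `confidence` is a rating on a scale
    whose top value is `extreme` (assumed here to be 7).
    """
    if not results:
        return 0.0
    disasters = sum(
        1
        for completed, confidence in results
        if not completed and confidence >= extreme
    )
    return disasters / len(results)


# Hypothetical data for one task: four users, one confident failure.
task_results = [(True, 7), (False, 7), (False, 3), (True, 5)]
print(disaster_rate(task_results))  # 0.25
```

Comparing this rate across tasks is one way to flag where users are being misled rather than merely slowed down.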
You'll usually want to include at least a few overall test measures both before and after the task scenarios.
We usually collect standard demographic data prior to users attempting any tasks. This is also an opportunity to ask attitudinal and brand questions before users know what the study is about. You can then compare these to post-study questions to see if there is an increase or decrease in attitudes after exposure and use.
In addition to test metrics like the System Usability Scale (SUS), we also like to include a few open-ended questions about what the users would improve and comments or issues they had (especially important in unmoderated studies), as well as any post-experience brand or attitudinal metrics.
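For planning how SUS results will be tallied, here is a minimal sketch of the standard scoring: ten items rated 1 to 5, odd-numbered items contribute (rating - 1), even-numbered items contribute (5 - rating), and the sum is multiplied by 2.5 to give a 0-100 score. The function name and input format are assumptions for illustration.

```python
def sus_score(responses):
    """Score one participant's SUS questionnaire.

    `responses` holds the ten item ratings in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError("each rating must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item_number % 2 == 1 else (5 - rating)
    return total * 2.5


print(sus_score([3] * 10))                          # 50.0
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))    # 100.0
```

Scores are computed per participant and then averaged for the study.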
Test Report Outline (Bonus)
If possible, try to outline how the report will be structured in the test plan. One productivity advantage of test plans is that if you and the stakeholders agree on the structure of the findings up front, you can reduce the waste of redoing reports. Every organization has its own preferences on order and emphasis. We usually use a structure something like the following:
- Study Overview
- Findings Summary
- Participant Profiles & Demographic Summaries
- Task Metrics
- Usability Issues by Task and Recommendations
- Test Metric Summary
- Appendix & Verbatims
| Activity | Dates |
|---|---|
| Project Kickoff Meeting | 8/15 |
| Refine Tasks and Metrics | 8/16 – 8/22 |
| Conduct Pilot Test & Report on Recommended Changes | 8/23 – 8/25 |
| Conduct Study | 8/31 – 9/13 |
| Top Line Results & Initial Readout | 9/16 |
| Revise Report with Additional Analyses | 9/30 |
| Final Report | 10/4 |