Hypothesis Testing in the User Experience
Jeff Sauro • February 5, 2013
The science project.
It's something we have all completed and, if you have kids, might see each year at the school science fair.
- Does an expensive baseball travel farther than a cheaper one?
- Which melts an ice block quicker, salt water or tap water?
- Does changing the amount of vinegar affect the color when dying Easter eggs?
While the science project might be relegated to the halls of elementary schools or your fading childhood memory, it provides an important lesson for improving the user experience.
The science project provides us with a template for designing a better user experience. Form a clear hypothesis, identify metrics, and collect data to see whether the evidence refutes or supports it. Hypothesis testing is at the heart of modern statistical thinking and a core part of the Lean methodology.
Instead of approaching design decisions with pure instinct and arguments in conference rooms, form a testable statement, invite users, define metrics, collect data and draw a conclusion.
- Does requiring the user to enter an email address twice result in more valid email addresses?
- Will labels on the top of form fields or the left of form fields reduce the time to complete the form?
- Does requiring the last four digits of your Social Security Number improve application rates over asking for a full SSN?
- Do users have more trust in the website if we include the McAfee security symbol or the Verisign symbol?
- Do more users make purchases if the checkout button is blue or red?
- Does a single long form generate more form submissions than the same form split across three smaller pages?
- Will users find items faster using mega menu navigation or standard drop-down navigation?
- Does the number of monthly invoices a small business sends affect which payment solution they prefer?
- Do mobile users prefer to download an app to shop for furniture or use the website?
Each of the above questions is testable and comes from a real example. It's best to have as specific a hypothesis as possible and isolate the variable of interest. Many of these hypotheses can be tested with a simple A/B test, an unmoderated usability test, a survey or some combination of them all.
Even before you collect any data, there is an immediate benefit gained from forming hypotheses. It forces you and your team to think through the assumptions in your designs and business decisions. For example, many registration systems require users to enter their email address twice. If an email address is wrong, in many cases a company has no communication with a prospective customer.
Requiring two email fields would presumably reduce the number of mistyped email addresses. But just as legislation can have unintended consequences, so can rules in the user interface. Do users just copy and paste their email address, negating the double fields? If you then disable pasting into the field, does this lead to more form abandonment and fewer customers overall?
With a clear hypothesis to test, the next step involves identifying metrics that help quantify the experience. As with most tests, you can use a simple binary metric (yes/no, pass/fail, convert/didn't convert). For example, you could collect how many users registered using the double email vs. the single email form, how many submitted using the last four numbers of their SSN vs. the full SSN, and how many found an item with the mega menu vs. the standard menu.
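As a rough illustration of the binary metrics described above, the sketch below tallies pass/fail outcomes for two form variants and reports each completion rate. The outcome lists and the `completion_rate` helper are hypothetical, invented only to show the bookkeeping. They are not data from a real study.

```python
# Hypothetical outcomes: True = user completed registration, False = abandoned.
# These values are made up for illustration only.
single_email = [True, True, False, True, True, False, True, True, True, False]
double_email = [True, False, False, True, True, False, True, False, True, False]


def completion_rate(outcomes):
    """Return the share of users with a successful (True) outcome."""
    if not outcomes:
        raise ValueError("need at least one observation")
    return sum(outcomes) / len(outcomes)


rate_single = completion_rate(single_email)
rate_double = completion_rate(double_email)
difference = rate_single - rate_double

print(f"single email field: {rate_single:.0%}")
print(f"double email field: {rate_double:.0%}")
print(f"difference:         {difference:+.0%}")
```

With samples this small, a difference in rates on its own says little. Whether it is larger than chance would produce is a question for the statistical analysis step.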
Binary metrics are simple, but they usually can't fully describe the experience. This is why we routinely collect multiple metrics, both performance and attitudinal. You can measure the time it takes users to submit alternate versions of the forms, or the time it takes to find items using different menus. Rating scales and forced ranking questions are good ways of measuring preferences for downloading apps or choosing a payment solution.
With a clear research hypothesis and some appropriate metrics, the next steps involve collecting data from the right users and analyzing the data statistically to test the hypothesis. Technically we rework our research hypothesis into what's called the Null Hypothesis, then look for evidence against the Null Hypothesis, usually in the form of the p-value. This is of course a much larger topic we cover in Quantifying the User Experience.
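To make the Null Hypothesis and p-value a little more concrete, here is a minimal sketch of one common approach for a binary metric: a two-proportion z-test using a large-sample normal approximation. It is an illustrative stand-in, not necessarily the procedure recommended in the book. The function name and the conversion counts are assumptions made up for the example.

```python
from math import erfc, sqrt


def two_proportion_p_value(success_a, n_a, success_b, n_b):
    """Two-sided p-value for the Null Hypothesis that both designs
    share the same underlying conversion rate (normal approximation)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Under the Null Hypothesis the two samples are pooled into one rate.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no variation at all, so no evidence against the Null
    z = (p_a - p_b) / std_err
    # P(|Z| >= |z|) for a standard normal variable Z.
    return erfc(abs(z) / sqrt(2))


# Hypothetical A/B test: 120 of 1000 users converted with a blue button,
# 160 of 1000 with a red one. Invented numbers, not real results.
p_value = two_proportion_p_value(120, 1000, 160, 1000)
print(f"p-value: {p_value:.4f}")
```

A small p-value means a difference this large would be unlikely if the Null Hypothesis were true. The approximation assumes reasonably large samples, so for small usability studies a different test may be more appropriate.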
While the process of subjecting data to statistical analysis intimidates many designers and researchers (recalling those school memories again), remember that the hardest and most important part is working with a good testable hypothesis. It takes practice to convert fuzzy business questions into testable hypotheses. Once you've got that down, the rest is mechanics that we can help with.