4 Experiment Types for User Research
Jeff Sauro • October 15, 2013
Which design will improve the user experience?
One of the primary goals of conducting user research is to establish some causal relationship between a design and a behavior.
Typically, we want to see if a design element or changes to an interface lead to a more usable experience (experiment) or if more desirable outcomes are associated with some aspect in our designs (correlation).
Even though we aren't wearing white lab coats or mixing chemicals, we are doing experimental research.
That is, we are examining an independent variable (designs) and looking at the effects on a dependent variable--higher completion rates, faster task times, reduced calls to support or more conversions, for example.
While this type of research falls under the broad umbrella of experimentation, the research designs differ in important ways. Four major design types with relevance to user research are experimental, quasi-experimental, correlational and single subject. These research designs proceed from a level of high validity and generalizability to ones with lower validity and generalizability. First, a note on validity.
Validity refers to how well what we are measuring predicts or explains what will happen outside our controlled environment. It's often subdivided into:
Internal validity: our confidence in the causal relationship between our design and outcome variable
External validity: how well we can extrapolate our findings to real user settings with other users and situations
Randomly assigning participants to different design treatments and/or a control in a research study is an experimental design. For example, we recently wanted to know which of three designs on an ecommerce checkout page users would understand best when deciding how to ship or pick up their products in a store. We created three shipping scenarios to test the three designs (two independent variables) and used the dependent variables of accuracy, perceived difficulty, confidence and time.
The hallmark of experimental research is randomly assigning participants to different treatments. In this example, we identified the design with which users made the correct selection and were most confident in making it.
One concern I often hear when running user research studies is that we aren't randomly sampling participants. And this is true. Rarely are we able to truly sample at random customers or prospective customers and have them take a study.
However, this is the same problem faced by medical, educational and psychological researchers who conduct tests on humans. At the very least, we need to have people volunteer and consent to participate. In short, we usually have convenience samples.
This means there are all sorts of variables from our sample we aren't controlling for or are unaware of that could be impacting our results. But by randomly assigning participants to different designs or treatment conditions, we usually spread those nuisance variables evenly across our designs. This increases the internal validity and generalizability of the findings.
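To make the idea concrete, here is a minimal sketch of random assignment in Python. The participant IDs, the design names and the `assign_randomly` helper are all hypothetical and only illustrate the technique. The convenience sample is shuffled and then dealt across the designs so group sizes stay balanced.

```python
import random
from collections import Counter


def assign_randomly(participant_ids, conditions, seed=None):
    """Randomly assign participants to conditions in near-equal group sizes."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled participants round-robin so group sizes differ by at most one.
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(shuffled)}


# Hypothetical convenience sample of 30 volunteers and three design variants.
participants = [f"P{n:02d}" for n in range(1, 31)]
designs = ["Design A", "Design B", "Design C"]

assignment = assign_randomly(participants, designs, seed=42)
group_sizes = Counter(assignment.values())
print(group_sizes)
```

Fixing the seed only makes the assignment reproducible for record keeping. Each participant still has the same chance of landing in any design, which is what spreads the nuisance variables across conditions.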
As another example, researchers in Europe recently conducted an experiment where they manipulated both the usability and visual appeal of an online ecommerce website. They essentially took one website, made the navigation intuitive or not intuitive, and then changed the colors and contrast to be appealing or unattractive.
They used task-based measures of usability (completion rates, clicks, a three-item variant of the SEQ) and several post-test measures of usability, including the System Usability Scale (SUS).
Experiments (with random assignment) provide the strongest controls against extraneous variables and provide the highest levels of internal validity. These generate the strongest types of research results. But what happens if you cannot randomly assign participants?
If there are different conditions you want to test, but you cannot randomly assign users to the different conditions, then the study is quasi-experimental. For example, we'll often want to know if users find the beta version of a software product more usable than an existing version. Users of beta software usually volunteer to use the software during the beta-test period. This self-selected (non-random) assignment introduces a potential source of bias into the results. It has higher external validity because these groups are naturally segmented but has lower internal validity.
When you compare attitudes toward usability (say from the SUS) from the beta software users to the existing version users and find a difference, it could be that the difference is due to differences in the type of people using the software and not to actual differences between the versions. This type of problem is called confounding and makes the quasi-experimental design type less internally valid than the experimental design.
As another example, I worked with a national retailer a few years ago and we wanted to know the effects of direct mail coupons on in-store purchases. We used two markets: one received a new coupon (treatment) and the other the standard coupon (control) in newspaper inserts mailed to homes. We compared the sales of stores prior to the coupon and after the coupon in both markets.
We couldn't randomly assign people to live in different cities, so we used two similarly sized Midwestern advertising markets and looked to see what the new coupon did to sales.
The weakness with quasi-experimental studies is that we can't be as sure as we can with random assignment that any increase in sales is attributable to the coupon rather than to differences between the markets.
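One common way to summarize a two-market, before-and-after comparison like this is a difference-in-differences estimate: the sales change in the treatment market minus the sales change in the control market. The sketch below is illustrative only. The sales figures are hypothetical, and this is not necessarily the analysis we ran for the retailer.

```python
def difference_in_differences(treat_before, treat_after, control_before, control_after):
    """Change in the treatment market minus change in the control market."""
    treatment_change = treat_after - treat_before
    control_change = control_after - control_before
    return treatment_change - control_change


# Hypothetical average weekly sales per store (in dollars); not data from the study.
lift = difference_in_differences(
    treat_before=100_000, treat_after=112_000,
    control_before=98_000, control_after=103_000,
)
print(lift)  # 12000 - 5000 = 7000
```

Subtracting the control market's change removes trends that hit both markets, such as seasonality. It cannot remove differences that affect only one market, which is exactly the confounding risk described above.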
A correlational study, as the name suggests, is when we look at the relationship between two variables and report the correlation. For example, we've often examined the relationship between product usability and likelihood to recommend: it's a strong positive correlation (meaning ease of use is strongly associated with, and likely predicts, much of why users do and don't recommend products).
Two famous examples of correlational studies come from the area of problem frequency and severity. Robert Virzi in 1992 found that more frequently occurring problems tended to be the more severe ones (he found a correlation of r = .46). In attempting to replicate Virzi's findings, Jim Lewis didn't find a correlation between severity and frequency (a correlation not significantly different from 0).
While both studies provided valuable results, they didn't have random assignment and the independent variables weren't manipulated--which lessens the internal validity of the findings.
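For readers who want to see what goes into the r value, here is a minimal sketch of the Pearson correlation in Python. The `pearson_r` helper and the paired scores (a SUS score and a 0 to 10 likelihood-to-recommend rating per product) are hypothetical and are not data from any of the studies mentioned.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations from each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for seven products.
sus_scores = [52, 60, 68, 71, 75, 80, 88]
likelihood_to_recommend = [4, 6, 5, 7, 8, 8, 9]

r = pearson_r(sus_scores, likelihood_to_recommend)
print(round(r, 2))
```

A value near 1 indicates a strong positive relationship and a value near 0 indicates little or no linear relationship. Neither tells us which variable, if either, is causing the other.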
Single Subject Study
It's often the case that getting access to participants is extremely difficult. For example, we might be interested in whether a new interface to a PET scanner reduces the time it takes attending radiologists to adjust a setting on the scanner.
If we had access to one of these users, we could ask them to perform a task on the existing software version three times, record how long it took to complete, then have them attempt the same task three times on the new software, and finally have them attempt it again three times on the old version. A graph of what this might look like is shown below.
This type of single subject study uses what's called an ABA condition (where A is the existing software and B is the new software). The repeated trials help establish stability in our measures and increase the internal validity of our finding (as much as you can from a single subject).
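As a rough sketch of how the ABA data could be summarized, the Python below averages the three trials in each phase and compares the new software against the two baseline phases. The task times and the `summarize_aba` helper are hypothetical and exist only to show the arithmetic.

```python
from statistics import mean


def summarize_aba(first_a, b, second_a):
    """Average each phase and compare B against the mean of the two A phases."""
    a1_mean, b_mean, a2_mean = mean(first_a), mean(b), mean(second_a)
    baseline = mean([a1_mean, a2_mean])
    return {
        "A1": a1_mean,
        "B": b_mean,
        "A2": a2_mean,
        "improvement": baseline - b_mean,
        # The pattern is most convincing when B beats both baseline phases.
        "reverses": b_mean < a1_mean and b_mean < a2_mean,
    }


# Hypothetical task times in seconds for one radiologist; not real study data.
summary = summarize_aba(
    first_a=[48, 45, 46],   # existing software, first baseline
    b=[31, 29, 30],         # new software
    second_a=[47, 44, 46],  # existing software, return to baseline
)
print(summary)
```

With these made-up numbers, the task time drops by about 16 seconds on the new software and returns to roughly the original level when the participant goes back to the old version. That return to baseline is what helps rule out simple practice effects.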
The obvious limitation with the single subject design is generalizability. We have provided evidence by manipulating an independent variable (the software) that task time goes down for one user, but there could be a number of variables we aren't accounting for. For this reason, single subject designs aren't used very often in user research. However, case studies are more popular at UX conferences and applying this single subject approach where possible can bring more rigor to the conclusions.
You can actually use more than one participant in a single-subject design (for example, 2 or 3 radiologists) and use the same technique to establish the pattern. To be more sophisticated in your analysis you can also use Time Series Analysis to examine trends over time and by condition for each user or the data in aggregate.