Many of us know the bewildered feeling that often washes over us while we watch a user test our product for the first time. We’ve lived and breathed our product and know every reason for every design choice that made it to production. Why, then, are some people struggling to complete a simple set of tasks?
It’s the curse of knowledge as applied to product design — naturally, you understand everything you make, simply because you made it. As such, you can’t be an accurate judge of its usability.
To escape the curse and actually improve user experience, many product teams decide to invest in usability testing. They put their product in front of real live humans who have agreed to try it and then study their behavior in order to answer questions like:
How easy is it for a new user to complete their first task in our product?
Does our onboarding help users understand our product quickly?
Is it obvious how to do x or y?
Alas, the problem is that getting accurate and unbiased feedback on any of these questions from user testers can be really hard.
Whether you fork out the funds for a professional user testing service or just ask your friends to use your app while you watch, getting reliable data about how well your product performs is a challenge. The good news is that it’s not just you: this is a problem that’s rooted in human psychology.
The Hawthorne Effect.
In 1924, the sociologist Elton Mayo and his team were commissioned to analyze worker behavior at Western Electric’s Hawthorne Works factory in Cicero, Illinois.
They tried manipulating various conditions on the factory floor to make the workers more efficient. They’d alter things such as the degree and intensity of light, the cleanliness and neatness of workstations, or the organization of workers on the factory floor.
Reliably, no matter what they changed — and no matter how insignificant that change was — there was always a subsequent, short-term boost to productivity. They could adjust the lights a barely-perceptible degree, but as long as the workers were aware that a change was being made, they seemed to work harder. Thus was born the Hawthorne effect: “a type of reactivity in which individuals modify an aspect of their behavior in response to their awareness of being observed.”
Any time you attempt to study someone who knows they are being studied, you have to contend with a whole host of personal biases and behavior modifications that aren’t indicative of how your unobserved users truly behave. Think about how you feel when someone stops by your desk and watches over your shoulder as you try to do something on your computer. You may accomplish your task just fine, but it’s probably not quite the way you’d behave if no one was watching you.
In a usability test environment, bias can be introduced in a variety of ways. Study participants might stumble over steps in a task simply because they’re nervous, when in reality your actual users fly through the same workflow with no problems at all. Some may hold back criticisms out of a sense of politeness, while others try to surmise the purpose of the test and alter their behavior to fit it. In each case, at the end you’re no closer to truly understanding how your users as a whole use your product.
We need to observe people using our products in order to learn about how they use them, but every time we do, the results end up skewed from true user behavior. So how do we solve this seemingly inevitable problem?
We might start with a look at two methods used by researchers to correct for these types of biases: single- and double-blind testing.
How blinded experiments reduce bias.
In a blinded experiment, the idea is to withhold information that would otherwise introduce the possibility of bias. In one of the earliest documented blinded experiments, a committee of Parisian musicians wanted to know whether a new type of violin sounded better than the revered Stradivarius. They sat together in one room and listened as an expert violinist in another room played a Stradivarius, then the competing violin, over a series of solo passages. The committee couldn’t see which type of violin was being played, so their bias about the quality of the Stradivarius was controlled for.
In a blinded experiment, the subject doesn’t have all of the information about what’s being observed and what the researcher is trying to get out of the experiment. They might even be actively misled about the true focus of the experiment.
How can you use this approach to improve your own usability testing? One option is to tell participants that you’re studying something you’re not. For example, if you want to study the effectiveness of your onboarding flow, you might actually tell testers that you’re primarily concerned with site performance as they start using the product for the first time.
Another option is to try to remove the bias that comes with feeling watched. Many usability testing tools make it possible to essentially “duck out” of the room while a subject completes a series of tasks you’ve given them, letting a recording tool capture their experience. Then you can simply review the video of their test with them after they’re done. They can speak to how they felt as they went through the tasks, but their actual actions might be less corrupted because they didn’t feel watched.
The gold standard of testing: double-blind experiments.
In a double-blind experiment, the aim is to remove potential bias from both the subjects and the experiment conductors themselves. This method is considered to achieve more accurate results than a single-blind trial because it removes any possibility for the researchers to subconsciously (or consciously) influence the test subjects in a particular direction.
The double-blind method is ubiquitous in the world of clinical drug trials, where neither the subjects nor the researchers know who is getting the drug and who is getting the placebo. It’s clear why the test subjects need to be blinded, but the researchers need it too. Even though they are likely doctors, people who have devoted their lives to helping others, they might behave differently or give different care advice to subjects if they knew who was taking the real drug and who was not.
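To make the allocation step concrete, here is a minimal Python sketch of how a trial could hide the arm from both sides. Everything in it is an assumption made for illustration: the function name `blind_assign`, the `KIT-` code format, and the idea that a third party holds the sealed key are hypothetical, not a description of any particular trial’s procedure.

```python
import random


def blind_assign(participant_ids, arms=("drug", "placebo"), seed=None):
    """Randomly assign participants to arms, hiding each arm behind an opaque kit code.

    Returns two mappings:
      schedule: participant -> kit code (what researchers and subjects see)
      key:      kit code -> arm (sealed, held by a third party until the trial ends)
    """
    rng = random.Random(seed)

    # Shuffle so that a participant's position in the list says nothing about their arm.
    ids = list(participant_ids)
    rng.shuffle(ids)

    # Draw unique random code numbers so the code itself reveals nothing about the arm.
    codes = rng.sample(range(1000, 10000), len(ids))

    schedule = {}
    key = {}
    for i, (pid, code) in enumerate(zip(ids, codes)):
        kit = f"KIT-{code}"
        # Alternate arms over the shuffled order to keep group sizes balanced.
        key[kit] = arms[i % len(arms)]
        schedule[pid] = kit
    return schedule, key


schedule, key = blind_assign(["p1", "p2", "p3", "p4"], seed=7)
print(schedule)  # participants mapped to kit codes only, no arm names
```

The point of splitting the result in two is that the people running the sessions only ever handle `schedule`; nobody looks up `key` until the data has been collected.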
In UX testing, it’s not exactly possible for the researchers (ourselves) to be blinded. We know what’s being tested because we decided what needed testing: the onboarding flow, or the checkout, or a comparison of the new marketing site to the old one, and so on. But there is a way to completely remove any possibility of tainting the research with our own personal biases. In fact, there’s a way to remove testing from the equation altogether.
With a session recording tool like FullStory, capturing your “test subject’s” experience using your product is completely ambient. The user is in their own environment, using their devices, coming to your site or app organically through their own natural channels, and exploring only to the extent that they want to. So, while you as the researcher can select which sessions to watch based on which flows or actions you’re most interested in, the subject is completely unbiased in how they’ve experienced your product.
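As a rough illustration of what “selecting sessions by flow” means, the Python sketch below filters a list of recorded sessions down to those where a given sequence of actions happened in order. It is not FullStory’s API: the `Session` class, the event names, and `sessions_with_flow` are all hypothetical stand-ins for whatever your recording tool exposes.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical record of one user's visit: an id plus ordered event names."""
    session_id: str
    events: list = field(default_factory=list)


def sessions_with_flow(sessions, flow):
    """Return the sessions whose events contain `flow` as an ordered subsequence."""

    def contains(events, steps):
        # `step in it` advances the iterator, so each step must appear after the previous one.
        it = iter(events)
        return all(step in it for step in steps)

    return [s for s in sessions if contains(s.events, flow)]


recorded = [
    Session("a", ["visit", "signup", "tour", "onboarding_done"]),
    Session("b", ["visit", "pricing"]),
    Session("c", ["onboarding_done", "signup"]),
]
matches = sessions_with_flow(recorded, ["signup", "onboarding_done"])
print([s.session_id for s in matches])
```

Note that the filtering happens after the fact. The users in `recorded` were never told what you were looking for, which is what keeps their behavior unbiased.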
Session replay + user interviews = knowledge.
While amassing a playable, searchable database of user behavior is really helpful (and very bionic if using the right tools), there are many cases where user interviews can add much-needed color to your UX understanding.
The approach that we (and many of our customers) take is to use FullStory to gather real data about how users interact with our product, and use customer interviews to ask open-ended questions about our customers’ goals and frustrations. Similarly, when we speak with user testers, we use FullStory to illuminate the experiences they describe, and to see whether other customers are experiencing the same things.
Clearly, there is a place in any product team’s UX research strategy for user interviews or other methods of directly studying users in a controlled environment. But in the quest to truly remove any bias from our data, our best option is to remove both the researcher and the experiment.