Research shows that organizations that include testing and learning in their product roadmaps—rather than just shipping features and moving on—are more likely to consistently deliver better digital experiences than those that don’t.
A thoughtful digital experimentation strategy that’s built on data can deliver business-impacting results. Often, organizations encounter a few key challenges when building that strategy, including:
Choosing where to start the experimentation process
Accounting for inconclusive test results and eliminating wasteful experimentation
Knowing how long to let tests run
Avoiding bias in the experimentation strategy
In a recent panel, digital experience experts from Google, Optimizely, and FullStory offered tips for tackling these obstacles and building a robust, data-led experimentation strategy that boosts your bottom line.
Here’s a look at some of the best practices they shared.
Start with the most critical outcome
For organizations just getting started with experimentation, the first question is where to begin. The panelists had two key recommendations for answering this question:
Start with the most important path on your website. For a B2B organization, the most critical user path might be getting someone to complete a demo request form. For an ecommerce business, it might be a customer arriving on the checkout page. By focusing your first experiments on your organization’s most important KPIs, you can drive change quickly and open the door for more experimentation ideas to spring forth.
Have a hypothesis. At its core, analytics is science. Before running an experiment, clearly outline the problem you’re trying to solve and the data you’ll need to do it. Defining a hypothesis gives you something to prove or disprove with your experiment. (The good news is that if you’re using a DXI solution like FullStory that takes an autocapture approach to data collection, the data you need to prove or disprove the hypothesis is likely already at your fingertips.)
Pre-test planning helps avoid inconclusive or wasteful testing
Today, CROs everywhere are tightening budgets—which means you want to ensure you’re getting the most from your experimentation strategy and tools. According to the panelists, implementing pre-test planning makes your tests more likely to be successful.
Forming a hypothesis, as mentioned in the previous section, is part of pre-test planning. To take that recommendation one step further, that hypothesis should be rooted in data. Let’s take a simple landing page test as an example here. Through your Digital Experience Intelligence solution, you know the baseline conversion rate, scroll depth, and time on page. You use those metrics to hypothesize that moving the CTA above the fold will increase conversions by a certain percentage. Your experiment will either prove or disprove this hypothesis—so even if the test “fails,” it still has a conclusive outcome.
Test earlier in the product roadmap cycle
You don’t need to wait until after a feature is launched to begin testing it. The earlier you can introduce testing, the more confidence you’ll have that you’re making the right product decisions.
For example, if you’re considering building a feature but don’t know if it’s the right thing to prioritize, you might use the Painted Door Test. Let’s say you’ve received a request to add a button to a landing page. To validate this request, you can add an element that looks like a button, and when a user clicks it, they get a popup saying, “We’re actively building this feature, check back soon!” If 3% of visitors click on it, it’s probably not worth actually building it. But if 30% of visitors click it, you can safely assume it’s a widely desired feature.
Experimentation air traffic control
If your organization has been running experiments for a while, it’s possible that there are so many in flight that they could collide at some point in the user journey—which could render results null. Mapping out where experiments might bump into one another is another critical part of pre-test planning.
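As a rough illustration of that mapping exercise, the sketch below flags pairs of experiments that touch the same page during overlapping dates. The Experiment record, the find_collisions helper, and the sample experiments are all hypothetical names invented for this example; a real experimentation platform may expose this information quite differently.

```python
from datetime import date
from itertools import combinations
from typing import NamedTuple


class Experiment(NamedTuple):
    """Hypothetical record describing one planned or running experiment."""
    name: str
    pages: frozenset  # URL paths the experiment modifies
    start: date
    end: date


def find_collisions(experiments):
    """Return (name_a, name_b, shared_pages) for every pair of experiments
    that share at least one page and whose date ranges overlap."""
    collisions = []
    for a, b in combinations(experiments, 2):
        shared_pages = a.pages & b.pages
        overlap_in_time = a.start <= b.end and b.start <= a.end
        if shared_pages and overlap_in_time:
            collisions.append((a.name, b.name, sorted(shared_pages)))
    return collisions


# Illustrative data only; these experiments are made up.
experiments = [
    Experiment("cta-above-fold", frozenset({"/landing"}),
               date(2024, 3, 1), date(2024, 3, 21)),
    Experiment("new-hero-image", frozenset({"/landing", "/pricing"}),
               date(2024, 3, 15), date(2024, 4, 5)),
    Experiment("checkout-copy", frozenset({"/checkout"}),
               date(2024, 3, 1), date(2024, 3, 31)),
]
collisions = find_collisions(experiments)
```

A flagged pair is not necessarily a problem, but it is a prompt to decide whether the two tests should be sequenced, made mutually exclusive, or analyzed together.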
Test lengths will vary depending on the experiment
When it comes to how long an experiment should run, opinions on best practices differ. A general rule of thumb, according to the panelists: tests of large, significant changes should take less time to reach statistically significant results. Conversely, tests of smaller, less noticeable changes often take longer to gather enough data to form a conclusion.
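One way to see why this rule of thumb holds is the standard fixed-horizon sample size approximation for comparing two conversion rates. The sketch below is an illustration only: the function name, the 5% baseline rate, and the lift values are assumptions chosen for the example, and this is a textbook approximation rather than the method any particular testing platform uses.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH variant to detect a relative lift
    with a two-sided two-proportion z-test (fixed-horizon approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Assumed numbers for illustration: a 5% baseline conversion rate.
large_change = sample_size_per_variant(0.05, 0.20)  # hoping for a 20% relative lift
small_change = sample_size_per_variant(0.05, 0.02)  # hoping for a 2% relative lift
```

Because the required sample grows with the inverse square of the difference you want to detect, the small change needs far more visitors than the large one, which translates directly into a longer test at the same traffic level.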
Understanding when an experiment’s results have reached statistical significance is a complex matter, but your testing tool can help. Where many experimentation platforms rely on traditional statistical tests such as t-tests to determine statistical significance, Optimizely has developed a model specifically geared toward digital experimentation: the Optimizely Stats Engine. Rather than getting a PhD in statistics, Optimizely users can rely on the Stats Engine to provide easily digestible insights into user testing statistics.
There are several ways bias can creep into your experimentation
Whether conscious or unconscious, humans have biases—but in order for your test data to be objective, you have to take pains to ensure bias doesn’t contaminate your experimentation strategy.
Running experiments with uneven splits is a common example of biased testing. Let’s say you want to run a test but keep the impact small. If you set up an A/B test where 90% of traffic gets experience A and only 10% of traffic gets experience B, the results will be biased—and therefore, invalid. However, you could eliminate the bias in this test by running the test with an even 50/50 split but scaling it down to only 10% of visitors.
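The scaled-down even split described above can be sketched with deterministic hashing. This is a minimal illustration under stated assumptions: the assign function, the experiment key, and the visitor ID format are hypothetical, and production testing tools handle assignment in their own ways.

```python
import hashlib


def assign(visitor_id, experiment_key, exposure=0.10):
    """Return "A" or "B" for visitors in the experiment, or None for the rest.

    Only `exposure` (10% by default) of visitors enter the test, and those
    visitors are divided evenly between the two experiences.
    """
    digest = hashlib.sha256(f"{experiment_key}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # stable value in [0, 1)
    if bucket >= exposure:
        return None  # visitor sees the default experience and is not measured
    return "A" if bucket < exposure / 2 else "B"


# The same visitor always lands in the same group for a given experiment.
variant = assign("visitor-123", "cta-above-fold")
```

Hashing the visitor ID together with the experiment key keeps assignment consistent across visits without storing any state, and it means a visitor's group in one experiment does not determine their group in another.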
Another common way bias is introduced into testing is if the experiment ideas are all coming from the same person or small group of people. For example, Optimizely partnered with Harvard Business School to survey 1,000 companies to find out at what level testing decisions were being made.
They found that the higher the title of the decision maker, the smaller the overall wins were from testing. This is likely because people who are higher up in an organization have more to lose if a test is unsuccessful, so they’re making smaller bets. To ensure that this type of bias is kept out of the testing strategy, experimentation ideas should be coming from many different people at different levels.