A/B testing involves gauging the effectiveness of two or more versions of an experience to determine which one yields the better outcome
A/B testing is one of the best tools in your toolbox for increasing conversion rates and revenue
Test types include split URL testing, multivariate testing, and multipage testing
There are many tools available to help teams roll out A/B tests on websites, apps, ads, and in emails
Effective A/B test hypotheses are always grounded in data, highlighting the importance of product analytics
What is A/B testing?
A/B testing is the process of testing two versions of a digital element to identify which performs best. It’s also referred to as split testing.
Though A/B testing is mostly used for marketing purposes today, its foundations trace back to Ronald Fisher, the 20th-century statistician and geneticist widely credited with developing the principles and practices that make this method reliable.
As it applies to marketing, A/B testing can be traced to the 1960s and 1970s, when it was used to compare different approaches to direct response campaigns like mailers.
Now, A/B testing is used to evaluate all sorts of marketing and product initiatives, from emails and landing pages to websites and apps. While the targets of A/B testing have changed, the principles behind it have not.
Below, we discuss how A/B testing works, different strategies behind the experimentation process, and why it’s critical to your success.
How does A/B testing work?
A/B testing means creating an experiment that exposes users to two versions of a website, app, or landing page and compares their responses. Statistical data is collected and analyzed to determine which version performs better.
The most basic version of A/B testing compares two versions of a webpage: the control (variation A) and a second that contains the change that is being tested (variation B).
A/B testing is sometimes referred to as a randomized controlled trial (RCT) because assigning visitors to sample groups at random helps produce more reliable results.
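Under the hood, many testing tools make this random assignment deterministic by hashing a user identifier, so a returning visitor always sees the same variation. Here is a minimal sketch in Python (the experiment name and 50/50 split are illustrative):

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "homepage-cta") -> str:
    """Deterministically bucket a user into variation A or B.

    Hashing the user ID together with the experiment name gives each
    visitor a stable, effectively random assignment with no stored state.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # a number from 0 to 99
    return "A" if bucket < 50 else "B"  # 50/50 traffic split
```

Because the hash is stable, a returning visitor keeps their assignment, which keeps the two sample groups cleanly separated for the duration of the test.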
If you are testing elements that are dependent on other interactions, you may need to engage in a different, more complex type of testing.
Why should you consider A/B testing?
When there are problems in the conversion funnel, A/B testing can be used to help pinpoint the cause. Some of the more common conversion funnel leaks include:
Confusing calls-to-action buttons
Poorly qualified leads
Complicated page layouts
Too much friction leading to form abandonment on high-value pages
Checkout bugs or frustration
A/B testing can be used to test various landing pages and other elements to determine where issues are being encountered.
Solve visitor pain points
When visitors come to your website or click on a landing page, they have a purpose, like:
Learning more about a deal or special offer
Exploring products or services
Making a purchase
Reading or watching content about a particular subject
Even “browsing” counts as a purpose. As they do this, they might encounter roadblocks that make it difficult for them to complete their goals. A visitor might be confused by copy that doesn’t reflect the messaging in the pay-per-click ad that sent them to the landing page in the first place. The CTA button might be difficult to find or unresponsive.
Every time a user encounters an issue that makes it difficult for them to complete their goals, they might become frustrated. This frustration degrades the user experience, lowering conversion rates.
There are several tools you can use to understand this visitor behavior. FullStory, for example, uses heatmaps, funnel analysis, session replay, and other tools to help teams perfect their digital experiences. By analyzing the data received from these tools, you can identify the source of user pain points and start working towards fixing them.
Regardless of which tool you use, be sure to combine both quantitative and qualitative data to both identify issues and understand why they’re happening.
How Thomas used FullStory and a robust A/B testing program to boost conversions 94%
"FullStory helps us keep users top-of-mind, but also gives our co-workers a systematic way to propose evidence-based ideas for improvement."
Get better ROI from existing traffic
If you're already getting a healthy amount of incoming traffic, A/B testing can help you boost the ROI from that traffic.
In addition to helping businesses uncover holes in their conversion funnels, the information provided from A/B testing can also help businesses maximize existing traffic ROI.
A/B testing helps to identify which changes have a positive impact on UX and improve conversions. This approach is often more cost-effective than investing in earning new traffic.
Reduce bounce rate
Typically, a bounce is counted when someone visits a page on your website and then leaves without navigating to any other pages. Bounce rate measures how often this occurs over time and is an important indicator of your website's performance.
There are other ways to define a bounce, but they all imply the same thing: disengaged users.
Essentially, a high bounce rate may indicate that people are entering your website, encountering something confusing or frustrating, and leaving right away.
This user friction is a perfect example of when to A/B test: you can identify the specific pages where visitors are bouncing and change elements you believe are causing problems. Then, use A/B testing to track the performance of different versions until you see a real improvement in performance. Finding why users are leaving your pages is also why the best experimentation teams integrate their testing tools with product analytics platforms—that way you can see precisely what's bothering your users.
These tests can help identify visitor pain points and aid in your efforts to improve UX overall.
Make low-risk modifications
There's always a risk in making major changes to your website.
You could invest thousands of dollars in redesigning the components for a campaign that isn't doing well, or overhaul your website in response to poor overall conversions. However, if those changes don't pay off, you won’t see a return on that investment of time and resources.
A/B testing can be used to make small changes rather than implementing a total redesign.
That way, if a test fails, you have risked much less time and money.
Achieve statistically significant results
It’s important to recognize that A/B testing only works well if the sampling is statistically significant. It doesn’t work if testers rely on assumptions or guesses in setting up the tests or analyzing the results.
Statistical significance is used to determine how meaningful and reliable an A/B test is. The higher the statistical significance, the more reliable a result is.
Statistical significance is "the claim that a set of observed data are not the result of chance but can instead be attributed to a specific cause."
If a test is not statistically significant, there could be an anomaly, such as a sampling error. And if the results are not statistically significant, they shouldn’t be considered meaningful.
The ideal A/B test results should be 95% statistically significant (though some testing managers use 90% so the necessary sample size is smaller and results are found faster).
There can be challenges to reaching a sufficient level of statistical significance:
Not enough time to run tests
Pages with exceptionally low traffic
Changes are too insignificant to generate results
It may be possible to improve statistical significance by running tests on a more active portion of your website or making more significant changes. There may also be cases where lack of traffic makes it nearly impossible to get results that are significant enough.
Understanding sample sizes and statistical significance also helps you plan how long your tests will take.
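As a rough planning aid, you can estimate the required sample size up front with the standard normal-approximation formula for comparing two proportions. A sketch, with z-scores hard-coded for a 95% significance level and 80% power:

```python
import math

def sample_size_per_variant(baseline_rate: float, min_detectable_lift: float) -> int:
    """Approximate visitors needed per variation for a two-proportion test.

    Uses the standard normal-approximation formula with z = 1.96 for a
    two-sided 95% significance level and z = 0.84 for 80% power.
    """
    z_alpha, z_power = 1.96, 0.84
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_detectable_lift)  # rate after the lift
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)
```

A 5% baseline conversion rate with a 10% relative lift to detect works out to tens of thousands of visitors per variation, which is why low-traffic pages struggle to reach significance; larger lifts need far fewer visitors.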
Redesign websites to increase future business gains
If you do engage in a full website redesign, A/B testing can still be helpful.
This approach is best taken when you believe that your current website is optimized for performance and that the only way to achieve significantly better results is through a redesign.
Like any other A/B testing, you will make two versions of the site available. Then, you will measure the results after you have received a statistically significant number of visitors.
A/B testing should not end once you’ve gone live. Instead, this is the time to begin refining elements within your site and testing those.
Important ideas to know while testing
There are specific strategies to consider when testing: multipage testing, split URL testing, dynamic allocation, and multivariate testing.
Split URL testing
Split URL tests are used for making significant changes to a webpage in situations when you don't want to make changes to your existing URL.
In a split URL test, your testing tool will send some of your visitors to one URL (variation A) and others to a different URL (variation B). At its core, it is a temporary redirect.
When should you consider split URL testing?
In general, use this if you are making changes that don’t impact the user interface. For example, if you are optimizing page load time or making other behind-the-scenes modifications.
Larger changes to a page, especially to the top of a page, can also sometimes “flicker” when they load, creating a jarring UX. Split URLs are an easy way to avoid this.
Split URL testing is also a preferred way to test workflow changes. If your web pages display dynamic content, you can test changes with split URL testing.
Multivariate testing
Multivariate testing is a more complex form of testing, and an entirely different test type from a standard A/B test.
It refers to tests that involve changes to multiple variations of page elements that are implemented and tested at the same time. This approach allows testers to collect data on which combination of changes performs best.
Multivariate testing eliminates the need to run multiple A/B tests on the same web page when the goals of each change are similar to one another. Multivariate testing can also save time and resources by providing useful conclusions in a shorter period.
For example, instead of running a simple A/B test on a page, let's say you want to try out a whole new multi-page experience. You want step 1 to be either variation A or B, and you want step 2 to be either C or D.
When you run a multivariate test, you'll be running many combinations of these variations:
A then C
A then D
B then C
B then D
A multivariate test is, essentially, an easier way to run multiple A/B tests at once.
Because there are more variations in multivariate tests than in A/B tests, multivariate tests require more traffic to achieve statistical significance and take longer to yield reliable results.
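The combinations above are simply the Cartesian product of each element's variations, which is also why the required traffic grows so fast: every variation you add multiplies the number of cells. A quick illustration:

```python
from itertools import product

step1_variations = ["A", "B"]  # e.g. two versions of step 1
step2_variations = ["C", "D"]  # e.g. two versions of step 2

# Every combination of step 1 and step 2 becomes its own test cell
combinations = list(product(step1_variations, step2_variations))
print(combinations)       # [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]
print(len(combinations))  # 4 cells, each needing its own share of traffic
```

Adding a third element with two variations would double the cell count again, halving the traffic available to each combination.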
Multipage testing
A multipage test involves changes to specific elements. Instead of testing the change you make on one page, you apply that change to multiple pages, such as every page within a particular workflow. There are two ways to go about it.
The first is to create new versions of every page in a sales funnel, then test the new funnel against the control, using the funnel itself to gauge the results. This approach is known as "funnel multipage testing."
The second is to add or remove repeating items such as customer testimonials or trust indicators across pages, then test how those changes affect conversions. This approach is known as "conventional" or "classic multipage testing."
Dynamic allocation
Dynamic allocation is a method of narrowing down variations during a test by automatically phasing out the low performers. This method streamlines the testing process and saves time. It's also known as a multi-armed bandit test.
Let’s say you’re an online retailer and are holding a flash sale. You know you want as many people as possible to view your sale items when they arrive on your site. To do that, you want to show them a CTA color that gets as many people to click on it as possible — blue, pink, or white.
Using a dynamic allocation test, your testing tool will automatically detect which variation drives the most clicks and automatically show that variation to more users. This way you drive as many clicks as possible as fast as possible.
Because traffic is not split equally, dynamic allocation doesn't yield statistically significant results and doesn't produce learnings you can apply in the future.
Dynamic allocation is for quick conversion lifts, not learning.
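A common way to implement dynamic allocation is an epsilon-greedy bandit: most traffic goes to the current best performer, with a small slice reserved for exploration. A minimal sketch (the CTA colors and counts are made up for illustration):

```python
import random

def choose_cta(clicks: dict, views: dict, epsilon: float = 0.1) -> str:
    """Epsilon-greedy choice: exploit the best CTA, explore occasionally."""
    if random.random() < epsilon:  # explore a random variation
        return random.choice(list(views))
    # Exploit: show the variation with the highest observed click rate
    return max(views, key=lambda v: clicks[v] / views[v] if views[v] else 0.0)

# Hypothetical observed counts for three CTA colors
clicks = {"blue": 30, "pink": 12, "white": 8}
views = {"blue": 200, "pink": 200, "white": 200}
```

With epsilon set to 0 the function always exploits the leader; raising it trades some short-term clicks for more data on the weaker variations.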
How do you choose which type of test to run?
There are several factors to consider when deciding which tests to run for conversion rate optimization (CRO) testing. You should think of:
The number of changes you’ll be making
How many pages are involved
The amount of traffic required to get a statistically significant result
Whether you are making a high-level or low-level UI design decision
Finally, consider the extent of the problem you are trying to solve. For example, a micro-conversion on a single landing page that can be impacted by changing a button color would be a perfect use case for A/B testing.
However, changing multiple pages that a user encounters across their customer journey would be a better fit for multipage testing.
How do you perform an A/B test?
An A/B test is a method of testing changes to determine which changes have the desired impact and which do not.
While organizations once turned to A/B testing only occasionally, teams across an estimated 51% of top sites now A/B test to improve customer experience and boost conversions.
Step 1: Research
Before any tests can be conducted or changes made, it's important to set a performance baseline. Collect both quantitative and qualitative data to learn how the website in question is performing in its current state.
Quantitative data includes behavioral metrics such as the average number of items added to the cart.
Much of this information can be collected through product analytics or Digital Experience Intelligence (DXI) tools like FullStory.
Qualitative data includes information collected on the user experience through polls and surveys. Especially when used in conjunction with more quantitative data, it is valuable in gaining a better understanding of site performance.
Step 2: Observe and formulate a hypothesis
At this stage, you analyze the data you have and write down the observations that you make. This approach is the best way to develop a hypothesis that will eventually lead to more conversions. In essence, A/B testing is hypothesis testing.
Step 3: Create variations
A variation is simply a new version of the current page that contains any changes you want to subject to testing. This alteration could be a change to copy, headline, CTA button, etc.
Step 4: Run the test
You’ll select a testing method here according to what you are trying to accomplish, as well as practical factors such as the amount of traffic you can expect. The length of the test will also depend on the level of statistical accuracy you are trying to achieve.
Step 5: Analyze results and deploy the changes
At this point, you can go over the results of the tests and draw some conclusions. You may determine that the test was indeed conclusive and that one version outperformed the other. In that case, you simply deploy the desired change.
But that doesn't always happen. You may need to add and test an additional change to gain additional insights. Additionally, you might decide to move on to testing changes in another part of the workflow. With A/B testing, you can work through all of the pages within a customer journey to improve UX and boost conversions.
There are some best practices to use while A/B testing, but understand that all sites, apps, and experiences are different. The true “best practices” for you can only be truly understood through testing.
Server-side vs. client-side A/B testing
In client-side testing, JavaScript running in the visitor's browser modifies the page before it's displayed, swapping in whichever variation the visitor is supposed to see according to the targeting you have set up in the A/B test. This form of testing is used for visible changes such as fonts, formats, color schemes, and copy.
Server-side testing is a bit more robust. It allows for the testing of additional elements. For example, you would use this form of testing to determine whether speeding up page load time increases engagement. You would also use server-side testing to measure the response to workflow changes.
How do you interpret the results of an A/B test?
The results of an A/B test are measured based on the rate of conversions that are achieved. The definition of conversion can vary. It might include a click, video view, purchase, or download.
This step is also where that 95% statistical significance comes into play. A 95% confidence level means that if you repeated the test many times, the measured conversion rate would fall within the stated margin of error in roughly 95% of those runs. You also have to consider that margin of error when reading the result.
So if that margin is ±3%, that can be interpreted as follows: If you achieve a conversion rate of 15% on your test, you can say that conversions are between 12% and 18% with 95% confidence.
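That margin of error comes from the standard confidence-interval formula for a proportion. A sketch of the arithmetic (a simple Wald interval with the 95% z-score):

```python
import math

def conversion_interval(conversions: int, visitors: int, z: float = 1.96):
    """95% confidence interval for an observed conversion rate."""
    rate = conversions / visitors
    # Standard error of a proportion, scaled by the z-score
    margin = z * math.sqrt(rate * (1 - rate) / visitors)
    return rate - margin, rate + margin
```

With 150 conversions out of 1,000 visitors, the observed 15% rate carries a margin of roughly ±2 percentage points; smaller samples widen the interval.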
There are two ways to interpret A/B testing results.
The first is the frequentist approach. This approach is based on the assumption that there are no differences between A and B.
Once testing ends, you will have a p-value, or probability value. This value is the probability of seeing a difference at least as large as the one you observed if there were actually no difference between the variations. So a low p-value means the observed difference is unlikely to be due to chance.
The frequentist approach is fast and popular, and there are many resources available for using this method. The downside is that it's impossible to get any meaningful results until the tests are fully completed. Also, this approach doesn't tell you how much a variation won by — just that it did.
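For two conversion rates, the frequentist p-value is typically computed with a pooled two-proportion z-test. A sketch using only the standard library:

```python
import math

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
```

A p-value below 0.05 corresponds to the 95% significance threshold discussed earlier.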
The Bayesian approach incorporates existing data into the experiment. These are known as priors, with the prior being uninformative ("none") in the first round of tests. In addition, there is the evidence: the data from the current experiment.
Finally, there is the posterior: the result produced by combining the prior with the evidence in the Bayesian analysis.
The key benefit of the Bayesian approach is that you can look at the data during the test cycle and call the results early if one variation is a clear winner. Additionally, you can quantify how likely each variation is to be the best, not just which one won.
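For conversion data, the Bayesian analysis is often done with Beta distributions: the prior and the evidence combine into a Beta posterior for each variation, and sampling from both posteriors estimates the probability that B beats A. A sketch assuming uninformative Beta(1, 1) priors:

```python
import random

def probability_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    wins = 0
    for _ in range(draws):
        # Posterior for each variation: Beta(1 + conversions, 1 + failures)
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += b > a
    return wins / draws
```

Because this quantity is meaningful at any point in the test, you can peek mid-cycle; many teams call the test once it crosses a threshold such as 95%.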
How do companies use A/B testing?
A/B testing can be used by brands for many different purposes. As long as there is some sort of measurable user behavior, it's possible to test that.
The A/B test method is commonly used to test changes to website design, landing pages, content titles, marketing campaign emails, paid ads, and online offers. Generally, this testing is done without the test subjects knowing they are visiting Test Version A of a web page as opposed to Version B.
Stage 1: Measure
In this measurement stage, the idea is to identify ways to increase revenue by increasing conversions. Stage one includes analyzing website data and visitor behavior metrics.
Once this information has been gathered, you can use it to plan changes and create a list of website pages or other elements to be changed. After this is done, you may create a hypothesis for each element to be changed.
Stage 2: Prioritize
Set priorities for each hypothesis depending on its level of confidence, importance, and ease of implementation.
There are frameworks available to help with the process of setting these priorities, for example, the ICE, PIE, or LIFT models.
ICE is importance/confidence/ease.
Importance — How important is the page in question
Confidence — The level of confidence the test will succeed
Ease — How easy is it to develop the test
Each item is scored, and an average is taken to rate its priority.
PIE is potential/impact/ease.
Potential — The business potential of the page in question
Impact — The impact of the winning test
Ease — How easily the test can be executed
The variables here are slightly different from ICE but are scored in the same way.
PIE and ICE are both easy to use, but the downside is that they are subjective. People will apply their own biases and assumptions when scoring these variables.
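The scoring itself is simple arithmetic: rate each factor (say, on a 1-10 scale), average the ratings per hypothesis, and sort the backlog. A sketch using ICE with made-up hypotheses and scores:

```python
def priority_score(ratings: dict) -> float:
    """Average the 1-10 ratings for one hypothesis (works for ICE or PIE)."""
    return sum(ratings.values()) / len(ratings)

backlog = {  # hypothetical test ideas and scores
    "Shorten the checkout form": {"importance": 9, "confidence": 7, "ease": 6},
    "Rewrite the hero headline": {"importance": 6, "confidence": 5, "ease": 9},
}
# Highest average score gets tested first
ranked = sorted(backlog, key=lambda idea: priority_score(backlog[idea]),
                reverse=True)
```

Because the inputs are judgment calls, two scorers can rank the same backlog differently, which is exactly the subjectivity these frameworks are criticized for.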
LIFT Model is the third framework for analyzing customer experiences and developing hypotheses. This framework is based on six factors:
Value proposition — The perceived value a visitor gains from converting
Clarity — How clearly the value proposition and CTA are stated
Relevance — Relevance of the page to the visitor
Distraction — Elements that distract visitors from the CTA
Urgency — Items on the page that encourage visitors to act quickly
Anxiety — Anything creating hesitance or lack of credibility
Stage 3: Test
After prioritizing, determine which ideas will be tested and implemented first by reviewing the backlog of ideas. You should decide according to business urgency, resources, and value.
Once an idea is selected, the next step is to create a variation. Then, go through the steps of testing it.
Stage 4: Repeat
It's risky to test too many changes at the same time. Instead, test more frequently to improve accuracy and scale your efforts.
The top A/B testing tools to use
There are several tools available to help businesses set up, execute, and track A/B tests. They all vary both in price and capability.
The best web and app A/B testing tools
Optimizely is a platform for conversion optimization through A/B testing. Teams can use the tool to set up tests of website changes to be experienced by actual users. These users are routed to different variations; then, data is collected on their behavior. Optimizely can also be used for multipage and other forms of testing.
AB Tasty offers A/B and multivariate testing. Testers can set up client-side, server-side, or full-stack testing. Additionally, there are tools like Bayesian Statistics Engine to track results.
Both Optimizely and AB Tasty seamlessly integrate with FullStory so you can see how users who see different experiences behave.
VWO is the third big player in A/B testing and experimentation software. Like Optimizely and AB Tasty, they offer web, mobile app, server-side, and client-side experimentation, as well as personalized experiences.
Email A/B testing tools
There are several specialized tools for testing changes made to marketing campaign emails. Here are the most widely used:
Moosend is a tool for creating and managing email campaigns. One of the features it offers is the ability to create an A/B split test campaign. This ability allows marketers to test different variations of marketing emails, measure user response, and select the version that works best.
Aweber provides split testing of up to three emails. Users can test elements such as subject line, preview text, and message content. Additionally, it allows for other test conditions such as send times. Testing audiences can be segmented if desired, and completely different emails can be tested against one another.
MailChimp users can create A/B testing campaigns using variables such as subject line, sender name, content, and send time. There can be multiple variations of each variable.
Then, the software allows users to determine how the recipients will be split among each variation. Finally, testers can select the conversion action and amount of time that indicates which variation wins the test. For example, the open rate over a period of eight hours.
Constant Contact offers subject line A/B testing. This feature allows users to validate which version of an email subject line is most effective. It is an automated process where the tool automatically sends emails with the winning subject line to recipients once the winner is determined.
A/B testing and CRO services and agencies
Some companies have the infrastructure and personnel in place to run their own experimentation program, but other companies might not. Fortunately, there are services and agencies available to help drive your A/B testing and CRO efforts.
Based in the UK, Conversion is one of the world's largest CRO agencies and works with brands like Microsoft, Facebook, and Google.
Also based in the UK, Lean Convert is one of the leading experimentation and CRO agencies.
Prismfly is an agency that specializes in ecommerce CRO, UX/UI design, and Shopify Plus development.