What is it about A/B tests that makes them worth the importance they deserve?

The apparent simplicity of the A/B test concept, the ease and speed of implementation, and the striking results of the best-known success stories can lead us to run test after test without stopping to think about the decisions and conditions that will determine their course.

And this can lead to disappointing results and a frustration that discourages us from continuing to use the tool.

However, if we know in detail the scope and capabilities of A/B tests, the decisions we must make during the preparatory phase prior to execution, and the relevant considerations for analyzing their results, we will increase their chances of success.

Without forgetting that negative results can also be used to draw lessons for the future.

In this article, we will review the most important concepts of A/B tests in a context that makes their purpose and scope easier to grasp, and we will also look at what we must take into account in order to carry them out successfully.

Finally, we will discuss a case study on how to implement them with Google Analytics in a content manager as popular as WordPress.

Defining the problem: How to increase conversions?

Imagine that you have a Landing Page with an average of 10,000 monthly visits and a conversion rate of 1%, that is to say, 100 users complete an action that marks a goal in your project.

Without going into assessing whether these are good or bad values (which would be the subject of debate for another forum), how could you double the number of conversions?

In other words, we focus on an intermediate step in the sales funnel:

Conversions as part of the sales funnel

The first, perhaps obvious, solution we usually think of is to increase the number of visits: ‘the more visits, the more conversions’ seems a reasonable statement.

However, this solution is based on the following two assumptions:

  1. That the conversion rate is unchanged. Since we do not change the Landing Page, we tend to think that the conversion rate will not change significantly either.
  2. That the proportion of qualified visitors among the new visits will be similar to the proportion we had before.

 

The harsh reality is that we have no way of ensuring that these factors remain more or less constant.

In other words, even if we manage to double the number of visits, the proportion of qualified visits will probably fall and the conversion rate will suffer, so in the end we will not achieve the desired effect of doubling the total number of conversions.

Even assuming that we manage to keep the conversion rate and the proportion of qualified visitors constant, in order to double the number of monthly visits we will have to develop and launch promotion and dissemination campaigns for our website.

However, implementing these campaigns has costs associated with how we do them:

  • Organic positioning campaigns (SEO), which require continuous monitoring and dedication over time and whose results may take several months to become apparent.
  • SEM (AdWords) campaigns, which require an economic investment for the duration of the campaign; their results can be observed from the very start, but they persist only as long as the investment is maintained.

Doubling visits does not guarantee double the conversions and, in addition, may require a significant investment of either time or money.

Let us now look at the other leg of the equation and ask ourselves the following question: what if, instead of increasing the number of visits, we try to increase the conversion rate?

Identifying what can increase the conversion rate

In the scenario we imagined at the beginning, with 10,000 visits and a 1% conversion rate, we looked at the possibility of doubling the number of visits to double the total conversions, but we saw that this had some drawbacks.

We have another possibility to double the number of conversions if, instead of the number of visits, we focus on doubling the conversion rate.
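The arithmetic behind both levers is the same, and a tiny sketch makes it explicit (the function name is my own, purely for illustration):

```python
def monthly_conversions(visits, rate_percent):
    """Conversions are simply visits multiplied by the conversion rate,
    so doubling either factor doubles the result."""
    return visits * rate_percent / 100

# Baseline from the example: 10,000 visits at 1% -> 100 conversions
baseline = monthly_conversions(10_000, 1)

# Doubling visits or doubling the rate both yield 200 conversions,
# but only the second avoids paying for extra traffic.
via_traffic = monthly_conversions(20_000, 1)
via_rate = monthly_conversions(10_000, 2)
```

Both paths reach the same number of conversions; the rest of the article is about why the second one is usually cheaper.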

A quick search on the Internet will turn up hundreds of tips and tricks for designing Landing Pages that increase the conversion rate.

However, can we be sure that these techniques are really effective for our website and how much benefit they can bring us?

I am not questioning the effectiveness of these techniques, far from it, but we must not forget that there are no ‘universal solutions’ and that, within the general guidelines these recommendations set out, there are multiple factors that affect how effective they are.

We can take as a reference the market niche in which we are located or the profile of our target audience, to name a few.

As a first approximation to address our problem, we could think about modifying our Landing Page by introducing some of these techniques or recommendations and wait to see what happens.

However, this option has several drawbacks:

  • Most of the time, these recommendations are not specific in their description.

Rather, they refer to general elements whose effect on the conversion rate should be evaluated and assessed for the specific characteristics of our website.

For example, choosing one color or another for the buttons, the typeface of the call to action, or the images used may all influence conversions, but which color or font would work best for “my” Landing Page?

  • If we have made a mistake in the application or selection of a technique, we may lose a significant number of conversions.

And, consequently, have negative effects on our profits.

Bearing in mind that observing the effect of a change may require a high number of visits, the cost of lost profits may be too high for our objectives.

  • Results may be influenced by external or seasonal factors.

A very clear example would be if our products or services are related to leisure activities that, most likely, have a greater commercial pull at dates close to holiday periods.

If we change our Landing Page on those dates, we have no way of knowing whether the increase in conversions is due to those changes or to the proximity of the holidays.

At this crossroads, what could we do to be able, on the one hand, to “experiment” with various alternatives to improve the conversion rate but, at the same time, reduce the damage caused by inappropriate changes in the Landing Page or the effect of external influences?

Fortunately, we have a tool, easy to implement and execute, that can help us answer this question: A/B tests.

A/B Tests help to identify which changes in a web page can improve the value of a target metric

So far we have set ourselves the goal of increasing the conversion rate as an example to better understand the concept of A/B testing, as it is a generally well understood and unambiguous metric.

However, we can also use A/B tests to analyze any other metrics, such as bounce rate or page dwell time.

In short, any metric that you want to optimize and that you can analyze through very specific modifications within a page.

What are A/B Tests?

A priori, we cannot know for sure what changes to a Landing Page would lead to an increase in conversions, so we need to experiment to find the best combination, but reducing the risk of losing profits or obtaining inconclusive results due to external influences.

In this sense, A/B tests allow us to compare the behavior of two versions of the same page, usually a Landing Page, that differ in a single element and whose objective is to evaluate the impact that this element has on the visiting users, measured by the metric that we define, during a certain period of time.

During the test run, visitors randomly see one version of the page or the other, so that at the end of the experiment the total number of visitors will be divided roughly equally between the two versions.

It is as if, every time a user requests the page, a coin were flipped: heads shows one version, tails shows the other.
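That coin flip is easy to sketch in code. Here is a minimal illustration; the hashing trick is my own addition, used so that the same visitor always sees the same version:

```python
import hashlib

def assign_version(visitor_id: str) -> str:
    """Assign a visitor to the control ("A") or the variation ("B").
    Hashing the visitor id makes the choice stable per visitor while
    splitting visitors as a whole roughly 50/50, like a coin flip."""
    digest = hashlib.md5(visitor_id.encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"
```

A real implementation would typically persist the assignment in a cookie; the hash just keeps the sketch stateless.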

With this method, we solve two of the problems we identified earlier, when we considered modifying the Landing Page without any further consideration:

  1. Since the two versions of the page coexist, we eliminate the influence of any external or seasonal factors on the analysis of the results, since both are affected equally.
  2. If the change adversely affects the conversion rate, the impact on the loss of our profits is reduced by half.

The classic example in the A/B tests refers to the color of the buttons in the calls to action, using one color on the original page (“A” version or control page) and another color in the variation (“B” version) of the page:

One-page A and B versions with different colored buttons

However, we should not limit these tests to changes as simple as variations in colors, fonts, or even headlines and images; we can also test structural changes in the layout of the page, such as the length of an article or the position of an advertising banner:

A and B versions of the page with a banner in different positions

Don’t be afraid to try any changes you can think of, but always be careful to avoid multiple or radical changes that make the two pages too different.

If we do, we will not be able to identify which change produced the greatest benefit; worse, the positive effects of one change may be neutralised by the negative effects of another, leading us to discard a change that would actually have been beneficial.

Whatever you change, never forget the maxim of A/B testing: two similar versions of a page, with a single element differing between them.

To give you an idea of how important it is to be creative with the changes we make, take a look at the following two ads: which one do you think had the higher conversion rate?

Two one-page versions with a change in the main picture

The information on both pages is identical but, while on one page the baby looks straight at us, on the other he looks toward the headline of the ad… and the latter is the one with the higher conversion rate!

Therefore, do not limit your imagination as to what kinds of changes you want to experiment with.

Since we already have a clear understanding of what A/B tests are and what we can do through them, let’s see what the process is like to carry out an A/B test on our website:

  • Identify which web pages we want to analyze and the metrics we will use to evaluate the impact of the changes we will make on each page.

There are no restrictions on the number of A/B tests we want to run simultaneously, as long as they are on different web pages.

Likewise, each test can be evaluated with the metric we consider appropriate, independently of the rest.

  • Identify which element of each page we want to evaluate and what change we will make.

There can only be one change per page, affecting only that single element, to ensure the correct attribution of the impact on the variation of the metric to be analyzed.

  • Create the corresponding variation of each page, identical to the original in every respect except for the element identified in the previous point.
  • Determine the duration of the experiment, either in number of visits or for a specific period of time.

In both cases, it must be large enough to collect the data needed for further analysis of the metrics.

  • Run the test, dividing the visits randomly, 50% to each version of the page, and recording separately the behavior of users according to the metrics we have associated with that page.

  • Once we have reached the number of visits or time period established as the duration of the experiment, stop it and analyze the metrics of both versions of each page to determine the impact and extent of the change made to the original page.

  • An experiment should not be carried out indefinitely.

Google supports A/B testing, but sets out some recommendations and conditions of use.
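The decisions required by the checklist above can be gathered into a small structure; the field names here are purely illustrative, not taken from any real tool:

```python
from dataclasses import dataclass

@dataclass
class ABTestPlan:
    """One A/B test: a page, a single changed element, one metric, a duration."""
    page_url: str          # the control page under test
    variant_url: str       # the variation, differing in exactly one element
    changed_element: str   # e.g. "call-to-action button color"
    metric: str            # e.g. "conversion_rate", "bounce_rate"
    min_visits: int        # duration expressed as a visit count

plan = ABTestPlan(
    page_url="/landing",
    variant_url="/landing-b",
    changed_element="call-to-action button color",
    metric="conversion_rate",
    min_visits=10_000,
)
```

Writing the plan down like this, before touching the page, forces the single-element and single-metric discipline the process depends on.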

Once you have finished the A/B test, if you see better values for the chosen metric in the page variation, congratulations, you have managed to improve your website. But don’t rest on your laurels: the process doesn’t end here, it opens the door for more A/B tests.

This time on the new version of the page, so you can continue improving other aspects of it, for the same or other metrics.

If, on the other hand, the metrics in variation have been worse, don’t be embarrassed, because you have learned an important lesson: what doesn’t attract or please your audience.

And that may well inspire you to make other changes that your audience does like.

Although the results of an A/B test do not reflect improvements in the target metric, they will help us to get to know our audience better.

In this process, note that all the points depend exclusively on our decisions, except the point dedicated to distributing the visits between the control page and its variation.

This task must be implemented on the web server itself and therefore requires a technical solution, to which we will devote the last section of this article.

How long should an A/B test last?

If you look back at the step of the process about the duration of the experiment, it said that it should be ‘large enough to collect the necessary data’.

But how much is “big enough”?

Depending on the context, when I hear or read the expression “big enough” I usually translate it to “nobody knows” or “it depends”.

To understand this better, let’s flip a coin several times and write down how many times it comes up heads and how many times tails; don’t worry, I’ll do it for you.

If we flip the coin ten times, we will most likely get a split of 6 heads and 4 tails, or 4 heads and 6 tails.
We would also get 7 heads and 3 tails, or the other way around, fairly often, but rarely will we get exactly 5 heads and 5 tails.

Now suppose we flip the coin a hundred times.

Most of the time, we wouldn’t get a 50 heads, 50 tails split either, but almost always values in the 45–55 range.

The same thing would happen if we threw it a thousand times.

It is very unlikely to get exactly 500 heads and 500 tails but, most of the time, the counts of heads and tails would each fall between 475 and 525.

That is to say, in a perfectly random 50/50 process such as tossing a coin, if we repeat it many times we will not obtain an exact 50% split of heads and tails, but a small range around that percentage, and we will not conclude from this that the coin is rigged.
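You can verify these fluctuation bands yourself with a short simulation (the seed and trial counts are arbitrary):

```python
import random

def count_heads(flips: int, rng: random.Random) -> int:
    """Flip a fair coin `flips` times and count the heads."""
    return sum(rng.random() < 0.5 for _ in range(flips))

rng = random.Random(42)  # fixed seed so the run is reproducible
# Repeat the 1,000-flip experiment 200 times: exactly 500 heads is rare,
# but the counts cluster in a narrow band around 500.
results = [count_heads(1_000, rng) for _ in range(200)]
in_band = sum(475 <= heads <= 525 for heads in results)
```

Run it and you should see most repetitions, close to nine out of ten, land inside the 475–525 band described above.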

Maybe you’re starting to see where I’m going: the distribution of visits between the control page and the variation page in an A/B test is also done randomly at 50%, so that each page receives approximately the same number of visits over time.

But, as we have seen in the example of coins, randomness can be capricious and introduce small fluctuations, completely at random, into this distribution.

An A/B test performed on a small number of visits will never give reliable results in the evaluation of the target metric.

Therefore, when it is said that the duration of the A/B test should be “long enough”, it means we must ensure there are enough visits to counteract those distributions that, purely by chance, assign more qualified visits to one version of the page than to the other.

How to analyze the results in A/B testing?

However, it is not enough to have a high number of visits to eliminate the effect of randomness and to analyze the test results, but we also have to take into account how much the metric we have chosen for the test changes.

Let’s look at an example.

In our initial example, we had considered 10,000 visits with a conversion rate of 1%.

This means 100 conversions per month.

But let’s take another case: a page with 1,000 visits and a conversion rate of 10%; that would also mean 100 conversions per month.

Suppose we run an A/B test on both pages for one month, getting the same result: 95 conversions on the control page and 105 conversions on the variation page.

What conclusions could we draw from these results? Is the variation page better in both cases?

First, it should be noted that the conversion rate is in itself an average value, which means that it will also show small fluctuations over time.

With this consideration in mind, the conversions obtained in the first example translate into conversion rates of 0.95% and 1.05% respectively, while in the second example they translate into rates of 9.5% and 10.5%.

It doesn’t seem so clear anymore that the variation page is better in both cases, does it?

Forced to choose, I would keep the control page in the first case, since the variation does not introduce any significant change in the conversion rate.

In the second case, I would keep the variation page, although I would also analyze the evolution of the conversion rate during the previous months to check to what extent the observed improvement is not part of the fluctuation of this metric.
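One common way to formalize this intuition, which the article does not prescribe, is a two-proportion z-test. Using the article's own figures, and treating each version as having received the page's full monthly traffic:

```python
from math import sqrt

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z-statistic for the difference between two conversion rates;
    |z| above roughly 1.96 corresponds to the usual 95% confidence level."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# First page: 95 vs 105 conversions on 10,000 visits each (0.95% vs 1.05%)
z_low_rate = two_proportion_z(95, 10_000, 105, 10_000)
# Second page: 95 vs 105 conversions on 1,000 visits each (9.5% vs 10.5%)
z_high_rate = two_proportion_z(95, 1_000, 105, 1_000)
```

In both cases the z-statistic stays well below 1.96, so neither difference is statistically conclusive on its own, which backs up the caution above about checking the metric's historical fluctuation.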

As you can see, there is no “magic number” for the duration of an A/B test or an “infallible recipe” to analyze the results, but we must choose them carefully according to the number of visits, the change in the metric chosen during the test and the fluctuation of this metric in previous periods.

We must be aware that “spectacular improvements” in a target metric are only achieved step by step, test by test.

Of course, if the metric analyzed shows a huge margin of improvement (say, from 10% to 50%), there is no doubt which page to choose but, as you’ll see when you run your own tests, such results are rare.

Fortunately, for practical purposes we will not have to worry too much about the length of A/B tests or how to analyze their results: as we will see, Google Analytics can take care of that for us. What matters is that we understand why we should not cut them short.

Does SEO get along with A/B Tests?

Perhaps you have already wondered: if we have two versions of the same page, will it not negatively affect its organic positioning, since the visits are divided between them?

What’s more, couldn’t Google even penalize us for having two different pages with the same address, mistaking it for a cloaking attempt?

First of all: no, Google will neither penalize you nor affect your organic positioning.

In fact, Google published on its official blog that it fully accepts A/B tests, as well as other types of tests, since their aim is to improve the user experience and make the content more attractive and useful to the user.

A/B Tests do not harm the SEO of pages; indeed, Google encourages them as a way to improve the user experience.

In the same article, Google recommends good practices to ensure we do not violate any of the Webmaster Guidelines, which could get us penalized; I summarize them below:

  • Do not use the test as a pretext for cloaking. We are talking about very similar pages, with very small and localized differences, whose access does not depend on who (or what, if it is the Google crawler) navigates the page.
  • If the control page and its variant use different URLs, add the rel="canonical" attribute to the variant, pointing to the control page, to inform Google that the two pages are related.
  • Do not use the noindex meta tag on the variant page, as it may confuse Google and cause it to canonize the variant to the detriment of the original page, discarding the address of the original page in the search indexes.
  • If the control page and its variant share the same URL, with the web server performing a redirect to show the variant, use 302 redirects to tell Google that the redirect is temporary and that the relevant URL is the original one, not the redirect target; this prevents Google from indexing the variant to the detriment of the original.
  • Establish a time limit for the test run. By definition, these tests are performed to differentiate which version of a page provides the best results and to publish it as final. They cannot therefore be implemented indefinitely. If Google detects a test that has been running “too long”, it may interpret it as an attempt to manipulate search results and penalize it. Google doesn’t say how much “too long” it is, because it depends on many factors, but the number of visits can be a reference: the more web traffic, the less need there is for the test to last longer.

From a practical point of view, we only have to worry about the last point, to avoid that the A/B test is executed indefinitely, since the rest depends on its execution being correctly implemented in the web server or the tool we use.
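As a sketch of what a correct server-side implementation involves, here is a minimal request handler following Google's recommendations; the URLs and names are hypothetical:

```python
import random

CONTROL_URL = "/landing"        # hypothetical control page
VARIANT_URL = "/landing-b"      # hypothetical variant page

def handle_request(path: str, rng=random) -> tuple:
    """Serve the control page, but send roughly half of its visitors to the
    variant with a 302 (temporary) redirect, never a 301 (permanent), so
    search engines keep treating the control URL as the relevant one."""
    if path == CONTROL_URL and rng.random() < 0.5:
        return 302, {"Location": VARIANT_URL}
    # The variant's HTML should also declare its relationship to the control:
    #   <link rel="canonical" href="/landing">
    return 200, {}
```

In practice a plugin or testing tool emits the redirect and the canonical tag for you; the sketch just shows which status code and markup they are expected to produce.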

Advantages and limitations of A/B Tests

Throughout the explanation of the A/B tests and their execution process we have seen some of their advantages and also their limitations.

Let us now review them all together to get a better perspective of their scope and possibilities:

Advantages

  • The two versions of the page receive a similar number of visits during the same period of time, so any outside influence will affect both equally and will not contaminate the interpretation of the results.
  • The cost of lost profit is minimized in the event that the change damages the value of the chosen metric, as it only affects 50% of the qualified visits (those that saw the variant page).
  • By analyzing only one very specific change in the original page, the evolution of the metric analyzed will be exclusively due to that change, facilitating the choice of one version or another.
  • The duration of the experiment can be variable, adjusting to the volume of visits needed to obtain relevant results in the selected metric.
  • Cumulative tests can and should be done, one after the other. That is to say, first on an element, selecting the winning page and then continuing successively with other specific changes on the page.
  • Experience gained with an A/B test, whether positive or negative, can be used to design future pages with similar objectives.

Limitations

  • You can only see the effect of a very localized change within the whole page, so it does not provide any reference for the first design of the page or new campaigns.
  • At first, it is not known which change may be more favourable (e.g. which colour, which text or which image) and requires other types of analysis to be used together to identify it (e.g. A/B tests usually complement heat maps very well).
  • It may require a high number of visits in order to accumulate sufficient data for the analysis of the target metric to be relevant and reliable.
  • It only allows you to compare two versions of a page. Although it can be extended to incorporate more versions, the test duration should also be longer to collect enough data to analyze the effects on the metric.
  • The distribution of visits between the versions of the page remains constant at 50% throughout the duration of the test, even when the metric starts to show a clearly favourable trend towards one of the versions.

How to do A/B Tests in WordPress?

There is a wide variety of plugins for A/B testing in WordPress, with a wide range of additional features and services.

Many providers offer a free version with only a portion of the features available, or a full-featured but time-limited trial version.

However, if we have never done A/B tests before, the learning curve with some of these plugins can distract us from the important aspects of the whole process.

On the other hand, most also offer other types of tests, which can further complicate their understanding and configuration.

A better option for consolidating the concepts of A/B testing and getting to know its execution dynamics well is Google Analytics Experiments, found in the Behavior menu of the main Reports tab:

Since we are used to the Google Analytics environment and nomenclature, the learning curve is very smooth and we can focus our efforts on what really matters: understanding, running and analyzing the A/B tests.

In addition, as it is a Google tool, we make sure we are following the best practices for running A/B Tests.

Although there are many plugins for A/B Tests in WordPress, we can do them effectively with Google Analytics experiments.

1 – Create a new experiment

Assuming this is the first time we do an experiment, Analytics displays a blank list:

Then we click on the “Create experiment” button and Google shows us the four steps we must go through and fill in to configure our experiment:

Let’s see the fields to be filled in in the first step, “Select an experimental target”:

  • Name of this experiment

The name by which we want to identify and recognize this experiment in the corresponding list.

We must ensure that the name is meaningful and related to the purpose of the experiment, e.g. “Rebound rate on the registration page”.

  • Purpose of this experiment

Select the metric we have decided to analyze after the test run.

Analytics shows us a list of available metrics, but we can create our own targets.

  • Percentage of the traffic of the experiment

Be careful! Don’t confuse this with the 50% distribution of visits in the A/B test: it has nothing to do with that.

It refers to what percentage of the total traffic will be devoted to participating in the experiment.

That is, if we set it at 60%, the A/B test will only be performed on 60% of the visits.

We usually leave it at 100%.

  • Mail notification of important changes

We enter our email address so that Analytics can notify us if an incident occurs.

Before continuing with the next step, we must click on the “Advanced options” link, as this is where we will set up an A/B test:

  • Distribute traffic equally among all variants

We must select “Yes”.

If we select “No”, we would no longer be doing a traditional A/B test, but an adaptive test driven by the selected metric (in case you are interested, this type of test is called a multi-armed bandit).

  • Advanced options

The other advanced options define how Analytics will determine when there is enough data to complete the experiment.

We can leave them at the default values.

Note that Analytics decides the duration of the experiment for us, through the confidence limit.

Using statistical analysis and the value of this parameter, Analytics determines when enough visits have been made so that the results of the experiment are sufficiently representative (both for good and for bad).

That’s a lot of work it takes off our shoulders!
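Incidentally, the adaptive alternative mentioned under “Distribute traffic equally among all variants” can be illustrated with the simplest multi-armed bandit strategy, epsilon-greedy. Analytics does not document its exact algorithm, so this is only a conceptual sketch:

```python
import random

def choose_version(stats: dict, epsilon: float = 0.1, rng=random) -> str:
    """Epsilon-greedy, the simplest multi-armed bandit strategy: usually
    show the version with the best observed conversion rate ("exploit"),
    but a fraction epsilon of the time pick one at random ("explore")."""
    if rng.random() < epsilon:
        return rng.choice(sorted(stats))  # explore
    return max(stats, key=lambda v: stats[v]["conversions"] / max(stats[v]["visits"], 1))

stats = {
    "A": {"visits": 500, "conversions": 5},   # 1.0% so far
    "B": {"visits": 500, "conversions": 9},   # 1.8% so far
}
```

Unlike the fixed 50/50 split of a traditional A/B test, a strategy like this gradually shifts traffic toward the apparent winner while the experiment is still running.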

2 – Set up the experiment

Click on the “Next step” button and enter the addresses of the control page (i.e. the original) and the address of the variant (i.e. the page with the change you have decided on) in the form provided:

Before entering the URL of the variant page, we must have created it in our WordPress, as Analytics verifies that it exists and loads correctly.

Note that we can add more variants of the page.

Analytics offers possibilities beyond the A/B tests, a sample of its versatility, but it can also complicate the analysis of the results and the test will take more time, as visits have to be spread over more versions of the original page.

Click on the “Next step” button.

3 – Configuration of the experiment code

This is perhaps the most delicate part of the whole Analytics configuration process to make our A/B test operational, since we must copy a JavaScript code into the original page of the experiment, inside the <head> tag of the HTML code:

There is a plugin (Google Content Experiments) that allows you to add this JavaScript code from the same screen where WordPress pages are edited, applying it only to the original page of the experiment:

4 – Review and begin

Once this JavaScript code has been copied, click on the “Next step” button and Analytics verifies that all the settings are correct:

If everything went correctly and Analytics did not detect any errors, we can start the A/B test by pressing the “Start experiment” button.

5 – Analysis of the data of the experiment

Although the final results will not be available until Analytics considers it has sufficient data, depending on the confidence limit we have set, we can check how the chosen metric is evolving for the original page and its variant by selecting the experiment from the list.

As it may be several weeks before Analytics shows relevant information in the report, I will show the report of an experiment that has been running for several months, so we can interpret the information it displays:

As you can see, it is a more complex experiment than an A/B test, as it includes 3 variants of the original page, so it requires considerably more execution time.

Although it has not yet been completed, it is already clear that one of the variants is a clear loser (with 94% fewer conversions than the original).

Meanwhile, another variant has good prospects as a candidate, with 7% more conversions, although with a probability of surpassing the original still a little low (66.11%), hence Analytics has not yet completed the analysis.

In this case, given how long this experiment has been in progress (more than 5 months) without any conclusive results, it would be worth considering whether it is worth continuing it at all.

One possibility, which would offer results in less time, would be to create a new experiment, but this time a real A/B test, with only the original page and the variant most likely to succeed.

Conclusions

A/B Tests offer a method to improve the metrics of a website, usually the Landing Page, that has, for example, few conversions, high bounce rate or short page dwell times, all without having to invest in SEO or SEM positioning campaigns.

With the A/B tests we can compare two versions of a website to find out which of them produces better results.

While there is no limit to the kinds of changes we can test, they should be implemented step by step, with successive tests identifying which ones are beneficial and discarding those that may harm us.

If you take a quick look back, you’ll notice that more than three-quarters of the article is about what A/B tests are and how they should be done, while only a quarter explains how to implement them in WordPress.

This division is not the result of chance: the actual implementation and execution of A/B tests is incredibly simple.

What really makes the difference is that we fully understand its dynamics, what we want to achieve, how we can achieve it and how to evaluate the results.

Never rush into running an A/B test without being sure of what you want to achieve.

Finally, a word of advice: do not expect spectacular results after a single A/B test.

Achieving such results requires doing and analyzing many A/B tests, until you find the changes that, accumulated, provide important improvements.

Examples of conversions multiplying by 10 thanks to a button color change are just that: examples.

In such cases, play devil’s advocate a little: if the color of a button alone could achieve such wonderful results, why doesn’t every website have a high conversion rate?

With what we have seen in this article, how do you think you could have achieved better results?

And if you’ve never used them, can you think of a place where you could do A/B tests on your website?