The perils of A/B testing

There’s an expression in advertising that goes, “I know that half of my advertising isn’t working. I just don’t know which half.” The same logic applies to all forms of design, including web design. Wouldn’t it be amazing to know exactly which parts of our page content, layouts and workflows were not working as well as they should?

It would seem like a godsend to know what works when it comes to user experience design: to have it confirmed, in hard quantifiable data, which of two layouts, elements or routes is the optimum. This is the promise of A/B testing. It is a powerful tool, but it is not a panacea, and over-reliance on it can not only blunt your judgment as a designer but also, paradoxically, result in sub-optimal solutions.

In this article I’ll take a look at some of the pitfalls of A/B testing, and at how such comparative testing can be used as part of a designer’s toolkit rather than as a dominant design methodology.

A/B testing has become a powerful technique in the field of web design. The advent of dynamic page serving and of modern analytics software such as Google Analytics makes it easy to set up and run A/B tests, or split tests. Visitors are alternately served one page layout or another, and the software measures which layout generates the greater number of a predetermined action, e.g. clicking a ‘buy now’ button or completing a registration form. These actions are defined as goals: measurable, quantifiable, knowable. In web design A/B testing, these goals have to be something that can be recorded by the analytics software, so while the goal may be for a user to click on a link to an article, the software cannot record whether the user actually reads that article.
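
To make the mechanics concrete, here is a minimal sketch of how such a split might be wired up by hand. The test name, the button copy variants and the /track endpoint are all hypothetical; in practice an analytics tool or a dedicated testing service would handle the assignment and recording for you.

```typescript
// Minimal client-side split test sketch (hypothetical names throughout).
// A real setup would lean on an analytics tool rather than hand-rolled code.

type Variant = "A" | "B";

const TEST_NAME = "signup-button-copy"; // hypothetical test identifier

// Assign the visitor to a variant once, then keep it stable via localStorage
// so they see the same version on every repeat visit.
function getVariant(): Variant {
  const stored = localStorage.getItem(TEST_NAME);
  if (stored === "A" || stored === "B") return stored;
  const variant: Variant = Math.random() < 0.5 ? "A" : "B";
  localStorage.setItem(TEST_NAME, variant);
  return variant;
}

// Record the predetermined goal (e.g. the 'buy now' click) against the variant.
// The /track endpoint is an assumption; swap in your analytics event call.
function trackGoal(goal: string): void {
  const payload = JSON.stringify({ test: TEST_NAME, variant: getVariant(), goal });
  navigator.sendBeacon("/track", payload);
}

// Example: render the variant and wire up the goal.
const button = document.querySelector<HTMLButtonElement>("#buy-now");
if (button) {
  button.textContent = getVariant() === "A" ? "Buy now" : "Get started";
  button.addEventListener("click", () => trackGoal("buy-now-click"));
}
```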

This article has more information on how to run A/B tests, and here is a rundown of some of the best known testing case studies.

A/B testing is inevitably reductive, evolving the ‘fittest’ design in Darwinian fashion. Testing two radically different designs will tell you which one works better for the goal you are testing, and you could repeat this step ad infinitum. But to get any further you will then need to vary elements of the fittest design, in order to try to improve its score. Almost immediately you have moved from testing two highly divergent designs to tweaking the ‘winning’ design. Statisticians call this finding the local maximum rather than the global maximum: you can easily find yourself heading down an aesthetic cul-de-sac, finding the nicest-looking house on the street rather than the best house in the whole town. Testing multiple options, called multivariate testing or bucket testing, adds complexity, and the tools are often more expensive.

Even with multiple options, split testing can only be used to measure and optimize one goal at a time. Optimizing for a single goal might be fine if your site is narrowly focused, such as an e-commerce site where one desired outcome trumps all others. But if you have multiple aims for your site, you will need to make sure any changes test well against all of your goals.

Having spent so long testing and optimizing a site to find that local maximum, it’s understandable that a designer does not want to throw away all that effort by pursuing another design. To put it bluntly, you may have spent a long time determining which of two layouts is the best, without realizing that both pages suck. The nagging doubt must always remain: if you’ve managed to optimize the content and UX from a 6% success rate to an 8% success rate, is there another design altogether that would net 9% or higher?

Users’ responses will also change over time, and what tested great last month may no longer be getting the best results. The danger is that you become locked into a continuous cycle of testing and tweaking. At this point you are less a designer than a quant, an automaton: you have abdicated your judgment and design sensibility, continually seeking the reassurance of the test. I know people who have become obsessed with trying to test everything, decidophobic, forever seeking the Shangri-La of optimal conversion rates.

First impressions count

“You never get a second chance to make a first impression”, as the adage goes. As research in Ontario and elsewhere has shown, visitors to a web site make a subconscious decision to like it or not within an incredibly short time, a matter of milliseconds. The ‘halo effect’ of this initial impression colors the user’s subsequent judgment of the site and even determines their assessment of the web site’s credibility. I have always been astonished by the bounce rates that all web sites see: people who visit a web site and almost immediately leave again. Often this is down to frustration at waiting for the page to load. Technical optimization and reducing page weight will often be more beneficial than UX testing; slow page rendering will drive users away from even the best-looking web site.

Which brings us to an important caveat: you can only A/B test once you’ve launched. You need real users with real goals to accurately split test your site; even A/B testing a private pre-launch beta site is unreliable unless you have a large beta community. A large sample size (i.e. a high number of page visits) is also required for accurate results. Thus you have to commit to launching with a design before you can even start thinking about optimizing it. There is always a first step into the unknown that A/B testing cannot help with.
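
To give a rough sense of what “a large sample size” means in practice, here is a back-of-the-envelope sketch using the standard two-proportion power calculation, taking the 6% and 8% success rates mentioned earlier as an assumed baseline and target. The exact figure will shift with your choice of significance level and power; this is an illustration, not a prescription.

```typescript
// Rough sample-size estimate per variant for detecting a lift from a baseline
// conversion rate p1 to a target p2, using the standard two-proportion formula.
// The 6% -> 8% figures are illustrative, echoing the example above.

const Z_ALPHA = 1.96; // two-sided 95% significance
const Z_BETA = 0.84;  // 80% statistical power

function sampleSizePerVariant(p1: number, p2: number): number {
  const pBar = (p1 + p2) / 2;
  const numerator =
    Z_ALPHA * Math.sqrt(2 * pBar * (1 - pBar)) +
    Z_BETA * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  return Math.ceil(numerator ** 2 / (p2 - p1) ** 2);
}

console.log(sampleSizePerVariant(0.06, 0.08));
// => roughly 2,500 visitors per variant, i.e. over 5,000 in total,
// before the difference can be called significant with any confidence.
```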

The spark of inspiration

As Henry Ford said, “If I’d asked people what they wanted, they would have asked for faster horses”. Users aren’t always the best people to ask for feedback. This leads me to my biggest criticism of A/B testing: it forces you to follow your audience, not lead them. You abdicate responsibility for deciding what makes your web site work best to the wisdom of the crowd. You end up designing to please the audience you have, not the audience you want.

This approach leaves no place for the spark of inspiration, for creating something truly original, something we’ve not seen before. It’s no wonder that so many web sites look so similar, each playing it safe with an established look. Do you dare to be different? As this thought-provoking talk argues, sometimes we need to look beyond the marginal gains and go for the quantum leap, the next big idea.

A unique design and user experience will probably test poorly at first; it can take time for something original to gain traction. Slowly a buzz may develop around the design, and it may attract a new audience, one more willing to engage with the site, its content and its design as a whole. A/B testing can be used to tweak and optimize the design and layout further, but it cannot lead you to the promised land. You will also need to define what counts as an engaged audience. Page views are a very poor metric of engagement; time spent on a page is better, or the number of comments an article attracts. But only feedback and qualitative analysis of your audience will tell you whether they enjoy using the site: quantitative measurements alone will not tell you the full story.
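
If time on page is a metric you want to capture, a crude sketch of one way to do it is below. The /engagement endpoint is an assumption, and most analytics packages offer an equivalent event out of the box; the point is simply that this metric is cheap to record but still says nothing about whether the time was well spent.

```typescript
// Crude time-on-page measurement: note when the page loaded, and report the
// elapsed time when the visitor hides the tab or navigates away.
// The /engagement endpoint is hypothetical; substitute your analytics call.

const pageLoadedAt = performance.now();

document.addEventListener("visibilitychange", () => {
  if (document.visibilityState === "hidden") {
    const secondsOnPage = Math.round((performance.now() - pageLoadedAt) / 1000);
    // sendBeacon survives page unload, unlike a normal fetch/XHR.
    navigator.sendBeacon(
      "/engagement",
      JSON.stringify({ page: location.pathname, secondsOnPage })
    );
  }
});
```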

Trust your judgment

The greatest act of design is to make a mark, know why you made it, and trust that it is good. If every element placed, every word written, is done with doubt, how can one build with confidence? Designing with confidence, guided by our individual design sensibility, is what allows us to design with style and personality.

Ultimately, a site built with the logic and consistency of a clear design vision will always trump a site in which every element has been timidly placed and nervously tested.

This is not to say that A/B testing does not have its place, but it is best suited to testing discrete elements rather than whole layouts. It is less useful for pitting one page against another than for testing a single element, like differing copy on a button. Workflows are also ripe for split testing: is the sign-up form better as a sequence of small steps, or as one big form? What if the sign-up form is a modal window overlaid onto the home page? Check out Which Test Won to see some great examples and case studies of UX testing, predominantly in the e-commerce field.

Generally speaking, you will be better off spending the time you would have spent A/B testing on changes you know will improve your site, such as ensuring it renders properly across all browsers and reducing the page weight. Is the layout responsive, offering the best possible experience on every device? Are there typos? Does it look good on mobile?

You shouldn’t always need to A/B test to know that you are making your web site better.

How much A/B testing do you do? Does a good web designer need A/B testing at all? Let us know your thoughts in the comments.


Martin Gittins

Martin is an interactive designer based in North Yorkshire. He still spends way too much time thinking about Constructivism, linear cities, and cycling. Find out more at www.kosmograd.com.
