# Out of the cesspool and into the sewer: the A/B testing trap

Your A/B tests are trapped in a cesspool when they should be in the sewer.

Do you really care why A/B testing is analogous to unwanted liquids? Not yet, so I’d better get right to the point.

On the rare occasion that it rains in Austin, we get these deep puddles in the backyard. Of course it would be better if the water flowed out into the street and into the sewer, but that’s not how gravity works.

Water “seeks” the lowest point in the yard, but it’s narrow-minded. It doesn’t survey the environment, locate the lowest area, and head there. Rather, at each point along its path it chooses whatever direction is lowest in the immediate vicinity. Water doesn’t “know” that if only it made the effort to hop over the fence, it could get much lower, like in the sewer.

In mathematical terms, water doesn’t “globally optimize” for getting to the lowest possible point, but rather “locally optimizes” at each step. If you enjoy clichés, water misses the forest for the trees.
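To make “local optimization” concrete, here is a minimal Python sketch of the water’s rule: at each step, move to whichever adjacent point is lowest, and stop when no neighbor is lower. The `terrain` heights and the `greedy_descent` name are hypothetical, made up for illustration. This is the textbook greedy descent rule, not a model of any particular A/B testing tool.

```python
def greedy_descent(terrain, start):
    """Follow the lowest adjacent point until no neighbor is lower.

    terrain is a hypothetical list of heights; the index is the position
    in the yard. Returns the index where the "water" comes to rest.
    """
    pos = start
    while True:
        # Look only at the immediate neighbors, as water does.
        neighbors = [i for i in (pos - 1, pos + 1) if 0 <= i < len(terrain)]
        if not neighbors:
            return pos  # a one-point yard: nowhere to flow
        lowest = min(neighbors, key=lambda i: terrain[i])
        if terrain[lowest] >= terrain[pos]:
            return pos  # no lower neighbor: a local minimum
        pos = lowest


# Hypothetical heights: a shallow puddle at index 2, a "fence" at index 4,
# and a much deeper low point (the sewer) at index 7.
terrain = [5, 3, 1, 4, 9, 6, 2, -10, 0]
print(greedy_descent(terrain, start=0))
print(min(range(len(terrain)), key=lambda i: terrain[i]))
```

Starting at the left edge, the water settles in the puddle at index 2 and never finds the much deeper point at index 7, because getting there would mean first climbing over the “fence” at index 4.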

Maybe your A/B tests are missing the forest for the trees too.

A typical A/B test looks like this: You start with a baseline, then you make a change. Maybe the title changes from “Sour Cream Getting you Down?” to “Don’t know when Sour Cream Goes Bad?” You test that for a while, one version wins, and then you try another variation: “Is this Sour Cream Good or Bad?”

And so on, inching your way through incremental improvements. A little here, a little there, and — you believe — soon it adds up to real money.

Except, often it doesn’t.

Often what happens is that you reach a point where small changes aren’t doing anything. This effect can be hard to recognize, which is why you need to (horrors!) use math to decide empirically whether anything’s actually happening.
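One common way to do that math is a two-proportion z-test on conversion counts. Below is a minimal Python sketch using only the standard library. The function name and the visitor and conversion numbers are hypothetical. It assumes visitors are randomly assigned to versions and that samples are large enough for the normal approximation.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance).

    Returns (z, p_value). Assumes large samples and 0 < pooled rate < 1.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical results with 10,000 visitors per version.
z, p = two_proportion_z_test(200, 10_000, 215, 10_000)
print(f"small tweak:   z={z:.2f}, p={p:.3f}")  # p far above 0.05
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"bigger change: z={z:.2f}, p={p:.3f}")  # p below 0.05
```

A p-value well above your threshold (0.05 is a common convention) means the data can’t distinguish the variant from the baseline. That is exactly the plateau where small changes “aren’t doing anything.”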

At this point you might be tempted to give up, but that’s wrong too.

What’s happened is that you’ve found what mathematicians call a “local minimum,” what I just called a “cesspool,” and what more tasteful writers call a “watershed.” Your test is the water in the backyard: you’ve flowed into the lowest point, but you’re still in the backyard!

Completely changing your perspective, your message, your layout, your value proposition, your colors, or your target audience might produce an entirely new, discontinuous, non-incremental improvement. The real fun is in the sewer; you need to jump over the backyard fence.

In fact, because looking in completely new places can yield far bigger gains than incremental improvement, you need to be looking for discontinuous results from the start.

The best idea is to do both: Instead of just running A versus incremental-change A2, also run a B version that’s radically different from A. Thus you reap the straightforward benefits of incremental improvements while also searching for something that could radically improve your revenue.
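In optimization terms, “doing both” resembles a multi-start local search: keep refining the incumbent with small steps, but also drop a few radically different starting points onto the terrain, let each one settle, and keep whichever ends up lowest. This is a hypothetical Python illustration of that idea, not a prescription for scheduling real tests. `settle`, `local_plus_radical`, and the terrain values are made up. Lower is better here, matching the water analogy; for a conversion rate you would look for the highest point instead.

```python
def settle(terrain, pos):
    """Greedy local descent: step to the lowest adjacent point until none is lower."""
    while True:
        neighbors = [i for i in (pos - 1, pos + 1) if 0 <= i < len(terrain)]
        if not neighbors:
            return pos
        lowest = min(neighbors, key=lambda i: terrain[i])
        if terrain[lowest] >= terrain[pos]:
            return pos
        pos = lowest


def local_plus_radical(terrain, incumbent, radical_starts):
    """Refine the incumbent locally (A -> A2 -> ...) and also let a few
    radically different starting points (B, ...) settle; keep the lowest."""
    candidates = [settle(terrain, incumbent)]
    candidates += [settle(terrain, s) for s in radical_starts]
    return min(candidates, key=lambda i: terrain[i])


# Hypothetical terrain: puddle at index 2, fence at index 4, sewer at index 7.
terrain = [5, 3, 1, 4, 9, 6, 2, -10, 0]
print(local_plus_radical(terrain, incumbent=0, radical_starts=[]))
print(local_plus_radical(terrain, incumbent=0, radical_starts=[5]))
```

With no radical starting point, the search stays in the backyard puddle at index 2; adding a single start on the far side of the fence lets it reach index 7. The incremental line of work is never abandoned; the radical start is simply an extra candidate.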

Better still, if a radically different message gets you massively better results, perhaps all your messaging should change accordingly. Maybe your idea of what the market wants should shift. Maybe your entire business can change for the better.

Why poop along with minor variations when you could be toying with new ideas and new identities?

Play!

• Jason, this time, unfortunately, I totally disagree with whatever you have written. Your argument is weak, which is beautifully illustrated by your phrase “A typical A/B test looks like this”.

A/B testing is a methodology, not a way of thinking about what can improve conversions. What you are criticizing in this post is the ideology that small changes can make a big impact. I don’t see what this has to do with A/B testing. What stops you from A/B testing large design changes? For example, see this case study where a radically different design was tested and they managed to increase sales by 20%: http://visualwebsiteoptimizer.com/split-testing-blog/how-aquasoft-increased-their-sales-by-20-doing-ab-split-tests-in-multiple-phases/

In fact, it would have been much better if you had written about what I think is the best way to use A/B testing: run a lot of small tests to find local maxima, and then occasionally run radical tests to see if you can get closer to the global maximum. I think this deserves a blog post of its own.

But anyway, I think your post is a criticism of over-reliance on small changes (which I agree with) and not at all a criticism of A/B testing. In fact, your post has nothing to do with A/B testing.

• Paras… I think you didn’t finish reading the post!

The thing you suggest (testing both minor changes and radical ones, using an A/B/C testing methodology) is exactly what I propose as the solution!

• Jason, I did finish it! I was just curious about the title of the post, as it seemed you were criticizing A/B testing :)

• AKA local adaptive peaks. Welcome to PageRank…

• So instead of simple A/B you suggest doing A/B/C…Z, and then adding some “numeric” variations (micro enhancements) for each letter… Where does this end? You must set boundaries somewhere, otherwise you’re lost in an unscalable analytics horror.
Moreover, do you know any analytics tool that can produce a readable and useful report of such complexity?

• No I don’t suggest testing A..Z at all. In fact my last post explains why that idea is a bad one.

Rather, I explicitly suggested you test A versus a minor A2, and also a more radical B. And that’s where it ends.

(Unless you’ve got a ton of traffic, you can’t test more than that and expect statistically significant results anyway.)
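To see why traffic caps the number of variants, here is a rough Python sketch of the standard normal-approximation sample-size formula for comparing two conversion rates. The function names, the 2% baseline, the 2.5% target, and the 1,000 visitors per day are hypothetical. It assumes an even traffic split, a two-sided test at α = 0.05 with 80% power, and no correction for multiple comparisons (which would push the numbers higher still).

```python
import math
from statistics import NormalDist


def visitors_per_variant(p_base, p_new, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH arm to detect p_base -> p_new
    with a two-sided two-proportion test (normal approximation).
    Assumes p_new != p_base."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    n = (z_alpha + z_beta) ** 2 * variance / (p_new - p_base) ** 2
    return math.ceil(n)


def days_needed(daily_visitors, arms, p_base, p_new):
    """Traffic is split evenly across `arms`, so each arm fills more slowly."""
    per_arm_per_day = daily_visitors / arms
    return math.ceil(visitors_per_variant(p_base, p_new) / per_arm_per_day)


# Hypothetical site: 1,000 visitors/day, hoping to detect 2.0% -> 2.5%.
for arms in (2, 3):
    print(arms, "arms:", days_needed(1_000, arms, 0.02, 0.025), "days")
```

Under these assumptions each arm needs roughly 14,000 visitors, so going from two arms to three stretches the test from about four weeks to about six. Smaller expected lifts drive the requirement up quadratically, which is why testing A through Z is out of reach for most sites.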

• I agree with your post and the replies you make to Paras and Shahar.

It’s actually similar to what I’ve heard guys like Perry Marshall advocate. A is the original, A2 is slightly different (water in the backyard), and C is radically different (over the fence).

• Ed

Great analogy, it makes the point very clearly.

• Good post! Overall I agree with what you’ve said.

I think looking at your sources (the places sending you traffic) is really important too.

Let’s say you have an A/B test going on and you’ve reached your local maximum. Trying a radical new C campaign could be useful, but looking at the sources and how they converted is just as useful.

You might notice people from source X convert 30% higher in general. If I saw that, I would try to understand the type of people source X sends me. I would then try to get more traffic from source X and maybe target my copy to those users.

Greg

• I love the A1, A2, B approach you are suggesting. In my job, I don’t get involved with A/B testing very much, but I try to do all of my business planning with three options in mind:

1. Tactical option: focus on the well-established core business line
2. Opportunistic option: take advantage of opportunities as they arise
3. Strategic option: consider how the composition of the core business will change over the next three years

I have found this framework very useful in general when it comes to short-term and longer-term planning.

• I was trying to explain to some kids why A/B testing will find ‘local maxima’/‘local minima’ but probably doesn’t produce the truly optimal result. The water analogy is much better… and explaining stuff like this to kids under 5 is probably asking for trouble anyway.

• A different Greg

in principle this sounds reasonable. however, getting brand managers to do something “radical” is often impossible. given that, take the small wins and build on them.

• It would be nice to expand on this topic and go a bit deeper into how circumstances differ, as I think they determine the right approach to A/B testing.

1. People are not visiting our site or we want to bring more visitors

2. People are not subscribing or we want more subscriptions

3. We want to change the UI for ‘Scheduler’ in our system because we think it is not perfect right now.

Well, as you can see, radically changing the UI for point 3 does not make any sense: you could find your business at the ‘lowest bottom of the lowest hell,’ and the sewer would look like the sweetest place to climb back to from there. For point 1 the approach applies partially. For point 2 it is a good approach.

With point 1, you can run A and B at the same time because you can track what brings you more visitors: your blog, different AdWords campaigns, or different phrasing in your ads. With point 2, it is better to have only one version of the website at a time to see which wording and navigation push users to the subscribe page and get them to actually subscribe.

• Terrific points, I agree completely.

• Thanks for this and all your great articles Jason, you’ve been a source of inspiration for a long (in internet terms) time.
Anyway, to the crux of my comment.
I’m currently doing some market testing on A/B testing, and I wondered what you (and your readers) think of the current state of A/B testing frameworks and what you’d like to see improved.
With that in mind, I’ve set up a very short SurveyMonkey survey (see http://www.surveymonkey.com/s/DDMFS5T).

Thanks and continue the great work

• I took the survey for you. Probably this post is too old for lots of people to see this new comment though. :-(