What do these have in common?
- “This pitcher has retired 5 of the last 7 batters.”
- “We tried 10 AdWord variants and combination D is the clear winner.”
- “The Bible Code predicted the Sept 11 attacks 5,000 years ago.”
- “We sliced our Google Analytics data every which way, and these 4 patterns emerged.”
All are examples of a common fallacy that I’m dubbing the “Pattern-Seeker.”
You probably laugh at Nostradamus, yet it’s likely you’re committing the same error with your own data.
Patterns in Chaos
It’s commonly said that basketball players are “streaky” — they get on a roll hitting 3-pointers (have a “hot hand”) or develop a funk where they can’t seem to land a shot (“gone cold”). These observations are made by fans, announcers, pundits, and the players themselves.
In 1985 Thomas Gilovich (featured in the entertaining book Innumeracy) tested whether players really did exhibit streaky behavior. It’s simple — just record hits and misses in strings like: HMHHMMMMHMMHH, then use standard statistical tests (specifically autocorrelation) to measure whether those strings are typical of a random process, or whether there was something more systematic going on.
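As a rough illustration of that kind of test, here is a minimal Python sketch. It is not the procedure from the 1985 study: the function names, the shuffle-based p-value, and the number of shuffles are all assumptions made for this example. It scores a hit/miss string by its lag-1 autocorrelation (does a hit tend to follow a hit?) and then asks how often a random reshuffling of the same shots scores at least as high.

```python
import random

def lag1_autocorrelation(shots: str) -> float:
    """Lag-1 autocorrelation of a hit/miss string (H = 1, M = 0)."""
    x = [1.0 if c == "H" else 0.0 for c in shots]
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0  # all hits or all misses: nothing to correlate
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(len(x) - 1))
    return cov / var

def shuffle_p_value(shots: str, n_shuffles: int = 5000, seed: int = 0) -> float:
    """Fraction of random reorderings at least as 'streaky' as the observed string."""
    rng = random.Random(seed)
    observed = lag1_autocorrelation(shots)
    chars = list(shots)
    at_least_as_streaky = 0
    for _ in range(n_shuffles):
        rng.shuffle(chars)
        if lag1_autocorrelation("".join(chars)) >= observed:
            at_least_as_streaky += 1
    return at_least_as_streaky / n_shuffles

shots = "HMHHMMMMHMMHH"  # the example string from the text
print(lag1_autocorrelation(shots), shuffle_p_value(shots))
```

A positive score means hits tend to follow hits; a negative score means hits and misses tend to alternate. A p-value that is not small means shuffled versions of the same shots look just as streaky, so the sequence gives no evidence of anything beyond chance.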
Turns out players are not streaky; simply flipping a coin produces the same sort of runs of H’s and M’s. The scientists gleefully explained this result to basketball pundits; the pundits remained nonplussed and unconvinced. (Surprised?)
So they tried the same experiment backward: They created their own strings of H’s and M’s with varying degrees of true streakiness and showed those to pundits and fans, asking them to classify which were streaky. Again the pundits and fans failed spectacularly.
We perceive patterns in randomness, and it extends beyond casual situations like basketball punditry, plaguing us even when we’re consciously trying to be analytical.
Take the “interesting statistic” given by the baseball announcers in the first example above. Sure, 5 of the last 7 batters were retired, but the act of picking the number “7” implies that number 8 got on base. Maybe number 9 did too. Of course, saying he “retired 5 of the last 9 batters” doesn’t sound as impressive, even though it’s the same data!
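To make the announcer’s trick concrete, here is a tiny sketch using a made-up at-bat history that matches the example (R means the batter was retired, O means he got on base, most recent batter last). The history string and the function name are invented for illustration.

```python
def retired_in_last(history: str, n: int) -> int:
    """Count retired batters ('R') among the n most recent batters."""
    return history[-n:].count("R")

# Hypothetical history: oldest batter first, most recent batter last.
history = "OORORRORR"

# The same nine batters, described through every possible "last n" window.
for n in range(1, len(history) + 1):
    k = retired_in_last(history, n)
    print(f"retired {k} of the last {n} ({k / n:.0%})")
```

Every line of output describes the same pitcher on the same night; only the window changes. Whoever picks the window picks the impression.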
But unlike the basketball example, the baseball announcer’s error runs deeper, and following that thread will bring us to marketing data and the heart of the fallacy.
Baseball records a dizzying array of statistics which announcers — or more correctly, staff statisticians — eagerly regurgitate. Maybe it’s because baseballers are a little OCD (just look at pre-bat and pre-pitch rituals) or maybe it’s because they need something to soak up the time between pitches, but in any case the result is a mountain of data.
Announcers exploit that data for the most esoteric of observations:
“You know, Rodriguez is 7 for 8 against left-handed pitchers in asymmetric ballparks when the tide is going out during El Niño.”
This is the epitome of Pattern-Seeking — combing through a mountain of data until you find a pattern.
Some statistician combed through millions of combinations of player data and external factors until he happened across a combination which included a “7 of the last 8,” which sure sounds impressive. Then he proudly delivered the result as if it were insight.
So what’s wrong with stumbling across curious observations? Isn’t that exactly how you make unexpected discoveries?
No, it’s how to convince yourself you’ve made a discovery when in fact you’re looking at pure randomness. Let’s see why.
Even a fair coin appears unfair if you’re Pattern-Seeking
The fallacy is clearer when you look at an extreme yet accurate analogy.
I’m running an experiment to test whether a certain coin is biased. During one “trial” I’ll flip the coin 10 times and count how often it comes up heads. 5 heads out of 10 would suggest a fair coin; so would 6 or even 7, due to the usual random variations.
What if I get 10 heads in a row? Well a fair coin could exhibit that behavior, but it would be rare — a 1 in 1024 event. So if my experiment consists of just one trial and I get 10 heads, the coin is suspect.
But suppose I did a lot of trials, like 1000. A fair coin should still come up heads 3-7 times per trial, but every once in a while it will come up 9 or 10 times. Those events are rare, but I’m flipping so much that rare events will naturally occur. In fact, in 1000 trials there’s a 62% chance that I’ll see 10 heads at least once.
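The arithmetic behind those numbers is short enough to check. A small sketch, assuming a fair coin and independent flips (the function names are made up for this example):

```python
from math import comb

def prob_heads_between(low: int, high: int, flips: int = 10) -> float:
    """Probability that a fair coin shows between low and high heads (inclusive)."""
    return sum(comb(flips, k) for k in range(low, high + 1)) / 2 ** flips

def prob_all_heads_at_least_once(trials: int, flips: int = 10) -> float:
    """Probability that at least one of `trials` independent trials is all heads."""
    p_all_heads = 0.5 ** flips
    # Complement rule: 1 minus the chance that no trial comes up all heads.
    return 1 - (1 - p_all_heads) ** trials

print(prob_heads_between(3, 7))                       # 0.890625
print(prob_all_heads_at_least_once(1))                # 0.0009765625, i.e. 1 in 1024
print(round(prob_all_heads_at_least_once(1000), 3))   # 0.624
```

So a single trial lands in the unremarkable 3-to-7 range about 89% of the time, and the “1 in 1024” event becomes more likely than not once you give it 1000 chances to happen.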
This is the crux of the fallacy. When an experiment produces a result that is highly unlikely to be due to chance alone, you conclude that something systematic is at work. But when you’re “seeking interesting results” instead of performing an experiment, highly unlikely events will necessarily happen, yet still you conclude something systematic is at work.
Bringing it home to marketing and sales data
Let’s apply the general lesson of the coin-flipping experiment to Google Analytics.
There are a hundred ways to slice and dice the data, so that’s what you do. If you compare enough variables in enough ways, you’ll find some correlations:
“Oh look, when we use landing page variation C along with AdWord text F, our conversion rate is really high on Monday mornings.”
Except you sound just like the baseball announcer, tumbling combinations of factors until something “significant” falls out.
Except you’re running 1000 coin-flip trials, looking only at the trial where it came up all heads and declaring the coin “biased.”
Except you’re seeing streaks, hoping that this extra-high conversion rate is evidence of a systematic, controllable force.
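To see how easily this happens, here is a small simulation sketch. Every number in it is hypothetical: 100 slices of the data, 200 visitors per slice, and a true conversion rate of 5% that is identical in every slice, so any difference between slices is pure noise.

```python
import random

def simulate_slices(n_slices: int = 100, visitors: int = 200,
                    true_rate: float = 0.05, seed: int = 1) -> list[float]:
    """Observed conversion rate per slice when every slice shares one true rate."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_slices):
        # Each visitor converts independently with the same probability.
        conversions = sum(rng.random() < true_rate for _ in range(visitors))
        rates.append(conversions / visitors)
    return rates

rates = simulate_slices()
print(f"average across slices: {sum(rates) / len(rates):.1%}")
print(f"best-looking slice:    {max(rates):.1%}")
print(f"worst-looking slice:   {min(rates):.1%}")
```

The best-looking slice will typically sit well above 5% even though nothing distinguishes it from the others. That slice is your “Monday morning” pattern, and if you reran the simulation with a different seed it would be a different slice.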
So what’s the answer?
The fallacy is that you’re searching for a theory in a pile of data, rather than forming a theory and running an experiment to support or disprove it.
- Instead of running multiple AdWords variants, each against multiple landing page variants, each feeding a different website funnel, run just one experiment at a time, one variable at a time.
- Instead of using a thesaurus to generate 10 ad variants, decide what pain-points or language you think will grab potential customers and test that theory specifically.
- Instead of rooting around Google Analytics hoping to find a combination of factors with a good conversion rate, decide beforehand which conversion rates are important for which cohorts, then measure and track those only.
Do you have more examples of what to do or what not to do? Leave a comment and join the conversation.