What do these have in common?
- “This pitcher has retired 5 of the last 7 batters.”
- “We tried 10 AdWord variants and combination D is the clear winner.”
- “The Bible Code predicted the Sept 11 attacks 5,000 years ago.”
- “We sliced our Google Analytics data every which way, and these 4 patterns emerged.”
All are examples of a common fallacy that I’m dubbing the “Pattern-Seeker.”
You probably laugh at Nostradamus, yet it’s likely you’re committing the same error with your own data.
Patterns in Chaos
It’s commonly said that basketball players are “streaky” — they get on a roll hitting 3-pointers (have a “hot hand”) or develop a funk where they can’t seem to land a shot (“gone cold”). These observations are made by fans, announcers, pundits, and the players themselves.
In 1985 Thomas Gilovich (featured in the entertaining book Innumeracy) tested whether players really did exhibit streaky behavior. It’s simple — just record hits and misses in strings like: HMHHMMMMHMMHH, then use standard statistical tests (specifically autocorrelation) to measure whether those strings are typical of a random process, or whether there was something more systematic going on.
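To make the idea concrete, here is a minimal Python sketch that scores a hit/miss string two ways: lag-1 autocorrelation and a Wald-Wolfowitz runs test. The function names are invented for this example, and this is only a simple stand-in for the kind of test described above, not the procedure from the 1985 study.

```python
import math


def to_bits(shots: str) -> list[int]:
    """Map 'H' to 1 and anything else (a miss) to 0."""
    return [1 if c == "H" else 0 for c in shots]


def lag1_autocorrelation(shots: str) -> float:
    """Correlation between each shot and the next one.

    Near 0 suggests no streakiness, positive suggests streaks,
    negative suggests alternation.
    """
    x = to_bits(shots)
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        raise ValueError("sequence has no variation")
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(len(x) - 1))
    return cov / var


def runs_test_z(shots: str) -> float:
    """Wald-Wolfowitz runs test z-score (normal approximation).

    Fewer runs than expected (negative z) points to streaks;
    more runs than expected (positive z) points to alternation.
    """
    x = to_bits(shots)
    n1 = sum(x)
    n2 = len(x) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need both hits and misses")
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(x, x[1:]) if a != b)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - expected) / math.sqrt(variance)


if __name__ == "__main__":
    sample = "HMHHMMMMHMMHH"
    print(lag1_autocorrelation(sample), runs_test_z(sample))
```

A z-score between roughly -1.96 and 1.96 is what a coin-flip process typically produces, and the sample string lands well inside that range. A string this short can't tell you much either way, though.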
Turns out players are not streaky; simply flipping a coin produces the same sort of runs of H’s and M’s. The scientists gleefully explained this result to basketball pundits; the pundits remained nonplussed and unconvinced. (Surprised?)
So they tried the same experiment backward: They created their own strings of H’s and M’s with varying degrees of true streakiness and showed those to pundits and fans, asking them to classify which were streaky. Again the pundits and fans failed spectacularly.
We perceive patterns in randomness, and it extends beyond casual situations like basketball punditry, plaguing us even when we’re consciously trying to be analytical.
Take the “interesting statistic” given by the baseball announcers in the first example above. Sure, 5 of the last 7 batters were retired, but the act of picking the number “7” implies that the batter 8 back got on base. Maybe the one 9 back did too. Of course saying he “retired 5 of 9 batters” doesn’t sound as impressive even though it’s the same data!
But unlike the basketball example, the baseball announcer’s error runs deeper, and following that thread will bring us to marketing data and the heart of the fallacy.
Baseball records a dizzying array of statistics which announcers — or more correctly, staff statisticians — eagerly regurgitate. Maybe it’s because baseballers are a little OCD (just look at pre-bat and pre-pitch rituals) or maybe it’s because they need something to soak up the time between pitches, but in any case the result is a mountain of data.
Announcers exploit that data for the most esoteric of observations:
“You know, Rodriguez is 7 for 8 against left-handed pitchers in asymmetric ballparks when the tide is going out during El Niño.”
This is the epitome of Pattern-Seeking — combing through a mountain of data until you find a pattern.
Some statistician combed through millions of combinations of player data and external factors until he happened across a combination which included a “7 of the last 8,” which sure sounds impressive. Then he proudly delivered the result as if it were insight.
So what’s wrong with stumbling across curious observations? Isn’t that exactly how you make unexpected discoveries?
No, it’s how to convince yourself you’ve made a discovery when in fact you’re looking at pure randomness. Let’s see why.
Even a fair coin appears unfair if you’re Pattern-Seeking
The fallacy is clearer when you look at an extreme yet accurate analogy.
I’m running an experiment to test whether a certain coin is biased. During one “trial” I’ll flip the coin 10 times and count how often it comes up heads. 5 heads out of 10 would suggest a fair coin; so would 6 or even 7, due to the usual random variations.
What if I get 10 heads in a row? Well a fair coin could exhibit that behavior, but it would be rare — a 1 in 1024 event. So if my experiment consists of just one trial and I get 10 heads, the coin is suspect.
But suppose I did a lot of trials, like 1000. A fair coin should still come up heads 3-7 times per trial, but every once in a while it will come up 9 or 10 times. Those events are rare, but I’m flipping so much that rare events will naturally occur. In fact, in 1000 trials there’s a 62% chance that I’ll see 10 heads at least once.
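The 1-in-1024 and 62% figures above can be checked in a few lines. This Python sketch computes them directly and then confirms the second with a quick simulation; the function names, the seed, and the number of simulated experiments are arbitrary choices for illustration.

```python
import random


def p_all_heads(flips: int = 10) -> float:
    """Chance that a fair coin comes up heads on every flip of one trial."""
    return 0.5 ** flips


def p_at_least_once(trials: int, flips: int = 10) -> float:
    """Chance of seeing an all-heads trial at least once in `trials` trials."""
    return 1 - (1 - p_all_heads(flips)) ** trials


def simulate_at_least_once(experiments: int = 2000, trials: int = 1000,
                           flips: int = 10, seed: int = 1) -> float:
    """Fraction of simulated experiments containing an all-heads trial."""
    rng = random.Random(seed)
    all_heads = (1 << flips) - 1  # e.g. 0b1111111111 for 10 flips
    hits = 0
    for _ in range(experiments):
        # Each trial is `flips` fair coin flips packed into one integer.
        if any(rng.getrandbits(flips) == all_heads for _ in range(trials)):
            hits += 1
    return hits / experiments


if __name__ == "__main__":
    print(1 / p_all_heads())         # 1024.0
    print(p_at_least_once(1000))     # about 0.62
    print(simulate_at_least_once())  # should land near the value above
```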
This is the crux of the fallacy. When an experiment produces a result that is highly unlikely to be due to chance alone, you conclude that something systematic is at work. But when you’re “seeking interesting results” instead of performing an experiment, highly unlikely events will necessarily happen, yet still you conclude something systematic is at work.
Bringing it home to marketing and sales data
Let’s apply the general lesson of the coin-flipping experiment to Google Analytics.
There are a hundred ways to slice and dice the data, so that’s what you do. If you compare enough variables enough ways, you’ll find some correlations:
“Oh look, when we use landing page variation C along with AdWord text F, our conversion rate is really high on Monday mornings.”
Except you sound just like the baseball announcer, tumbling combinations of factors until something “significant” falls out.
Except you’re running 1000 coin-flip trials, looking only at the trial where it came up all heads and declaring the coin “biased.”
Except you’re seeing streaks, hoping that this extra-high conversion rate is evidence of a systematic, controllable force.
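Here is a small Python simulation of that trap. Every segment is given exactly the same true conversion rate, yet the “best” segment still looks like a winner. The segment labels, visitor counts, and the 5% rate are all made up for illustration; nothing here touches a real analytics API.

```python
import random
from itertools import product

PAGES = ["A", "B", "C", "D"]            # hypothetical landing page variants
ADS = ["A", "B", "C", "D", "E", "F"]    # hypothetical AdWord texts
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def simulate_slices(true_rate: float = 0.05, visitors: int = 200,
                    seed: int = 0) -> dict[tuple[str, str, str], float]:
    """Observed conversion rate per segment when nothing really differs."""
    rng = random.Random(seed)
    observed = {}
    for segment in product(PAGES, ADS, DAYS):
        # Every visitor converts with the same probability in every segment.
        conversions = sum(rng.random() < true_rate for _ in range(visitors))
        observed[segment] = conversions / visitors
    return observed


if __name__ == "__main__":
    rates = simulate_slices()
    best = max(rates, key=rates.get)
    worst = min(rates, key=rates.get)
    print("segments examined:", len(rates))
    print("best segment:", best, rates[best])
    print("worst segment:", worst, rates[worst])
```

With 168 segments to choose from, the top one will usually sit well above the true 5% purely by chance. It is the all-heads trial from the coin experiment in a different costume.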
So what’s the answer?
The fallacy is that you’re searching for a theory in a pile of data, rather than forming a theory and running an experiment to support or disprove it.
- Instead of running multiple AdWords variants each against multiple landing page variants each feeding a different website funnel, run just one experiment at a time, one variable at a time.
- Instead of using a thesaurus to generate 10 ad variants, decide what pain-points or language you think will grab potential customers and test that theory specifically.
- Instead of rooting around Google Analytics hoping to find a combination of factors with a good conversion rate, decide beforehand which conversion rates are important for which cohorts, then measure and track those only.
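As a sketch of what “one experiment, one variable” might look like in practice, the Python below runs a standard two-proportion z-test on a single comparison you chose before looking at the data. The function name and the sample numbers are hypothetical, and the normal approximation assumes reasonably large samples.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Null hypothesis: variants A and B share the same true conversion rate.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in the data")
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical result: old ad text vs. the one theory-driven variant.
    z, p = two_proportion_z_test(conv_a=50, n_a=1000, conv_b=80, n_b=1000)
    print(z, p)
```

The p-value only means what it claims to mean because the comparison was picked in advance. Run the same function over every slice you can think of and you are back to Pattern-Seeking.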
Do you have more examples of what to do or what not to do? Leave a comment and join the conversation.
48 responses to “The Pattern-Seeking Fallacy”
Here’s an interesting fact about learning: You maximize learning when the probability of an outcome is 50/50 (via Reinertsen from his book: The Principles of Product Development Flow).
This suggests that you pick bold outcomes early on and not waste a lot of effort on incremental optimization.
See also: Technical Analysis in the stock market.
This is known as “Availability Bias” – http://en.wikipedia.org/wiki/Availability_heuristic – wherein people predict the frequency of an event, or a proportion within a population, based on how easily an example can be brought to mind.
Very true, and very relevant – just good to put names on things like this.
> “This pitcher has retired 5 of the last 7 batters.”
> Take the “interesting statistic” given by the baseball announcers in the first example above. Sure the last 5 of 7 batters were retired, but the act of picking the number “7” implies that number 8 got on base.
Not so fast. That’s what baseball announcers say when the 8th batter is walking up to the plate. They’re certainly not implying that the 8th got on base because that’s still unknown.
Yes, that sentence structure is used in contexts where the outcome of the next case is known, but it’s also used when the outcome is still unknown. If you want to use it as an example of the former, you need more context description than “baseball announcers”.
You and I are talking about different “8th” batters.
I’m counting backwards; what I mean is 8 batters ago he got on base. The announcer only looks 7 back instead of 8 back because it makes the numbers look more impressive, but of course that’s terrible statistics.
Michael Shackleford, a professional actuary who analyzes casino games over at http://www.wizardofodds.com, talks about this pattern seeking quite often. He points out that the results display on the Roulette tables in casinos play on this pattern-seeking tendency we have. If that display shows black has come up the last 5 spins, some might tend to think that “red is due” and start betting on red – even though each spin of the wheel is completely random and has no bearing on future spins. To take this theme further, Craps players will often comment about a table being “hot” or “cold” and make bets accordingly.
Juxtapose this against the Blackjack player who counts cards or uses the mathematical “basic strategy.” This person is using true analysis and the math to work the game more in his or her favor.
As web marketers, we need to look more long term. Many of us look for quick answers and try to make sense out of events based on those short-term stats. Sometimes it’s best to let something sit and churn for a bit before deciding to make a change. Otherwise, we’re just “going by our gut” and not really learning the full story. I think Jason is right to point out we might be better off in the long run making fewer decisions based on longer analysis.
pb > c
This is a pretty interesting read. I really enjoyed the comparison between marketing analytics and sports. I agree that a lot of marketers get carried away by patterns and sometimes it is really hard to separate real facts from unwanted patterns.
I would also like to add that using statistical significance to identify the winning pattern / ad solidifies the test results. 90% + confidence index is a good indicator that the trial is a winner.
The confidence interval is not sufficient if you’re seeking rather than testing.
It’s a common experimental fallacy to try many things and seek which thing beat a predefined confidence interval. CI’s are for null-hypothesis testing — you have to have a null hypothesis to begin with!
I think I understand the statistics, but it’s the basketball example that always gets under my skin. I haven’t read the studies, but I imagine they might be leaving out some important factors in their analyses. Specifically, when did a certain player go on a hot streak (4th quarter or during important games would be an indicator of non-randomness)? Are hot streaks more prevalent for better shooters overall?
I just can’t believe that the researchers of those studies have accounted for all of the variables. It may look like a random string, but the context might say otherwise. Or am I way off base here?
You’re not off base. What you’re doing is proposing various, reasonable theories to explain streaky behavior.
What ought to happen next: Test those specific theories against the data and see whether those kinds of correlations exist. If not, would you give up your idea that it is streaky?
Would you at least agree that Occam’s Razor would say that if a coin-flip model explains the same results, that that would seem to be the status quo until a more complex theory is shown to hold water?
True, yet wouldn’t you agree the Razor is a glib statistical tool with its severe limitations? :)
There have been studies to try to figure out if streaks in sports are significant. Check out the 2nd part of this Radiolab episode:
There’s also a recent Salon article about Kobe Bryant’s status as a clutch player:
It’s OK to run multiple simultaneous experiments as long as you account for that when you calculate the significance of your results. In statistical lingo, you need to account for the so-called trials factor. By running multiple experiments, the overall significance of each is reduced since you have a greater likelihood of seeing a large random fluctuation with more experiments.
Yes you’re right, for example the F-test and ANOVA are typical ways of doing that.
Of course few people do that, especially in marketing! Which is what I’m trying to warn against.
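For a rough sense of the “trials factor” mentioned above, this Python sketch computes how fast the chance of at least one false positive grows with the number of independent tests, along with two common corrected thresholds (Bonferroni and Šidák). These are swapped in here as the simplest illustrations; they are not the F-test or ANOVA, and the function names are invented for the example.

```python
def family_wise_error(alpha: float, tests: int) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** tests


def bonferroni_alpha(alpha: float, tests: int) -> float:
    """Per-test threshold that keeps the overall error rate at or below alpha."""
    return alpha / tests


def sidak_alpha(alpha: float, tests: int) -> float:
    """Per-test threshold that gives exactly alpha overall, if tests are independent."""
    return 1 - (1 - alpha) ** (1 / tests)


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        print(m, family_wise_error(0.05, m),
              bonferroni_alpha(0.05, m), sidak_alpha(0.05, m))
```

At 20 tests with the usual 0.05 threshold, the chance of at least one spurious “winner” is already about 64%.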
Don’t take this personally, but what a load of crap! First off, patterns exist and those intelligent enough can see them in seconds. Secondly, I have played basketball at all levels (not Pro, but I play with Pro players during the off season). Yes, shooters have streaks. You can quantify it all you want with numbers, after the fact, but unless you have been there you don’t know. Streaks are usually caused by off-the-court issues. For instance, while playing college ball my shooting sucked, and my play for that matter, but I happened to get into a fist fight and whooped this guy’s ass in some parking lot. From that point on I was on fire. Confidence!
Statistics is crap. I am sure if you ran the statistics on Apple 15 years ago, you would say there’s no way they are ever going to beat Microsoft’s market cap. Or that’s like running statistics to determine how many company breaks during the summer when it’s hot will keep workers from quitting their jobs. Math and statistics will be less effective than giving random breaks based on when attitudes change or morale is low.
Another example is product development. Check out the PDMA. The PDMA has several articles on how to run statistics to determine if a product will be successful or not. The statistics way doesn’t work... and no, it isn’t because the equation was wrong. It’s because they don’t take into account human feelings and emotions.
Let’s make a wager. Next Final Four run statistics on who you think will win the entire thing and lets see how this all works out.
Check out the stuff that Nate Silver does at FiveThirtyEight, or the statistical intelligence models that pick stocks better than professional brokerage firms. People *have* predicted the NCAA men’s basketball playoffs, with quite good success.
Also, as far as your basketball experience goes: the plural of anecdote is not data.
The point of statistics is to extract meaningful or interesting information about a heap of data. Whether that information can be used to predict things about other heaps of data is an entirely different problem.
The human brain likes patterns. We subconsciously look for them because it’s a way to make sense of the constant information we are bombarded with. It’s also easier to remember things if they fit a pattern. I recently heard that stereotypes are a result of this need to find patterns in our environment.
I could have done without all the sports examples…but I’m a girl, so I don’t like sports :-)
Possibly there are billions of patterns at the human and universal level. I find basically two main trends. One drives the world on its way towards higher states of evolution; the second one drives us towards doom. At the business level the same forces are at play.
Exactly! Humans are nothing more than a virus. You have lateral, right-brained thinkers who move us forward. And then there are the analytical people, whom I label doers, who work hard but tend to over-focus and think they can see patterns or big pictures when they actually don’t, and so drive a company, product, or the earth into the ground.
The intent is commendable, but you are unfairly hijacking a term.
What you are describing is basic statistical illiteracy, which, if I may help, is not helped by curt comments which call the statistics out in technical terms. The comments are almost too curt – saying “CI’s are for null-hypothesis testing” in a culture of pervasive misapplication of T-testing is borderline irresponsible.
Pattern matching is broader than not being able to isolate variables in a correlation test, and its grossest misuses are not statistical in nature, but rather systems analytical. A reasonable example in this blog could be “as a startup founder, you will spend a significant amount of time and effort raising angel and VC capital”. Unless you won’t. ;)
There’s a reason they play the game. It’s the outcome of *this* trial that fans care about, not the expected outcome over a large number of trials.
As a geek and sports fan, I have to say I’m conflicted on the sports “statistics”. I think my take is, it does not matter that the announcers are citing meaningless numbers. Their point is to make the game more interesting. The numbers have no more bearing on the outcome of a given game than the pregame/halftime features about players’ childhoods, college days, overcoming some rare (or common) disease. They serve the same purpose: to give us a reason to care about the game. Something to talk about. A reason to cheer.
They do one more thing. They explain what we are about to see. Athletes are notoriously superstitious and they will rely on the “hot hand”. So, if an announcer pulls out the “hot hand” statistic, you can assume that the players have noticed too. Teammates will pass the ball to that player more often. Defenses will be drawn up to stop the seemingly unstoppable. So, whether they are statistically significant is neither here nor there when it comes to enjoying the game.
I completely agree — it’s fun and interesting! And I admit I didn’t make that distinction in the article.
Really I just wanted to take a well-known example of the fallacy in question, because when you apply the same logic to e.g. marketing data, then the goal really is “the truth” and not entertainment.
Great article. People will always be superstitious, and this is precisely why.
At the casino:
“You want to take this slot machine? It’s hot!”
a.k.a – Clustering Illusion
I agree about the baseball thing. I think a lot of announcers are aware that they’re pattern-seeking but do it because it’s part of the culture of the game. They’re pointing out the pattern the way you might point out a pretty seashell while walking along the beach.
The basketball case intrigues me, however. When you’re watching somebody who appears to have a “hot hand”, it seems that balls go in cleanly without hitting the rim; that they’re shooting more accurately than normal (balls that hit the rim can bounce out or in following some kind of head/tails pattern, and will exhibit streaks). Not that I’ll take the time to do the work, but it would be interesting to look at one of the most famous basketball “hot-hand” shootouts ever — Larry Bird vs. Dominique Wilkins http://www.nba.com/history/shootout_boston.html — and see how many of those baskets were “nothing but net.”
Now, of course, if they were indeed “nothing but net” I guess the next question is, “how often does a ‘nothing-but-net’ sequence happen at random”?
I guess my point is, a “hot hand” sequence might (this is a hypothesis, not an assertion) be further divisible into a “nothing but net” component and a “random bounce after hitting rim” component.
In either event, I watched that Bird/Wilkins game on live TV, and I’ll never forget it. As Kevin McHale said, “There was one stretch that was as pure a form of basketball as you’re ever going to see.”
Yes, there are different levels of “Hot Hand”. Sometimes you are so mentally in the zone, usually from confidence, that you kind of feel connected with the basket. This is when you hear the players say something along the lines of “I felt like the rim was a huge basket.” For me personally, it felt like my arm and hand were connected to, or one with, the basket. So imagine you are shooting a ball, hand above your hip, and as you move your arm up and over the basket (not literally, as you are 18 feet away), it feels as though your hand is going right in the basket, as if you were standing two inches away. Like shooting on a cubicle hoop when you are one inch away... same feeling.
There’s nothing wrong with searching for patterns in piles of data, provided you’re not misled by the human hunger for meaning and narrative. Plenty of very successful companies (and other ventures) rely heavily upon valid statistical data mining. Once you’ve got a pattern that is really a statistical outlier, it’s reasonable to propose a theory to explain it, and certainly such theories should be experimentally confirmed before one commits to them. I think there’s a danger here of “premature explanation”, analogous to Knuth’s “premature optimization”.
Statistics is very interesting to me. It’s an area I have little knowledge of, but I’m planning to take a course in. Statistics plays a significant role in Bioinformatics, a field I would like to move into.
I read these two articles that are somewhat related:
– Who Needs Science?: http://www.information-management.com/specialreports/2008_88/10001650-1.html?type=printer_friendly
– The End of Theory: The Data Deluge Makes the Scientific Method Obsolete: http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
I thought those articles were very interesting.
Your article was very interesting and well written, thanks!
Climate scientists should read this article.
I think this analysis suffers from its own fallacy, the one Taleb refers to as the “Ludic Fallacy,” whereby one falsely applies statistical “randomness” to real-world circumstances where too many variables apply and don’t fit.
Of course players are “streaky.” In game 2 of the current NBA finals, Ray Allen set an NBA record by hitting 8 three-point shots, all in the first half. He was 0-3 in the second half. In game 3, Allen went 0-8. Use whatever adjective you like, but streaky isn’t a bad one.
If a player averages 50% “from beyond the arc” over the course of a season, but shoots 75% one game and 25% in another, that is, by definition, streaky and has little to do with randomness. There are myriad reasons why one night might be better than another: playing on the road, partying the night before, sore hand, tired legs, better defense, a fight with the wife, etc., etc. This has nothing to do with the “randomness” of flipping a coin, which is not subject to all those variables.
The bottom line is that the brain’s job is to look through a billion bits of data every second of every day (hearing, sight, smell, taste, touch, thoughts, beliefs, etc.) and decide what’s relevant to you and your survival. It’s a meaning-making machine that evolved to keep our ancestors safe from the lions (rustling bushes = lions…)
Looking for patterns allows us to determine what’s relevant. Looking for explanations for behaviors is a whole nother conversation.
Applying the scientific method (coming up with a theory that explains the known facts and designing an experiment to test that theory) to marketing is a useful way to think about it.
However, unless you’ve done research first – so that you know your target/niche market, you know what their words for their problems are and you’ve designed both your product format, packaging and marketing materials to match their words and their way of seeing/meaning-making – your tests will simply be throwing spaghetti against the wall to see what sticks. Which can get quite expensive.
The brain is hardwired to seek patterns (this is the core learning mechanism). Actually, endorphins are released every time you get a hunch that there is a pattern in some stuff you’re looking at.
“The tendency of the casual mind is to pick out or stumble upon a sample which supports or defies its prejudices, and then to make it the representative of a whole class. ” – Walter Lippmann (Two time Pulitzer Prize winning author, writer, reporter, 1889 – 1974)
Nice post. Always an interesting topic. On the basketball example, it doesn’t surprise me that the pattern of HMs is random when analysed using autocorrelation. But that doesn’t really mean it’s truly ‘random’. As another commenter pointed out, if a system is complex enough the outcome appears random to us, but might in fact be controlled by many factors – too many to make sense of.
Imagine a basketball league where players can ‘control’ streaks under certain optimal conditions, but they don’t understand these conditions. They just know that when those conditions present themselves they have a higher probability of playing well. For the rest of the time it’s largely random – they know they can hit 40% from the field but don’t have control over it. And then at other times they can’t even hit 40% and they realize they can hardly put two good shots together.
In this league, the ‘streaks’ would be real, as there is some optimal set of conditions that comes into play and affects them. However, the rest of the time it’s partly random and partly affected by periods where there is a set of very sub-optimal conditions.
The outcome of this could easily appear to be random, but in fact it’s only partly random.
Simply put, the appearance of randomness doesn’t mean the underlying process is necessarily random. It might be, it might not be. You just can’t tell for sure.
When you write: “The fallacy is that you’re searching for a theory in a pile of data, rather than forming a theory and running an experiment to support or disprove it” you brought together everything that’s necessary, but you forgot to combine them.
It is OK to “search for a theory in a pile of data”, if you realize that all you find is a theory, not a fact. The next thing to do is “to run an experiment to support or disprove it”. That is why in data mining we always use a sufficiently large sample of data as hold-out, not seen by the model (the theory) to test it afterwards.
Data mining is finding a theory in a bunch of data. Testing the theory is done by conventional statistics. And yes: be aware that if you test 20 theories at the usual 5% significance level, you should expect about one of them to look significant by chance alone!
Yes! And if you’re doing all of that — which I agree data-mining software is great for — you’re solid.
My point is: That’s not how people normally look at things like Google Analytics data or marketing data or other business data. I tried to explain why that’s a problem.
“Instead of using a thesaurus to generate 10 ad variants, decide what pain-points or language you think will grab potential customers and test that theory specifically.”
Totally agree, and the thesaurus method might give rise to more visitors or get you higher in the search results for those terms, but do those visitors convert (in whatever terms you measure that, whether a direct purchase, a follow-up phone call, a subscription to an email newsletter, etc.)?
When you hand pick your terms, think what your potential ideal customer might be searching for and attract the ones who will convert, not just “window shoppers”.
This also comes back to “you get what you measure” – if you run a test and measure visitors, you will choose the terms / layout / whatever that gives the most visitors. If you measure new visitors, or conversions, or time on site, or some other metric, you will choose the test option that gives rise to this – so decide on what metric you need to improve before you start down this testing process.