As your self-anointed chief-skeptic-slash-statistician-slash-startup-commentator, I need to play Devil’s Advocate against the latest buzz from the entrepreneurial online illuminati: the Startup Genome Project, wherein global “rules of business” have been extrapolated from a survey of 650 startups.
But I prefer a discussion to a rant, so I showed this article to the kind folks at the Startup Genome Project and reprinted their response at the end.
Summarized here, their modern “rules of business” include “Solo founders take 3.6x longer to reach scale” and “Startups that pivot once or twice raise 2.5x more money than those that don’t.”
I love this sort of project and want to see more of its kind. At the very least it encourages introspection and self-questioning, and some people who have conveniently ignored modern business lore might now make a change for the better. And on a personal note, none of the following should be construed as an attack on the creators of the Startup Genome Project — I know their intentions are positive and they’ve invited well-intentioned criticism. So here comes some.
There are two statistical fallacies at work here: one inherent in the data, the other in how people will inevitably interpret the data, specifically when applying its “conclusions” to themselves.
Supposing you’re a solo founder, after reading that quote above you can’t help but start in with the self-doubt. “Maybe I need to think about finding a co-founder. Maybe life would be easier if I shared the burden. Maybe I could find a perfect complement that would be greater than the sum of the parts.”
But this is often wrong: although the stat is true, it’s frequently incorrect to apply global patterns or trends to an individual.
To see why, here’s a trick question:
A person P living in Austin, Texas voted in the 2008 American presidential election for either Barack Obama or John McCain. If you had to make a bet, who did P vote for?
The obvious approach is to look at the global statistics on voters in America. More people voted for Obama than for McCain, so obviously we should wager that P voted for Barack. And in the absence of other data, this is the correct bet.
This is just like taking a stat from the Startup Genome Project and placing a bet for your own company. But we’re not done.
We also know P lives in Texas — a red state where voters picked McCain by an 11% margin. So knowing that P is in Texas means that, national statistics notwithstanding, we should bet that P voted for McCain.
But we also know P lives in Austin, and Austin is a blue pimple in the red sea of the south. The Obama rally here was one of the largest in the country. So knowing that, we should wager that P voted Obama.
But most importantly, there’s P the individual. At the ballot box, P acts according to personal idiosyncrasies, whatever the macro-statistics might say. Knowing even just a little about P, for instance that she (yes, she!) is passionately pro-life, makes it almost impossible that she voted for Obama.
Sure, in the absence of data about P the individual, statistics are your best bet, but merely because they’re your only bet. When put that way, the statistical “trend” doesn’t seem especially useful.
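The narrowing logic of the last few paragraphs can be sketched in a few lines. The vote shares below are made-up placeholders. They were chosen only to match the direction of each step in the story: national leans Obama, Texas leans McCain, Austin leans Obama, and P’s own convictions lean McCain. They are not official 2008 results, and `best_bet` is a hypothetical helper.

```python
# Hypothetical Obama share of the vote at each level of knowledge about P,
# ordered from the broadest reference class to the narrowest.
# These are illustrative placeholders, NOT official 2008 results.
REFERENCE_CLASSES = [
    ("national", 0.53),
    ("Texas", 0.44),
    ("Austin", 0.65),
    ("P herself (passionately pro-life)", 0.05),
]


def best_bet(known: list[tuple[str, float]]) -> tuple[str, str]:
    """Bet on the majority candidate in the narrowest reference class we know about."""
    level, obama_share = known[-1]  # the last entry is the most specific one
    return level, ("Obama" if obama_share > 0.5 else "McCain")


# Reveal one more level of information at a time, as the story does.
for i in range(1, len(REFERENCE_CLASSES) + 1):
    level, bet = best_bet(REFERENCE_CLASSES[:i])
    print(f"narrowest data = {level!r}: bet {bet}")
```

The bet flips three times as the reference class narrows: Obama, McCain, Obama, McCain. The national figure is only the best bet when it is the only figure you have.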
To make this even more concrete with a story from my own career, take for example that statement about solo founders taking much longer to get to scale. For hundreds of thousands of solo founders, “getting to scale” isn’t even the goal. (Just ask the 10,000+ subscribers to Rob Walling’s excellent blog for the “micropreneur” who never want to hire a single employee or go to a single meeting, much less “scale.”)
My third company, Smart Bear, was a great example of this. I was a solo founder with no intent to scale, and during the first two years growth was slow, as you’d expect. It wasn’t until a serendipitous check for $50k appeared, giving me the cushion to hire an employee and see what happened, that I contemplated what it might mean to chase “scale.” Then I did chase it, and we did scale, possibly 3.6x later than we otherwise might have. That wasn’t because going solo was the wrong path; it was because I chose a different path.
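One way to see how an aggregate “solo founders take longer” stat can arise without solo founding causing anything is a toy population in which intent, not team size, drives time to scale. Every number here is hypothetical, and so are the `Startup` record and its field names:

- Founders who aim to scale from day one take 2 years.
- Founders who decide to chase scale later take 6 years.
- Solo founders are simply more likely to be in the second group.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Startup:
    solo: bool
    aimed_to_scale_from_day_one: bool
    years_to_scale: float


def build_population() -> list[Startup]:
    """Hypothetical population in which intent, not team size, drives time to scale."""
    population = []
    # Hypothetical mix: 40 of 100 solo founders aim to scale from day one,
    # versus 90 of 100 founding teams.
    for solo, n_aiming in ((True, 40), (False, 90)):
        for i in range(100):
            aiming = i < n_aiming
            # 2 years if scaling was the plan from day one, 6 if it was chosen later,
            # regardless of whether the company has one founder or several.
            population.append(Startup(solo, aiming, 2.0 if aiming else 6.0))
    return population


def avg_years(population: list[Startup], solo: bool, aiming: Optional[bool] = None) -> float:
    """Mean years to scale for solo founders or teams, optionally within one intent group."""
    return mean(
        s.years_to_scale
        for s in population
        if s.solo == solo and (aiming is None or s.aimed_to_scale_from_day_one == aiming)
    )


population = build_population()
overall = avg_years(population, True) / avg_years(population, False)
same_intent_early = avg_years(population, True, True) / avg_years(population, False, True)
same_intent_late = avg_years(population, True, False) / avg_years(population, False, False)

print(f"all founders: solo takes {overall:.2f}x longer")          # 1.83x
print(f"aimed to scale from day one: {same_intent_early:.2f}x")   # 1.00x
print(f"chose to scale later: {same_intent_late:.2f}x")           # 1.00x
```

In aggregate, solo founders look 1.83x slower. Within either intent group, being solo makes no difference at all.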
Macro-level data are useful for trends but not for understanding individual outcomes. Startup Genome conclusions are fascinating as high-level trends and certainly suggest possible behavior, but it’s like predicting P’s behavior from national statistics — it often doesn’t apply.
So far, none of this represents a problem with the data or conclusions themselves, but rather with how people will inevitably misinterpret them. That part can be dispelled with awareness, i.e., by reading up to this point.
But the other problem is inherent in the methodology of the experiment.
First, it suffers from Survivorship Bias. They interviewed 650 companies, which means extant companies. So we don’t know which of these trends contributed to success and which are just things everyone does, even the companies that failed.
It’s like looking at log cabins in the west and remarking how solidly they’re built. But the ones which weren’t built so solidly — most of them — have collapsed. If the solid ones were in fact constructed differently, that would be a de facto experiment in how best to build log cabins, but if they were all constructed using the same few techniques — which they were — you’d conclude that fundamental technique is less important than maintenance or geography or luck or something else.
The Startup Genome Project looks only at standing log cabins and thus isn’t telling us what separates the successes from the failures. If the failures follow the same patterns as the successes, the “patterns” are descriptive but not helpful in guiding us to building better companies.
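The log-cabin argument can be put in numbers with a hypothetical population in which practice “X” is common and survival is, by construction, independent of it. All counts and the 10% survival rate are invented for illustration. A survey of survivors alone would report that 90% of successful startups do X, even though X changes nothing.

```python
from fractions import Fraction

# Hypothetical population: practice "X" is common, and survival is independent of it.
started = {"did X": 900, "skipped X": 100}
SURVIVAL_RATE = Fraction(1, 10)  # identical for both groups by construction

survived = {group: int(n * SURVIVAL_RATE) for group, n in started.items()}

# What a survey of living companies sees:
share_of_survivors_doing_x = Fraction(survived["did X"], sum(survived.values()))

# What you can only see if you also count the failures:
rate_with_x = Fraction(survived["did X"], started["did X"])
rate_without_x = Fraction(survived["skipped X"], started["skipped X"])

print(f"{float(share_of_survivors_doing_x):.0%} of surviving startups did X")  # 90%
print(f"survival rate with X: {float(rate_with_x):.0%}, "
      f"without X: {float(rate_without_x):.0%}")                                 # 10%, 10%
```

Only the second comparison, which requires the dead companies, says anything about whether X helps.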
Second, it suffers from the Pattern-Seeking Fallacy: they generated a ton of data, asked a ton of questions, “discovered” that some of their initial theories were statistically significant, and published only the significant results.
Once you know this fallacy, you’ll instantly recognize it glistening in this quote from Genome co-creator Ron Berman:
In our process we created a huge amount of different cuts, views, cross tabulations, graphs, means and statistical significance tests to check relationships and correlations in our data. Once we had those, we leaned back, looked at them from afar and tried to ask ourselves “do these results make sense.”
The fact that statistically-insignificant results were not published is as problematic as not investigating startup failures — it removes half the story, half the perspective.
Specifically: they constructed a model of how they believe companies look and behave, then asked a bunch of questions to see whether the data support the model. With hundreds of questions and literally millions of data points correlating one thing to another, they found that some data supported some parts of the model and some did not (i.e., were not statistically significant). But they reported only the supporting data.
Half the story is not acceptable. It is especially unacceptable when you have so much data that, at the significance levels they reported (often 90% and sometimes even 80%), the Pattern-Seeking Fallacy tells us they are all but guaranteed to find “significant” results.
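A small simulation makes the point concrete. It runs 500 hypothetical “questions” in which both groups are drawn from the same distribution, so there is nothing real to find. Each question gets a two-sided z-test on the difference in means, and the code counts how many clear each confidence level. The number of questions, the group size, and the seed are all arbitrary assumptions. Roughly 5%, 10%, and 20% of pure noise should come out “significant” at 95%, 90%, and 80% confidence respectively.

```python
import random
from math import fsum, sqrt
from statistics import NormalDist, fmean


def null_p_value(rng: random.Random, n: int = 200) -> float:
    """Two-sided z-test p-value for a difference in means between two groups
    drawn from the SAME distribution, i.e. there is no real effect to find."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]

    def var(xs: list[float]) -> float:
        m = fmean(xs)
        return fsum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    z = (fmean(a) - fmean(b)) / sqrt(var(a) / n + var(b) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(2011)
N_QUESTIONS = 500  # hypothetical number of cuts / cross-tabulations, all pure noise
p_values = [null_p_value(rng) for _ in range(N_QUESTIONS)]

hits = {}
for confidence in (0.95, 0.90, 0.80):
    hits[confidence] = sum(p < 1 - confidence for p in p_values)
    print(f"{confidence:.0%} confidence: {hits[confidence]} of {N_QUESTIONS} look 'significant'")
```

None of these “findings” is real. Publishing only the ones that cleared the bar would still produce a respectable-looking list.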
Human beings are incurable pattern-seekers; we see patterns when there are none, and we desperately latch onto theories which purport to explain the mysteries of the world, from theories of physics to battles of religion to the Modern Laws separating successful startups from the failures.
We should listen, we should contemplate, but we shouldn’t blindly follow.
Counterpoint, by Ron Berman and the Startup Genome Team
Let me start this commentary (or counterpoint, if you prefer) from the end.
There are a lot of smart points in this post, but if you must remember just one thing, it is “don’t blindly follow.”
I couldn’t agree more, which is why I consider responding to Jason’s blog post a treat, for two reasons:
- I read this blog quite frequently.
- The criticism is well thought out and to the point. This is how we improve our research, and I hope you can help us with it as well.
To make reading easier, here is a summary of my comments. The full comments are below.
- The comment about applying statistics to individuals is correct. However, it is not a fallacy; it is a feature. Really. In addition, we do not focus on providing benchmarks using a macro view. We delve into specific details of startups to classify them into specific groups.
- Survivorship bias may indeed be a valid issue in our sample, and I will explain below how we treated it and will continue to treat it.
- The pattern-seeking fallacy is a constant issue with data-driven analysis not based on experimentation. We paid careful attention to including results we can explain and show significance for. There is some (small) chance some of them are wrong. We consider this in our
And now, for the full commentary. Some of it might sound philosophical, but that’s sometimes the nature of statistics.
1. Does it apply to me?
The results of the Startup Genome report, and any statistic for that matter, rarely apply to an individual sample. This is the nature of the word statistic. A statistic is a summary of an analysis of data. If you pick one item from the data, randomly and independently, then on average the result will apply. In other words, our result (or any statistical result) applies only if you perform the experiment again and again, each time re-choosing the individual sample.
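Here is a toy illustration of “on average the result will apply.” It uses a hypothetical population in which half the startups take 2 years to reach some milestone and half take 6. The population mean is 4 years, yet no individual startup takes 4 years. The statistic only describes what you get when you average over many independent random draws. The population and the seed are assumptions made for this sketch.

```python
import random
from statistics import fmean

# Hypothetical population: half the startups take 2 years, half take 6.
population = [2.0] * 50 + [6.0] * 50
population_mean = fmean(population)  # 4.0

# No individual startup actually matches the statistic...
individuals_at_mean = sum(1 for x in population if x == population_mean)

# ...but repeatedly drawing a random individual and averaging recovers it.
rng = random.Random(42)
draws = [rng.choice(population) for _ in range(100_000)]
draw_mean = fmean(draws)

print(f"population mean: {population_mean}")                 # 4.0
print(f"individuals exactly at the mean: {individuals_at_mean}")  # 0
print(f"mean over repeated random draws: {draw_mean:.2f}")
```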
Why is this a feature? Because statistics are meant to describe and predict samples and populations, not individuals.
There’s validity to the criticism about not acting on macro-trends, but we don’t tell entrepreneurs to do that. Our prescriptive advice is more in the
Secondly, our framework allows us to offer increasingly personalized advice. Startups are given advice not just based on the global averages but based on their type and stage. In the future we are considering even incorporating benchmarks comparing people with the same ambitions (such as not wanting to create a scalable company).
2. Survivorship bias
Survivorship bias can be dealt with using a longitudinal panel, or by designing an experiment. Unfortunately, we cannot randomly allocate exogenous conditions to startups (“Hey you, want to start a company? Can you please do it with two randomly assigned co-founders?”).
Our results therefore apply only to the sample we analyzed. In other words, if you take another sample of startups, they might not apply.
We approached this issue from two directions. So far we have analyzed the stage of a startup, rather than taking a binary “success/failure” view. This partially handles the survivorship bias, since startups that are about to die appear in our data as being in a lower stage after a much longer time. Essentially, this technique is similar to using the age of a person as a proxy for his likelihood of dying in the next year. We do not know for sure whether it will happen, but on average, we are correct.
Another standard data-driven solution for this issue is cross-validation of the data (also known as out-of-sample validation). The method is simple: the data is segmented randomly into “training” and “validation” groups. The model is run on the “training” data, and then the results are checked on the “validation” data. If the model predicts well on the validation data, it has a higher chance of being correct.
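The procedure just described can be sketched as a single random split. This is often called holdout validation; k-fold cross-validation repeats the idea over several splits. The sketch makes several assumptions:

- The data is synthetic.
- The “model” is a one-variable least-squares line.
- The 70/30 split ratio is arbitrary.
- The helper names are hypothetical, not anything the Startup Genome team has published.

```python
import random
from statistics import fmean


def train_validation_split(data, validation_fraction, rng):
    """Randomly assign each observation to either the training or the validation set."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(points):
    """Ordinary least-squares fit of y = a + b * x (the 'model' in this sketch)."""
    mx = fmean(x for x, _ in points)
    my = fmean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


def mse(model, points):
    """Mean squared prediction error of the fitted line on a set of points."""
    a, b = model
    return fmean((y - (a + b * x)) ** 2 for x, y in points)


rng = random.Random(7)
# Hypothetical synthetic data: y = 1 + 0.5x plus unit-variance noise.
xs = [rng.uniform(0, 10) for _ in range(200)]
data = [(x, 1.0 + 0.5 * x + rng.gauss(0, 1)) for x in xs]

training, validation = train_validation_split(data, 0.3, rng)
model = fit_line(training)  # fit on the training data only
print(f"training MSE:   {mse(model, training):.2f}")
print(f"validation MSE: {mse(model, validation):.2f}")  # the out-of-sample check
```

A model that had merely memorized patterns in the training data would show a validation error noticeably worse than its training error.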
We have not used this technique in our current report, as the sample is too small, but we plan on applying it in the future. We made other assumptions to handle this bias, but I hope this serves as a good reference on what we did.
3. Pattern-seeking fallacy
There are two issues here. Will incorrect results be reported just because there is so much data? And why do you only report statistically significant results?
The answer is “Yes,” and “Yes.”
There might be incorrect results being reported. In general, whenever you receive a result that is significant with 95% confidence, there is a 5% chance it is incorrect. In other words, out of every 20 graphs/results anyone publishes anywhere, probably one is incorrect. The solution is to retest the hypotheses constantly. For this reason we collect more and more data, and retest.
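The “one in 20” arithmetic can be checked directly. The sketch assumes all 20 results test effects that do not actually exist, and that the tests are independent; both are simplifying assumptions, not a description of the report’s data. Under those assumptions, the expected number of false positives is 20 × 0.05 = 1. The chance that at least one of the 20 is a false positive is 1 − 0.95^20, about 64%. `false_positive_summary` is a hypothetical helper.

```python
def false_positive_summary(n_results: int, confidence: float) -> tuple[float, float]:
    """Expected false positives, and the chance of at least one, across n independent
    tests of effects that do not exist (an assumption of this sketch)."""
    alpha = 1 - confidence
    expected = n_results * alpha
    at_least_one = 1 - confidence ** n_results
    return expected, at_least_one


expected, at_least_one = false_positive_summary(20, 0.95)
print(f"expected false positives: {expected:.2f}")  # 1.00
print(f"chance of at least one:   {at_least_one:.2f}")  # 0.64
```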
As for publishing only (or mostly) significant results, this is a subtle issue. When a result does not pass a statistical significance test, it does not mean it is incorrect, or that the opposite is correct. It just means there is not enough data to tell. You cannot and should not publish such a result, because its only conclusion is “I need more data.”
When we built our analysis, we had a specific model and a (long) list of hypotheses that we tested.
This is not the same as observing a streak of 10 heads when throwing a coin 1,000 times and concluding the coin is biased. Reaching such a conclusion for a coin is just applying statistics incorrectly. Streaks of coin results are not a sufficient statistic (at least in this example); the average of the results is.
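To see why a streak alone says little, the sketch below computes the exact probability that a perfectly fair coin produces at least one run of 10 consecutive heads in 1,000 flips. It does so with a small dynamic program over the length of the current run of heads. The answer comes out to roughly four in ten, so such a streak is an ordinary event for a fair coin. The function name is hypothetical.

```python
def prob_run_of_heads(n_flips: int, run_length: int, p_heads: float = 0.5) -> float:
    """Exact probability of at least one run of `run_length` consecutive heads
    in `n_flips` independent flips."""
    # state[r] = probability that no qualifying run has happened yet
    # and the current streak of heads has length r
    state = [0.0] * run_length
    state[0] = 1.0
    hit = 0.0  # probability mass that has already seen a qualifying run
    for _ in range(n_flips):
        nxt = [0.0] * run_length
        for r, pr in enumerate(state):
            nxt[0] += pr * (1 - p_heads)      # tails resets the streak
            if r + 1 == run_length:
                hit += pr * p_heads           # heads completes the run
            else:
                nxt[r + 1] += pr * p_heads    # heads extends the streak
        state = nxt
    return hit


p = prob_run_of_heads(1000, 10)
print(f"P(at least one streak of 10 heads in 1,000 fair flips) = {p:.2f}")
```

By contrast, the overall proportion of heads is the quantity that bears on whether the coin is biased.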
We had some very strange results turn up in our data, but the sample size proved too small to yield any conclusion. Instead of publishing wild speculation based on them, we preferred to collect more data and publish it later if the results make sense (in a statistical sense).
This is where all of the readers of this blog (and other blogs) can help. First of all, we encourage you all to fill out the 2nd version of the Startup Genome survey. Not only will you gain insight about your specific firm compared to other similar firms, you will also help us provide the community with better and finer-tuned results in the future. In addition, you will experience firsthand what all this hoopla is about, and can hopefully develop your own intuition about the validity of the results.
Second, we improve by trying, making mistakes and correcting them. The standard academic process for such research is called “counterfactual analysis,” or in plain English, “check for alternative explanations.” Pointing out potential biases in our data and methodology is useful. However, if you can also provide an alternative explanation for the results that we can test empirically on the data, then our work will truly reach deeper insights and potentially greatness.
So, thanks, Jason, for investing the time in this analysis. We have given, and will continue to give, your excellent feedback the attention it deserves, and we will improve constantly. We hope to hear more and learn more.
–The Startup Genome Team