• Dave

    Ron Berman says “In general whenever you receive a result which is significant with 95% confidence, there is 5% chance it is incorrect.”

    No. In general, when you test 100 false hypotheses to 95% confidence, 5 of them will incorrectly pass the test.

    The proportion of results you get that are significant to 95% confidence depends on the proportion of hypotheses you test that are actually true.
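
A rough way to see this with numbers. The sketch below assumes every false hypothesis passes with probability `alpha` and every true one is detected with probability `power`; the 80% power figure and the function name are illustrative assumptions, not values from the report.

```python
def false_share_of_significant(n_tested, frac_true, alpha=0.05, power=0.8):
    """Expected share of 'significant' results that are actually false."""
    n_true = n_tested * frac_true
    n_false = n_tested - n_true
    false_positives = n_false * alpha  # false hypotheses that pass anyway
    true_positives = n_true * power    # true hypotheses that are detected
    return false_positives / (false_positives + true_positives)


for frac_true in (0.0, 0.1, 0.5, 0.9):
    share = false_share_of_significant(1000, frac_true)
    print(f"{frac_true:.0%} true -> {share:.0%} of significant results are wrong")
```

Under these assumptions, if only 10% of the tested hypotheses are true, about 36% of the significant results are wrong; if 90% are true, fewer than 1% are. The "5% chance it is incorrect" reading holds for neither case.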

    • http://twitter.com/jonintweet Joni S.

      I didn’t get this. Could you elaborate, please?

  • BillSeitz

    Maybe I’m wrong, but I don’t think you really addressed Jason’s point about fishing-for-relationships. You might want to take a look at the Science-Based-Medicine critiques of Evidence-Based-Medicine. Or check out http://xkcd.com/882/
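
The fishing problem can be put in numbers. A minimal sketch, assuming the tests are independent and that none of the tested relationships is real (both are simplifying assumptions; the function name is made up for illustration):

```python
def chance_of_false_alarm(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests of false
    hypotheses comes out 'significant' at level alpha purely by chance."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 5, 20, 100):
    print(f"{n:>3} tests -> {chance_of_false_alarm(n):.0%} chance of a spurious hit")
```

With 20 tests at 95% confidence, the chance of at least one spurious "significant" result is about 64%, which is why unpublished non-significant tests matter.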

  • Anonymous

    I don’t quite agree with the first point (“solo-founder” example). This report is based on statistics. If you have absolutely no idea what to do (like predicting the next result of flipping a coin) you should go with what is most likely to succeed (in the coin example there is no more likely result, so you guess, but in the case of the solo-founder you should probably find a co-founder). Of course it is unlikely that you find yourself in a situation where you have no other factors influencing you, but if you do, going with the average is the correct choice…

  • http://twitter.com/michaelpinto Michael Pinto

    If you view the project as a cookbook of success you’ll be disappointed, but I did find a few good points of wisdom in the study that I’ve heard before. It’s also important to note that the study really makes sense if you’re doing a tech startup in the Valley in 2011; for any other type of startup those “rules” don’t really apply.

  • Anonymous

    Love this….do not just look at backward facing statistics to validate or invalidate a point…as a friend of mine often says, even a squirrel finds a nut every now and again.

  • Jonas

    That’s a (too) long reply from them, addressing none of the concerns you raise. I’d say their statistics are completely made up. Not publishing the non-significant hypotheses is the most glaring error. The rest could be just the expected chance.
    But why write such a long nonsensical reply? Is it the usual economics dislike of maths that shines through again? Or do they not want help to do better? I think the latter is just as likely.

    • http://blog.asmartbear.com Jason Cohen

      Though I agree that some points weren’t fully refuted, I definitely do not believe they made up data or were nefarious. I also think they’ll take some of this to heart for the next round of surveys and analysis.

      But I also fear that given who is involved with the project, there’s a vested interest in asking certain questions and seeing the data come out a certain way to validate what they’ve been preaching for the last few years.

      Still, we have to be careful not to damn them: it’s quite possible that what they’re trying to “prove” and what’s actually true are identical things! But we have to be mindful that they are not an unbiased organization either.

  • Anonymous

    Applying statistics boils down heavily to the old “What do you know and when do you know it?”

    (1) Voting and Conditioning.

    For the voting example, that is, McCain or Obama, given that the voter was in Texas, we are taking the ‘conditional probability’ given that the voter was in Texas.  Then with the voter in Austin, we are ‘conditioning’ again.

    Conditioning is powerful stuff, and can be seen as the best way to use ‘information’.  E.g., if you have X and want to predict Y, you can consider the conditional expectation of Y given X, that is, E[Y|X]:  This is a ‘good’ estimate of Y in that it is ‘unbiased’, that is, the expectation of Y, E[Y], equals E[E[Y|X]], the expectation of the estimate.

    Next, E[Y|X] is the most ‘accurate’ estimate of Y because for some function f, E[Y|X] = f(X), and this f makes E[(Y – f(X))^2] as small as possible.

    We want ‘all’ of the ‘information’ in X. So if there is some function g and we take E[Y|g(X)], it is immediate from what we’ve said that we can’t hope that this will be a better estimate of Y than f(X) = E[Y|X].

    We should also notice that if Y really is a ‘function’ of X, then E[Y|X] = Y, that is, with X we will have enough ‘information’ to predict Y exactly.

    Generally, and intuitively, the more ‘information’ in X, the better our prediction.
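
These claims can be checked on a toy example. The sketch below uses six made-up, equally likely (x, y) outcomes and a hypothetical coarsening g(x) = min(x, 1); it only compares E[Y|X] against two specific alternatives and is not a proof that it beats every function of X.

```python
from collections import defaultdict

# Hypothetical, equally likely (x, y) outcomes; the numbers are made up.
SAMPLES = [(0, 1), (0, 3), (1, 4), (1, 6), (2, 10), (2, 12)]


def conditional_mean(samples, key=lambda x: x):
    """Return a predictor x -> E[Y | key(X) = key(x)] built from the samples."""
    groups = defaultdict(list)
    for x, y in samples:
        groups[key(x)].append(y)
    means = {k: sum(ys) / len(ys) for k, ys in groups.items()}
    return lambda x: means[key(x)]


def mean_squared_error(samples, predict):
    """Average of (y - predict(x))^2 over the equally likely samples."""
    return sum((y - predict(x)) ** 2 for x, y in samples) / len(samples)


full = conditional_mean(SAMPLES)                              # E[Y|X]
coarse = conditional_mean(SAMPLES, key=lambda x: min(x, 1))   # E[Y|g(X)]
blind = conditional_mean(SAMPLES, key=lambda x: 0)            # E[Y], ignores X

mean_y = sum(y for _, y in SAMPLES) / len(SAMPLES)
mean_of_estimate = sum(full(x) for x, _ in SAMPLES) / len(SAMPLES)
print("E[Y] =", mean_y, " E[E[Y|X]] =", mean_of_estimate)

for name, f in [("E[Y|X]", full), ("E[Y|g(X)]", coarse), ("E[Y]", blind)]:
    print(name, "squared error:", mean_squared_error(SAMPLES, f))
```

On this data the two expectations agree at 6, and the squared error rises from 1 (all of X) to 7 (the coarser g(X)) to 15 (no information), matching the point that less ‘information’ cannot give a better estimate.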

    Okay, back to voting, for the ‘information’ for X we’ve considered being in Texas and then in Austin.  But there could be much more relevant ‘information’ for the given person, and with enough such information we could predict exactly.  Or, if we ask the person to explain their vote, they may give data particular to them and never mention either Texas or Austin, and this data may be compelling.  So, it can be that the voter knew who they were going to vote for and why in precise terms.  The estimate E[Y|X] could know that too, if X contained all that information.

    (2) The Wright Brothers and Conditioning.

    Let’s move away from voting to something closer to startup success; let’s consider the Wright Brothers and their efforts at having the world’s first controlled, powered flight.

    So, what were their chances?  Well, given that they were just another effort of the long list of efforts, including Langley who had recently fallen into the Potomac River, the chances were poor.

    But the Wright Brothers knew better!  How?  For the part about ‘control’, they clearly understood that this was a severe challenge and, for a solution, had worked out three axis control.  They were fairly sure that they could ‘control’ the airplane.  For the ‘powered’ part, they had worked out fairly carefully the drag of their aerodynamics (yes, they missed a point about Reynolds number) from their wind tunnel (itself a nice step forward), but basically they knew how much propeller thrust they needed.  Then they also knew how much horsepower they needed for that much propeller thrust.

    So, they ‘knew’; that is, they had much more information than that they were just another ‘effort’ on the old list of 100% failures.

    (3) Startup Success.

    Similarly for startups:  What is crucial is that the founder(s) ‘see their way clear’ to success, and here the information they use might, as for the Wright Brothers, not be available in the statistics about startups, so that, really, those statistics are irrelevant for them.

    But, wait; there’s more!

    For a venture funded startup, the goal is a ‘big win’, e.g., an ‘exit’ of at least $50 million since less than that the financial arithmetic doesn’t work out for the venture firm.  Really, for a Series A investment of a few million dollars, the venture firm wants to shoot for an exit above $500 million, and another Google or Facebook would be very welcome.

    Okay, how to know?  Well, we, both the entrepreneurs and the venture partners, are at the beginning looking for something rare and exceptionally good.  Intuitively we believe, and correctly, that we won’t get much insight on how to have such success looking at data on efforts that included few or no such successes.  E.g., we’re not going to get much insight on how to win the NBA finals by looking at statistics from junior high basketball!  So, that’s how not to know.

    Still, how to know?  Follow the Wright Brothers.  That is, ‘engineer’ the success.  Then study the engineering in detail.  The approach usually recommended is:

    (A) Problem.  Pick a problem, currently solved at best poorly, where a few customers are willing to pay a lot or many customers are ready to pay at least a little.

    (B) Solution.  Find a much better solution to this problem.  Have the solution ‘defensible’, that is, difficult to duplicate or equal.  I.e., in Buffett’s words, build a protective ‘moat’ around the business.

    So, to evaluate such a startup, look in detail at (A) and (B).

    ‘Traction’?  John Glenn didn’t use that!  He needed to know that the whole system would work from launch pad, to orbit, reentry, splash down, and back home.  Just ‘traction’, that is, just getting 50 feet off the launch pad was alone not at all promising.  For the rest he needed to know, ‘early traction’ was irrelevant, and he needed to consider the engineering.

    (4) Applying Statistics — Case I.

    Berman writes:

    “The results of the Startup Genome report, and any statistic for that matter, rarely apply to an individual sample.  This is the nature of the word statistic.  It is a summary of analysis on data, and if you pick one item from the data, randomly and independently, then on average the result will apply.  In other words — only if you perform the experiment again and again, each time re-choosing the individual sample, will our result, and any statistical result apply.  Why is this a feature?  Because statistics are meant to be able to describe and predict samples and populations, not individuals.”


    The main goal of the statistical work is to predict, including for one individual.  Indeed, if we could not usefully apply the work to an individual, then likely we shouldn’t bother reporting the results.

    The work CAN apply well to an individual IF the ‘information’ the statistics used is at least close to all the data the individual has.  If the individual has much more data, then, sure, the statistics need not apply to them.

    E.g., the card counting statistics in Black Jack DO apply — in a fair game, quite effectively, actually — to individual Black Jack players who want to use card counting.

    (5) Applying Statistics — Case II.

    Suppose we have statistics on startups with two founders and then with three founders.  Suppose the statistics say that on average three founders do much better.  Then should a team of two founders reading this statistical result rush out and get a third founder?

    That is a TOUGH issue to address.

    Should the Wright brothers have rushed out and found a third founder?  No.  Why not?  Because the Wright brothers had done their engineering and saw their way to success.

    If a founding team, even with just one founder, sees their way clear to success, then the statistics about three founders should be nearly irrelevant.

    Why?  Because the statistics did little or nothing with the additional information about “see way to success”, so a team that does see its way has some crucial extra information.

  • http://www.dnaguide.com Alice Rathjen – DNA Guide

    Startup Genome fails to take into account that pivots are tied more to the funding environment than to the Founder’s DNA.  Pivots are funding induced.  Start-ups either pivot in response to investors in order to take on funding, or have to pivot after they take it on.
    I wish the Startup Genome looked at which investor networks were tied to the pivots and whether the investor-induced pivots were successful.

  • http://giffconstable.com giffc

    It bugs me that “raised X more money” is touted as success when certain people are reporting on the conclusions. I understand why it’s there but hate to think that entrepreneurs might view it as a comparative metric for success, when it really isn’t.

  • Mike

    Have Jason or Ron read Nassim Taleb’s Black Swan?  It definitely bolsters Jason’s argument, albeit from a trading and investment perspective.  We humans love to see patterns and ‘obvious’ conclusions where there are none.

    That being said, the Startup Genome Project may possess some valid conclusions.  But they seem to just reinforce conventional wisdom – multiple co-founders, pivot, etc.  What use is the analysis?

  • Al Pittampalli

    Glad to see a civil intelligent debate on this. I agree that survivorship bias seems present.

  • Steve

    Great project! Is this similar to the music genome project in terms of creating vectors and algorithms? If so, I assume the process or tool would allow start-ups to form and prosper based on success patterns that are validated by data? Also, how, if at all, will market demand factor into the equation?

  • http://twitter.com/WECREATENYC WECREATENYC

    We’d like to join the debate; see what we have to say: http://bit.ly/mjR8GR

  • http://www.ahlgrenwebdevelopment.com Hampus Ahlgren

    Love what you just did – comparing macro level data to individual actors and the factors that really matter. Good clarification!
