Start with a simple model

As a field, we seem to be gravitating toward a modus operandi where model complexity equates to making a theoretical contribution. The notion seems to be that the more variables we have, the more hypotheses we test in a single paper, and the more mediators and moderators we include, the greater the ‘contribution’ of the paper. Models end up with causal paths pointing in every cardinal direction, all under the assumption that including ‘more stuff’ equates to richness of understanding.

I think this is a bad trend, for three reasons.

1). The likelihood of model misspecification—The more variables, hypotheses, and complexity in the model, the more likely it is that the entire model will be not just wrong, but really, really bad;

2). Perverse incentives—Tying a theoretical contribution to model complexity invites HARKing and p-hacking; and

3). Barriers to replication—The more a model depends on a particular set of variables from a particular dataset, each with its own complex construction, the harder it is for another researcher to replicate the study.

So what’s the answer? Start with a simple model…

  1. A single x, predicting a single y;
  2. Minimal measurement error for both variables;
  3. A large, appropriately powered sample;
  4. Appropriate steps to eliminate alternate explanations for the relationship between x and y (ideally by manipulating x); and
  5. A reproducible codebook to ensure others can follow along with what you did.

Seriously, that’s it. Now, it’s actually really hard to do steps 2, 3, and 4. These are, however, critical to yield an unbiased estimate of the effect of x on y. Noisy measures in noisy data with small true effect sizes are far more likely to yield unpredictable (and usually inflated) results. A well-developed measure, with measurement error kept to a minimum, needs a large dataset to tease out meaningful insights. Too often we see large datasets paired with measures of such convoluted construction that it is hard to understand just what the researcher did to build the measure, let alone have confidence that the observed effect is not simply an artifact of the measurement model; the contribution ends up trivial at best. By the same token, well-done measurement models tested in small, noisy samples pose a similar interpretational problem; it’s too difficult to separate the signal from the noise.
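To make the power point concrete, here is a minimal simulation sketch (not from the original post; the effect size, noise level, and sample sizes are illustrative assumptions): with a small, noisy sample, the estimates that happen to clear the significance bar badly overstate a small true effect, while a large sample recovers it.

```python
# Illustrative simulation: small true effect, noisy outcome, two sample sizes.
# Among estimates that reach p < .05, the small sample badly inflates the effect.
import numpy as np

rng = np.random.default_rng(42)
true_effect = 0.10           # small true (standardized) effect of x on y
noise_sd = 1.0               # residual/measurement noise in y

def significant_slopes(n, n_sims=2000):
    """Return the estimated slopes that cleared a rough |t| > 1.96 threshold."""
    kept = []
    for _ in range(n_sims):
        x = rng.normal(size=n)
        y = true_effect * x + rng.normal(scale=noise_sd, size=n)
        slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
        intercept = y.mean() - slope * x.mean()
        resid = y - intercept - slope * x
        se = np.sqrt(np.sum(resid**2) / (n - 2) / np.sum((x - x.mean())**2))
        if abs(slope / se) > 1.96:
            kept.append(slope)
    return np.array(kept)

for n in (60, 2000):         # underpowered vs. well-powered sample
    est = significant_slopes(n)
    print(f"n={n}: mean 'significant' slope = {est.mean():.2f} (true = {true_effect})")
```

Under these assumed numbers, the small-sample slopes that survive the significance filter average roughly two to three times the true effect; the large sample lands close to it.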

Step 4—dealing with endogeneity—is a topic near and dear to my heart. Here’s my specific problem…it’s so challenging to isolate a consistent effect size estimate for ONE focal relationship. The more hypotheses and variables added to the model, assuming the researcher tests them simultaneously, the harder it becomes to recover consistent effect sizes; you are far more likely to screw up the entire model.
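As a hedged illustration of that endogeneity point, the sketch below (all variable names and coefficients are hypothetical) shows an unobserved confounder biasing a naive regression of y on x, and a simple instrumental-variable (Wald) estimate recovering the true effect when a valid instrument is available.

```python
# Illustrative endogeneity sketch: an unobserved confounder u drives both x and y,
# so naive OLS overstates the effect of x; a valid instrument z recovers it.
import numpy as np

rng = np.random.default_rng(7)
n, true_effect = 5000, 0.5

u = rng.normal(size=n)                  # unobserved confounder
z = rng.normal(size=n)                  # instrument: shifts x, unrelated to u
x = 0.8 * z + u + rng.normal(size=n)    # x is endogenous (depends on u)
y = true_effect * x + u + rng.normal(size=n)

def slope(a, b):
    """Simple regression slope of b on a."""
    return np.cov(a, b, ddof=1)[0, 1] / np.var(a, ddof=1)

ols_estimate = slope(x, y)              # biased upward by u
iv_estimate = slope(z, y) / slope(z, x) # Wald/IV estimate
print(f"true = {true_effect}, OLS = {ols_estimate:.2f}, IV = {iv_estimate:.2f}")
```

Even in this clean, one-relationship setup, the naive estimate is badly off; every additional endogenous variable added to a simultaneously estimated model multiplies the opportunities for this kind of bias.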

Of course, sharing your code, and ideally your data, is pretty easy. But it’s just not something commonly done in management and entrepreneurship research. I hope that is changing; for my part, all of my papers now include posted codebooks and data. There is just no good reason not to.

I think one solution is for journals to encourage more single-hypothesis papers. Take an interesting question—say, estimating the probability that a failed entrepreneur will start another new venture—and evaluate that question with 2-3 independent studies, with consistent measures, in large representative samples, and ideally with the same instruments used to address the endogeneity problem. As an incentive, journals could offer expedited review of these studies, provided the researcher shares his or her data and code.

The bottom line is that headline-grabbing effect sizes with sexy variables in complicated models are, over the long run, far more likely to be found wanting than vindicated as possibly right. Science progresses with small, incremental contributions to our knowledge base. Start with a simple model, test it rigorously, and better our management science.

The Grand Theory of Entrepreneurship Fallacy

Periodically I have a conversation where the topic turns to entrepreneurship researchers’ inability to answer—with precision—why some ventures succeed, some fail, some become zombies, and some become unicorns. Similar conversations surround the topic of startup communities and clusters, and the role of research universities in supporting entrepreneurial ecosystems. Often someone bemoans that we have study after study that addresses only one small piece of the puzzle, or that one study contradicts another, or that a study is simply too esoteric to be useful.

My response is, well, that’s social science.

I am a social scientist, and proud to be one. I think across the social science domain, including management and entrepreneurship research, we have much to offer the students, businesses, governments, and other stakeholders we serve. But the one thing we aren’t particularly good at is humility. Humility in the sense that when we talk about our research and what we can offer, we aren’t always very good at acknowledging the limitations of our work.

Think about predicting the weather. The cool thing about the weather is that it’s governed by the laws of physics, and we know a lot about physics. But even with our knowledge, computational power, and millions of data points, there remains considerable uncertainty in predicting the weather over the next 24, 48, and 72 hours. Part of the reason is that interactions between variables in the environment are difficult to account for, difficult to model, and especially difficult to predict. Meteorologists are exceptionally good forecasters, but far from perfect. And this is in a field where the fundamentals are governed by law-like relationships.

The hard reality is that establishing unequivocal causal relationships in the social sciences is extremely hard, let alone forecasting specific cause-and-effect sizes. We don’t deal with law-like relationships, measuring latent phenomena means error is always present, eliminating alternate explanations is maddeningly complex, and, well, we’re humans (that not-being-perfect thing). Interactions among social forces and social phenomena are not only difficult to model, but in many ways are simply incomprehensible.

One technique we use as social scientists is to hold constant many factors that we cannot control or observe, and to build a much simpler model of a phenomenon than exists in reality. This helps us make sense of the world, but it comes at the cost of ignoring other factors that may be as important as, or even more important than, what we are trying to understand. It also means that our models are subjective—the answer provided by one model may not be the answer provided by another. In a sense, competing models can be equally right and equally wrong.

Where stakeholders who are not social scientists get frustrated with us is in their desire for simple, unequivocal answers. What is also troublesome is that some social scientists—despite knowing better—are more than happy to tell the stakeholder that “yes, I’ve got the answer, and this is it.” When that answer turns out not to work as advertised, the search begins again, although this time with the stakeholder even more frustrated than before.

Making the matter even more complicated are statistical tools and methodologies that seem to provide that unequivocal answer: the effect of x on y is z—when x changes by a given amount, expect y to change by z amount. It seems so simple, so believable, that it’s easy to be fooled into thinking that the numbers produced by a statistics package represent truth, when the reality of that number is, well, far from ‘truth’.

In conversations that turn to wanting simple, unequivocal answers about entrepreneurship—what I call the grand theory of entrepreneurship fallacy—offering the weather analogy helps. But it’s also easy to say that there simply aren’t simple answers. I can’t answer the question because there isn’t an answer; you are trying to solve an unsolvable problem. The best that I can provide, and the best that entrepreneurship data science can provide, is an educated guess. That guess will have a credibility interval around it, will be narrowly applicable, and will be subject to update as new data come in and new relationships between variables emerge. That’s the best we can do, and be extremely wary of the researcher who says he or she can do better!
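For a sense of what that educated guess might look like in practice, here is a toy Beta-Binomial sketch (the counts are invented purely for illustration) of estimating the probability that a failed entrepreneur starts another venture: the estimate carries a credible interval and updates as new data arrive.

```python
# Toy Beta-Binomial update: an educated guess with a credible interval that
# narrows as (hypothetical) new samples of failed founders arrive.
from scipy import stats

a, b = 1, 1                          # flat prior: no special prior knowledge claimed
batches = [(12, 40), (31, 110)]      # (re-entrants, failed founders) per hypothetical sample

for re_entrants, n in batches:
    a += re_entrants
    b += n - re_entrants
    lo, hi = stats.beta.ppf([0.05, 0.95], a, b)
    print(f"after another n={n}: estimate = {a / (a + b):.2f}, "
          f"90% credible interval = ({lo:.2f}, {hi:.2f})")
```

The point is not the particular numbers; it is that the honest deliverable is an interval that narrows with evidence, not a single unequivocal answer.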

We characterize our human experience with uncertainty and with variance. Don’t expect anything better from data science on that human experience.

Credibility in strategic management research

Don Bergh[1] and colleagues published a great note in Strategic Organization recently on the question of reproducibility of results in strategy research. I agree with virtually everything in the paper, but this passage on page 8 caught my attention…

Overall, based on our sample of 88 SMJ articles, the strategic management literature appears vulnerable to credibility problems for two main reasons. One, the majority of the articles did not report their data sufficiently to permit reproduction, leaving us in the dark with regards to the accuracy of their reported results. Two, among those articles where reproduction analyses were possible, a significant number of discrepancies existed between reported and reproduced significance levels.

I’ve written about this before—what limits our impact on management practice is a lack of rigor, and not an excess of it. Here is another example of the problem. When a second scholar is not able to reproduce the results of a study, using the same data (correlation matrix) and same estimator, that’s a significant concern. We simply cannot say with confidence, especially given threats to causal inference, that a single reported study has the strength of effect reported if data, code, and other related disclosures about research design and methodology are absent. Rigor and transparency, to me, will be the keys to unlocking the potential impact on management practice from strategy and entrepreneurship research.

On a related note, it’s nice that in this paper the authors drew the distinction between reproducibility and replication, which sometimes get confused. A reproduction of a study is the ability to generate the same results reported in the original study from a secondary analysis of the same data. A replication is the ability to draw similar nomological conclusions—generally with overlapping confidence intervals for the estimates—from a study using the same research design and methodology but a different random sample.

Both reproducibility and replication are critical to building confidence and credibility in scientific findings. To me though, reproducibility is a necessary, but not sufficient condition for credibility. The easiest way to ensure reproducibility is to share data and to share code, and to do this early in the review process. For example, the Open Science Framework allows authors to make use of an anonymized data and file repository, allowing reviewers to check data and code without violating blind review.

While yes, many estimators (OLS, ML, covariance-based SEM) allow you to reproduce results based on the correlation/covariance matrix reported in the paper, this can be a tall order, what with the garden of forking paths problem. More problematic for strategy research is the use of panel/multilevel data, an area the authors didn’t touch on. In this case, a multilevel study’s reported correlation matrix would pool the lower- and higher-order variance together, effectively eliminating the panel structure. You could reproduce a naive, pooled model from the published correlation matrix, but not the multilevel model, which demonstrably limits its usefulness. This is a major reason why I’m in favor of dropping the standard convention of reporting a correlation matrix and instead requiring data and code.
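For readers who haven’t tried it, here is a minimal sketch (with hypothetical correlations) of what reproduction from a published correlation matrix amounts to for a standardized OLS model: solve the normal equations using only the predictor intercorrelations and the predictor-outcome correlations. The same matrix cannot recover a multilevel model, because the pooled correlations mix within- and between-group variance.

```python
# Reproducing standardized OLS coefficients from a published correlation matrix:
# beta = Rxx^{-1} rxy. Correlations below are hypothetical.
import numpy as np

# Hypothetical published correlations among x1, x2, and y
R = np.array([
    [1.00, 0.30, 0.40],   # x1
    [0.30, 1.00, 0.25],   # x2
    [0.40, 0.25, 1.00],   # y
])

Rxx = R[:2, :2]           # predictor intercorrelations
rxy = R[:2, 2]            # predictor-outcome correlations

betas = np.linalg.solve(Rxx, rxy)   # standardized regression weights
r_squared = rxy @ betas
print("standardized betas:", np.round(betas, 3), " R^2:", round(r_squared, 3))
```

Note what is missing: nothing in that matrix tells you which observations belong to which firm-year or group, so the panel structure, and any multilevel estimate built on it, is unrecoverable without the raw data.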

Regardless though, lack of reproducibility is a significant problem in strategy, as in other disciplines. We’ve got a lot more work to do to build confidence in our results, and to have the impact on management practice that we could.

  1. In the interest of full disclosure, Dr. Bergh was a mentor of mine at the University of Denver—I was a big fan of him then, and I still am 🙂

Bad statistics and theoretical looseness

I’m actually a big fan of theory—I’m just not wild about the ways in which we (management and entrepreneurship scholars) test it. The driving reason is theoretical looseness: the ability to offer any number of theoretical explanations for a phenomenon of interest.

What concerns me most with theoretical looseness is that researchers often become blind to questioning results that don’t align with the preponderance of evidence in the literature. The race for publication, combined with the ability to offer a logically consistent—even if contradictory to most published research—explanation, makes it all too easy to slip studies with flimsy results into the conversation.

In EO research, we see this often with studies purporting to find a null, or even a negative, effect of entrepreneurial behavior on firm growth. Is it possible? Sure. A good Bayesian will always allow for a non-zero prior, however small it might be. But is it logical? Well, therein lies the problem. Because our theories are generally broad, or because we can pull from a plethora of possible theoretical explanations that rarely provide specific estimates of causal effects and magnitudes, it is easy to take a contradictory result and offer an argument about why being entrepreneurial causes a firm’s growth to decrease.

The problem is, researchers often don’t take the extra steps to evaluate the efficacy of the model they estimated. Even checking basics like distributional assumptions and outliers is forgone in the race to write up the results and send them out for review. As estimators have become easier to use thanks to point-and-click software and macros, it’s even easier for researchers to throw data into the black box, get three asterisks, and then find some theoretical rationale to explain seemingly inconsistent results. It’s just too easy for bad statistics paired with easy theorizing to get published.
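As a rough sketch of the basic checks I have in mind (the data, tests, and thresholds here are illustrative, not prescriptive), the code below examines the residual distribution, flags large standardized residuals, and refits without the flagged cases to see whether the headline estimate moves.

```python
# Illustrative post-estimation checks: residual distribution, outliers, sensitivity.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.3 * x + rng.standard_t(df=3, size=200)   # heavy-tailed noise stands in for messy field data

X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

# 1) Distributional assumption: are the residuals roughly normal?
stat, p = stats.shapiro(resid)
print(f"Shapiro-Wilk p = {p:.3f} (small p suggests non-normal residuals)")

# 2) Outliers: flag large standardized residuals
std_resid = resid / resid.std(ddof=2)
flagged = np.where(np.abs(std_resid) > 3)[0]
print(f"{len(flagged)} observations with |standardized residual| > 3")

# 3) Sensitivity: does the estimate move much without the flagged cases?
keep = np.setdiff1d(np.arange(len(y)), flagged)
coef_trim, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
print(f"slope with all data: {coef[1]:.2f}, without flagged cases: {coef_trim[1]:.2f}")
```

None of this is sophisticated, and that is the point: a few lines of diagnostics before writing up the result is far cheaper than a post-publication theoretical rescue of a fragile estimate.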

The answer, as others have noted, is to slow the process down. Here I think pre-prints are particularly valuable, which is one reason why I’ll be starting to use them myself. Ideas and results need time to percolate—to be looked at and to be challenged by the community. Once a paper is published it is simply too hard to ‘correct’ the record from one-off studies that, tragically, can become influential simply because they are ‘interesting’. In short, take the time to get it right, and avoid the temptation to pull a theoretical rabbit out of the hat when the results don’t align with the majority of the conversation.