Interpreting logistic regression – Part II

Part I dealt with logit models with dichotomous (0/1) predictors. These are handy models, but usually we deal with predictors that are continuous. That’s the point of Part II, where I walk through making sense of these models, focusing especially on calculating and plotting marginal effects and predicted probabilities.

You can find a link to Part II on my resources page, or here: http://bit.ly/2rZ0dqD

Happy Memorial Day weekend, and thanks to all who serve and have served!

Logistic regression – Pesky equations!

So the original post was a good lesson in verifying that MathJax works on my WordPress site before hitting the ‘Publish’ button. Sorry for my workflow hiccup.

To get the equations to render correctly, I’ve moved the post over to my Resources page, which you can access here.

I’ve been kicking around the idea of moving over to blogdown—might have to move that timeframe up a bit!

Interpreting logistic regression — Part I

 

I’m a big fan of logit models. Mostly because, in my opinion, they offer the clearest way to answer questions that are usually the most interesting to academics and entrepreneurs. Because we can turn the results of a logit model into a set of predicted probabilities, they let us answer questions like, “what’s the probability of a new business making a first sale given these conditions?” and “what’s the probability of an entrepreneur starting a new business given that he/she started a business before?”

The kicker, as I’ve found teaching logit models, is that while the final outcome is very helpful, getting there isn’t so easy. The path to probability is also a reason why logit and other limited dependent variable models are often misinterpreted in strategy and entrepreneurship research.

So this post walks through a logit model with two predictor variables, \(x1\) and \(x2\), that are both dichotomous (0/1). In a future Part II, I’ll do a logit model where \(x1\) and \(x2\) are both continuous variables.

Generating some data

We’ll start by generating the data for Model 1, with two dichotomous predictors

library(tidyverse)  # Love the tidyverse!

# Set the random seed generator to ensure reproducible results
set.seed(080203)

# Create a new dataframe with 1,000 observations and our two predictors 
model1.df <- data_frame(x1 = rbinom(1000, 1, .5),
                        x2 = rbinom(1000, 1, .5))

# Next we'll create our predicted value of y, and include a constant
model1.df <- model1.df %>%
  mutate(y = (.75*x1 + 1.5*x2)) %>%
  mutate(pr_y = exp(y)/(1 + exp(y))) %>%
  mutate(y = rbinom(1000, 1, pr_y)) %>%
  select(-pr_y)

# Export the data file to a csv for data sharing
write.csv(model1.df, file = "logit_sim_data.csv", row.names = FALSE)

# Now lets take a look
head(model1.df, 10)

Estimate our model

A logistic regression model is simply a logistic transformation of the linear model, so we can write our model in a familiar form…

\(ln\left[\frac{p}{1-p}\right]=\alpha+{\beta}{x_1}+{\beta}{x_2}+\epsilon\)

Our outcome variable is the probability of \(y\) occurring (y = 1), which we write as \(\left[\frac{p}{1-p}\right]\), the probability of \(y\) occurring divided by the probability of \(y\) not occurring.

We’ll use the glm function to estimate our model in R, and specify the binomial link function, which gives us our logistic transformation.

model1 <- glm(y ~ x1 + x2, data = model1.df, family = "binomial")
summary(model1)
## 
## Call:
## glm(formula = y ~ x1 + x2, family = "binomial", data = model1.df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1750  -1.2270   0.6248   0.8443   1.1286  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   0.1160     0.1197   0.969    0.333    
## x1            0.7321     0.1566   4.675 2.94e-06 ***
## x2            1.4185     0.1604   8.846  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1137.7  on 999  degrees of freedom
## Residual deviance: 1034.1  on 997  degrees of freedom
## AIC: 1040.1
## 
## Number of Fisher Scoring iterations: 4

Making sense of the model

Here is where things get complicated.

Lets take a look at the estimated coefficients…

log.odds <- summary(model1)$coefficients[1:3,1]
log.odds
## (Intercept)          x1          x2 
##   0.1159635   0.7320719   1.4185449

Our \(y\) variable is on the logit scale. That means that the coefficient estimates are the log odds of \(y\) occurring given a unit change in the two predictors. The intercept in this case is the log odds of \(y\) occurring when both \(x1\) and \(x2\) are zero.

Log odds are not particularly helpful, so we can convert them from the logit scale by taking the exponent:

exp.odds <- exp(log.odds)
exp.odds
## (Intercept)          x1          x2 
##    1.122955    2.079384    4.131105

The exponented coefficients are not the “effect” of \(x\) on \(y\) as would be the common interpretation of a regular linear regression. They also aren’t the likelihood of \(y\) occurring, which is really a statement of probability (we’ll get to that). Because \(x1\) is dichotomous, we interpret \(x1\) as the expected change, on average, in the odds of \(y\) occurring as \(x1\) shifts from 0 to 1, holding constant \(x2\).

Also because \(x1\) is dichotomous, the coefficient estimate is equivalent to the marginal effect of \(x1\) on \(y\), holding constant \(x2\) (more on that later).

So what we can say is that the odds of \(y\) occurring are just over two times higher in the presence of \(x1\) (\(x1\) = 1) then when \(x1\) is not present (\(x1\) = 0), holding \(x2\) constant. To put it another way, the odds of \(y\) occurring given the presence of \(x1\) and holding \(x2\) constant are roughly 2.08 to 1 (2.08:1).

Predicted probabilities

Odds are helpful, but they really are useful when it comes to predicting the probability that \(y\) occurs at discrete values of the two predictors.

In our model, given that both \(x1\) and \(x2\) are dichotomous, we have four different possible probabilities of \(y\) predicted by our model, and I’ve shown each of them with the corresponding equation:

Probability x1 x2 Equation
Pr_y 0 0 \(ln\left[\frac{p}{1-p}\right]=\alpha\)
Pr_y 1 0 \(ln\left[\frac{p}{1-p}\right]=\alpha+{\beta}{x_1}\)
Pr_y 0 1 \(ln\left[\frac{p}{1-p}\right]=\alpha+{\beta}{x_2}\)
Pr_y 1 1 \(ln\left[\frac{p}{1-p}\right]=\alpha+{\beta}{x_1}+{\beta}{x_2}\)

Yes, there is a shortcut to doing this in R, but it’s helpful to walk through how to do this by hand. Lets calculate the second row in the table—the probability that \(y\) occurs when \(x1\) occurs (\(x1\) = 0) and \(x2\) does not occur (\(x2\) = 0). Recall that the equation for this model is \(ln\left[\frac{p}{1-p}\right]=\alpha+{\beta}{x_1}\), or equivalently: \(Pr\_ y=\frac{1}{1+{e}^{-(\alpha+{\beta}{x_{1}})}}\) (see Wooldridge, 2012).

model1.intercept <- (summary(model1)$coefficients[1,1])
model1.betax1 <- (summary(model1)$coefficients[2,1])
pr_y = 1 / (1 + (exp(-1 * (model1.intercept + model1.betax1))))
pr_y
## [1] 0.7001548

On average and for this sample, there is a 70% probability of \(y\) occurring given that \(x1\) occurred and \(x2\) did not.

Ok, now for the shortcut. We can use R’s predict function to pass all of the possible combinations of from our table and give us the predicted probabilities of each. Note that we must use the ‘response’ option to return the predicted probabilities, as opposed to the log odds.

# We start with a new dataframe that contains our possible...
# ...values of x1 and x2, following our table above
model1.values <- data_frame(x1 = c(0, 1, 0, 1),
                            x2 = c(0, 0, 1, 1))
model1.values

We then pass the new dataframe to the predict function, and we’ll set the se.fit = TRUE option to return the standard errors for the prediction, so that we can construct confidence intervals.

model1.predict <- predict(model1, newdata = model1.values, type = "response", se.fit = TRUE)

# Create a new dataframe to store predictions and calculated confidence...
# ...intervals using +/- 1.96 for a 95% interval
model1.predict.df <- data_frame(pr_y = model1.predict$fit,
                                lower.ci = model1.predict$fit - (1.96 * model1.predict$se.fit),
                                upper.ci = model1.predict$fit + (1.96 * model1.predict$se.fit))

# Lets also convert these values to percentages to make them...
# ...easier to manage and interpret
model1.predict.df <- model1.predict.df %>%
  mutate(pr_y = 100 * round(pr_y, 4),
         lower.ci = 100 * round(lower.ci, 4),
         upper.ci = 100 * round(upper.ci, 4))

Now we can bind our predicted values dataframe to our model values dataframe (the one with the x1 and x2 values) to make a nice output table:

library(knitr)
model1.outcome <- bind_cols(model1.values, model1.predict.df)
kable(model1.outcome, caption = "Predicted probabilites of y at possible combinations of x1 and x2")
x1 x2 pr_y lower.ci upper.ci
0 0 52.90 47.05 58.74
1 0 70.02 64.75 75.28
0 1 82.27 78.34 86.19
1 1 90.61 87.86 93.35

While I’m a big fan of visualizations, there’s not much to plot with these two variables—all we get are point estimates. We’ll tackle plotting predicted probabilities in Logit Part II with continuous variables.