What is the Probit Model?
A probit model (also called probit regression) is a way to perform regression for binary outcome variables. Binary outcome variables are dependent variables with two possible values, like yes/no, positive test result/negative test result or single/not single. The word “probit” is a combination of the words probability and unit; the probit model estimates the probability that an observation will fall into one of the two possible outcomes.
Predicted values from a probit model are on the same scale as Z-scores, so they are converted to probabilities with the standard normal cumulative distribution function. A probit value of:
- -3 has around a 0.13% chance of success,
- 0 has a 50% chance of success,
- 1 has around an 84% chance of success,
- 3 has around a 99.87% chance of success.
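The probabilities listed above can be reproduced with a few lines of Python. This is a minimal sketch that uses only the standard library; the function name `probit_to_probability` is made up for illustration and is not part of any statistics package.

```python
from statistics import NormalDist


def probit_to_probability(z: float) -> float:
    """Convert a probit value (a Z-score) into a probability of success.

    The conversion is the standard normal cumulative distribution function.
    """
    return NormalDist().cdf(z)


# Print the chance of success for the probit values discussed in the text.
for z in (-3, 0, 1, 3):
    print(f"probit {z:>2}: chance of success = {probit_to_probability(z):.2%}")
```

A fitted probit model produces one such probit value per observation (the intercept plus each coefficient times its predictor), and the same conversion turns that value into a predicted probability.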
Examples of when you might use a probit model:
- You want to know if a particular candidate will win an election. The response variable is either 1 = win or 0 = lose.
- You want to know how variables like prestige of a certain law school and undergraduate GPA affect whether a job candidate will be hired. The response variable, hire/don’t hire, is a binary variable.
Other, similar methods you might want to consider instead of a probit model:
- Logistic regression (logit): this gives practically identical results to probit regression. Which one you choose to run is mostly a matter of personal choice, although each is preferred for a narrow selection of models. For example, if you are working with multiple-equation systems involving qualitative dependent variables, the probit is the usual choice because those systems are typically built on the normal distribution. On the other hand, if you have nominal dependent variables with three categories or more, the logit is easier to calculate.
- Hotelling’s T²: It’s possible to run Hotelling’s T² on binary outcome variables with a couple of data changes: the binary outcome variable becomes the grouping variable, and the predictor variables become outcome variables. Although you can run the test this way, a couple of major problems arise: you won’t get coefficients for each individual variable, and the impact of predictor variables on other predictor variables isn’t clear.
- Ordinary Least Squares (OLS) Regression: this is called a linear probability model when used with binary outcome variables. However, there are many issues with this model, including that a linear probability model violates several assumptions of OLS regression (like normality and constant variance of the errors) and can predict probabilities below 0 or above 1. Therefore, it’s not recommended that you run OLS with binary outcome variables, as any results from hypothesis tests will be invalid (Long, 1997).
- Two-group discriminant function analysis: this is a multivariate method for binary outcome variables.
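To see why logit and probit give practically identical results, it can help to compare the two curves directly. The sketch below, using only the Python standard library, rescales the logistic curve by 1.702 (a common rule-of-thumb constant, assumed here rather than taken from the article) and measures the largest gap between the two probability curves. The function names are made up for illustration.

```python
import math
from statistics import NormalDist

# Assumed rule-of-thumb factor that lines the logistic curve up with the
# standard normal curve; logit coefficients are larger than probit
# coefficients by roughly this factor.
SCALE = 1.702


def probit_probability(z: float) -> float:
    """Probability implied by the probit model: standard normal CDF."""
    return NormalDist().cdf(z)


def logit_probability(z: float) -> float:
    """Probability implied by the logit model: logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))


# Compare the two curves on a fine grid from -4 to 4.
grid = [i / 100 for i in range(-400, 401)]
max_gap = max(
    abs(probit_probability(z) - logit_probability(SCALE * z)) for z in grid
)
print(f"largest difference in predicted probability: {max_gap:.4f}")
```

After rescaling, the two curves differ by less than one percentage point everywhere on the grid, which is why the substantive conclusions from the two models rarely differ even though the raw coefficients do.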
Long, J. Scott (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications.