Log Odds
Log odds play a central role in logistic regression. Every probability can be converted to log odds by finding the odds and taking the logarithm. Despite the relatively simple conversion, log odds can be a little esoteric. Jaccard [1] calls them
“…counterintuitive and challenging to interpret,”
especially if you don’t have a strong statistical background. That said, the formulas are simple, even if the results are a little challenging to decipher.
Conversions: Probability to Odds to Log of Odds
Probability, odds, and log odds are all the same information, just expressed in different ways. It’s similar to the idea of scientific notation: the number 1,000 can be written as 1.0 × 10^3 or even 1 * 10 * 10 * 10. What works for one person, or one equation, might not work for another. In many cases, you can simply choose which format you want to use. Other times (for example, when you’re publishing a paper or using logistic regression), you might be forced to adopt a particular format.
- Probability is the probability an event happens. For example, there might be an 80% probability of rain today.
- Odds (more technically the odds of success) is defined as the probability of success divided by the probability of failure. So a success with an 80% chance of rain has an accompanying failure with a 20% chance it doesn’t rain; as an equation, that’s 0.8/0.2 = 4.
- Log odds is the logarithm of the odds: ln(4) = 1.38629436 ≈ 1.386.
Conversion to log odds results in symmetry around zero, which is easier for analysis. As an example, suppose we begin with a probability of success of 0.75. That gives us a probability of failure of 1 – 0.75 = 0.25:
- The odds of success is defined as the ratio of successes over failures: 0.75 / 0.25 = 3, which means that the odds of success are 3:1. Odds range from 0 to positive infinity.
- The log odds is the (natural) log transformation of odds. In this example, ln(3) = 1.098612.
The following table shows this result and many other common conversions.
p | odds | log odds |
---|---|---|
0.001 | 0.001001 | -6.906755 |
0.01 | 0.010101 | -4.59512 |
0.15 | 0.1764706 | -1.734601 |
0.2 | 0.25 | -1.386294 |
0.25 | 0.3333333 | -1.098612 |
0.3 | 0.4285714 | -0.8472978 |
0.35 | 0.5384616 | -0.6190392 |
0.4 | 0.6666667 | -0.4054651 |
0.45 | 0.8181818 | -0.2006707 |
0.5 | 1 | 0 |
0.55 | 1.222222 | 0.2006707 |
0.6 | 1.5 | 0.4054651 |
0.65 | 1.857143 | 0.6190392 |
0.7 | 2.333333 | 0.8472978 |
0.75 | 3 | 1.098612 |
0.8 | 4 | 1.386294 |
0.85 | 5.666667 | 1.734601 |
0.9 | 9 | 2.197225 |
0.999 | 999 | 6.906755 |
0.9999 | 9999 | 9.21024 |
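If you want to verify these conversions yourself, here is a minimal Python sketch that reproduces a few rows of the table (the function names prob_to_odds and prob_to_log_odds are just illustrative, not from any particular library):

```python
import math

def prob_to_odds(p):
    """Odds of success: probability of success / probability of failure."""
    return p / (1 - p)

def prob_to_log_odds(p):
    """Log odds: the natural logarithm of the odds."""
    return math.log(prob_to_odds(p))

# Reproduce a few rows of the conversion table above.
for p in [0.001, 0.25, 0.5, 0.75, 0.999]:
    print(f"p = {p:<6}  odds = {prob_to_odds(p):<10.6f}  log odds = {prob_to_log_odds(p):.6f}")
```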
Log Odds and the Logit Function
The odds of success is the probability of success divided by the probability of failure. As an equation, that’s
P(A)/P(-A),
where P(A) is the probability of A, and P(-A) the probability of ‘not A’ (i.e. the complement of A).
Taking the logarithm of the odds gives us the log odds of A, which can be written as
log odds(A) = ln(P(A)/P(-A)).
Since the probability of an event not happening, P(-A), is equal to 1 – P(A), we can write the log odds as
log odds = ln(p / (1 – p))
Where:
- p = the probability of an event happening
- 1 – p = the probability of an event not happening
When a function’s variable represents a probability p (as in the function above), it’s called the logit function.
Of course, we can’t graph a “p” on most calculators, so you may see the equation for the logit function written as
logit(x) = ln(x / (1 – x)),
where x is the probability of an event happening.
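As a concrete illustration, here is a short Python sketch of the logit function. The inverse function (often called the inverse logit or sigmoid) isn’t derived above, but it is the standard way to map a log odds value back to a probability, so it’s included for completeness:

```python
import math

def logit(p):
    """logit(p) = ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Inverse logit (sigmoid): maps a log odds value back to a probability."""
    return 1 / (1 + math.exp(-z))

print(logit(0.75))             # 1.098612... (matches the table entry for p = 0.75)
print(inv_logit(logit(0.75)))  # 0.75, recovering the original probability
```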
Using Log Odds
There are many times when you might be required to use log odds, such as when publishing a paper or using logistic regression. Another reason to use log odds is that variables with restricted ranges, such as probabilities (which are confined to the interval from 0 to 1), are usually difficult to model directly. The log odds transformation maps probabilities onto the entire real line, which gets around this problem.
Sometimes, we prefer using log odds instead of basic probability measures because they are easily updated with new data — a common scenario in Bayesian statistics.
For instance, suppose there is a 5% chance that a thief will come to your door on any given night. You have a watch dog, but he’s not terribly reliable:
- he’ll bark half the time if a thief comes, and
- he’ll bark just 1/4 of the time if the person walking by is an honest man.
Now imagine you hear footsteps, and the dog barks. Before your dog barked, the log odds of a thief were ln(.05/.95) = ln(1/19) ≈ -2.9444. We call that your prior log odds for the thief. The likelihood ratio of a bark is just the probability of a bark given a thief (1/2) over the probability of a bark given no thief (1/4), and to find its contribution in log odds we take the log of that: ln((1/2)/(1/4)) = ln(2) ≈ 0.6931.
Now the posterior log odds of the thief (the log odds that there is a thief, given you’ve just heard the dog bark) is -2.9444 + 0.6931, or -2.2513. Since ln(odds) = log odds, e^(log odds) = odds.
To turn our -2.2513 above into odds, we calculate e^(-2.2513), which happens to be about 0.1053:1. So the probability we have a thief is 0.1053/1.1053 ≈ 0.095, or 9.5%. Notice how we converted the odds to a probability by dividing the first part of the ratio by the sum of both parts (the total). Notice also that we came to our final answer without any involved calculations, assuming, of course, we have a calculator to help us with the logarithms. We’ve used natural logs here (base e); you can actually use logs in any base, you just need to be consistent.
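The whole watchdog update fits in a few lines of code. Here is a minimal Python sketch of it (the helper names prob_to_log_odds and log_odds_to_prob are just for illustration):

```python
import math

def prob_to_log_odds(p):
    """Prior probability -> log odds: ln(p / (1 - p))."""
    return math.log(p / (1 - p))

def log_odds_to_prob(log_odds):
    """Log odds -> probability, via odds."""
    odds = math.exp(log_odds)   # e^(log odds) = odds
    return odds / (1 + odds)    # first part of the ratio over the total

prior = prob_to_log_odds(0.05)           # ln(1/19) ≈ -2.9444
log_lr = math.log((1 / 2) / (1 / 4))     # ln(2) ≈ 0.6931
posterior = prior + log_lr               # ≈ -2.2513

print(log_odds_to_prob(posterior))       # ≈ 0.095, i.e. about a 9.5% chance of a thief
```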
Advantages and Disadvantages of Log Odds
Conversion to log odds results in symmetry around zero, which makes for easier analysis [2]. In other words, swapping which event is the “numerator” and which is the “denominator” of the odds only flips the sign of the log odds; the magnitude stays the same. For example, odds of 4 to 1 give log odds of ln(4) ≈ 1.386, while odds of 1 to 4 give log odds of ln(1/4) ≈ -1.386.
Even though log odds transformations are relatively straightforward, they can be difficult to interpret, especially without a solid statistical foundation [2]. This is because log odds are built on odds, which are a ratio of two probabilities. Probabilities are sometimes a challenge to understand on their own, and the odds calculation compounds this difficulty.
References
[1] Jaccard, J. (2001). Interaction Effects in Logistic Regression (Issue 135). SAGE.