Log Odds: Definition and Worked Statistics Problems


Log odds play a central role in logistic regression. Every probability strictly between 0 and 1 can be converted to log odds by finding the odds and taking their logarithm. Despite the relatively simple conversion, log odds can be a little esoteric. Jaccard [1] calls them

“…counterintuitive and challenging to interpret,”

especially if you don’t have a strong statistical background. That said, the formulas are simple, even if the results are a little challenging to decipher.

Conversions: Probability to Odds to Log of Odds

Probability, odds and log odds all carry the same information, just expressed in different ways. It’s similar to the idea of scientific notation: the number 1,000 can be written as 1.0*10³ or even 1*10*10*10. What works for one person, or one equation, might not work for another. In many cases, you can simply choose which format you want to use. Other times (for example, when you’re publishing a paper or using logistic regression), you might be forced to adopt a particular format.

Probability is the probability an event happens. For example, there might be an 80% probability of rain today.
Odds (more precisely, the odds of success) are defined as the probability of success divided by the probability of failure. An 80% chance of rain (success) comes with a 20% chance that it doesn’t rain (failure), so the odds are .8/.2 = 4.
Log odds is the logarithm of the odds: ln(4) = 1.38629436 ≈ 1.386.

Conversion to log odds results in symmetry around zero, which makes analysis easier: a probability p and its complement 1 – p have log odds of the same size but opposite sign. As an example, suppose we begin with a probability of success of 0.75. That gives us a probability of failure of 1 – 0.75 = 0.25:

The odds of success are defined as the probability of success divided by the probability of failure: 0.75 / 0.25 = 3, which means that the odds of success are 3:1. Odds range from 0 to positive infinity.
The log odds is the (natural) log transformation of odds. In this example, ln(3) = 1.098612.
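The two conversion steps above can be sketched in a few lines of Python. This is a minimal sketch assuming natural logs (`math.log`), as in the examples here; the function names `probability_to_odds` and `odds_to_log_odds` are illustrative only, not taken from any library.

```python
import math

def probability_to_odds(p: float) -> float:
    """Odds of success: probability of success divided by probability of failure."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1 - p)

def odds_to_log_odds(odds: float) -> float:
    """Log odds: the natural logarithm of the odds."""
    return math.log(odds)

# Rain example: p = 0.8 (rounded because 0.8 / 0.2 is not exact in floating point)
rain_odds = probability_to_odds(0.8)
print(round(rain_odds, 6), round(odds_to_log_odds(rain_odds), 3))  # 4.0 1.386

# Worked example: p = 0.75
odds = probability_to_odds(0.75)
print(odds, round(odds_to_log_odds(odds), 6))  # 3.0 1.098612
```

Probabilities of exactly 0 or 1 are rejected because their odds would be 0 or infinite, and the logarithm of either is undefined.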

The following table shows this result and many other common conversions.

p	odds = p/(1 – p)	log odds = ln(odds)
0.001	0.001001	-6.906755
0.01	0.010101	-4.59512
0.15	0.1764706	-1.734601
0.2	0.25	-1.386294
0.25	0.3333333	-1.098612
0.3	0.4285714	-0.8472978
0.35	0.5384615	-0.6190392
0.4	0.6666667	-0.4054651
0.45	0.8181818	-0.2006707
0.5	1	0
0.55	1.222222	0.2006707
0.6	1.5	0.4054651
0.65	1.857143	0.6190392
0.7	2.333333	0.8472978
0.75	3	1.098612
0.8	4	1.386294
0.85	5.666667	1.734601
0.9	9	2.197225
0.999	999	6.906755
0.9999	9999	9.21024

Table showing the relationship between probability, odds and log of odds.
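The table can be regenerated with a short loop. This is a sketch: printing 7 significant digits is a choice made here to roughly match the precision of the table, and `table_rows` is a hypothetical helper name.

```python
import math

# Probabilities listed in the table above
probabilities = [0.001, 0.01, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5,
                 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.999, 0.9999]

def table_rows(ps):
    """Yield (p, odds, log odds) for each probability p."""
    for p in ps:
        odds = p / (1 - p)
        yield p, odds, math.log(odds)

print(f"{'p':>8} {'odds':>12} {'log odds':>12}")
for p, odds, log_odds in table_rows(probabilities):
    # 7 significant digits for the odds and log odds columns
    print(f"{p:>8} {odds:>12.7g} {log_odds:>12.7g}")
```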

Log Odds and the Logit Function

The odds are the probability of success divided by the probability of failure. As an equation, that’s

P(A)/P(-A),

where P(A) is the probability of A, and P(-A) the probability of ‘not A’ (i.e. the complement of A).

Taking the logarithm of the odds gives us the log odds of A, which can be written as

log odds(A) = log(P(A)/P(-A)).

Since the probability of the event not happening, P(-A), is equal to 1 – P(A), we can write the log odds as

log [p/(1-p)]

Where:

p = the probability of an event happening
1 – p = the probability of an event not happening

When the function’s input is a probability p (as in the function above), it’s called the logit function.

Graphing calculators usually expect the input variable to be x rather than p, so you may see the equation for the logit function written as

logit(x) = ln(x/(1 – x))

where x is the probability of an event happening.
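A minimal Python sketch of the logit function and its inverse (the inverse is also known as the logistic or sigmoid function). The names `logit` and `inverse_logit` are chosen here for illustration; the branch in `inverse_logit` is a standard trick to keep `math.exp` from overflowing for large negative inputs.

```python
import math

def logit(x: float) -> float:
    """Log odds of a probability x: ln(x / (1 - x)). Defined only for 0 < x < 1."""
    if not 0 < x < 1:
        raise ValueError("logit is undefined unless 0 < x < 1")
    return math.log(x / (1 - x))

def inverse_logit(z: float) -> float:
    """Map log odds z back to a probability: 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)  # equivalent form that avoids computing e^(-z) for very negative z
    return e / (1 + e)

for x in (0.25, 0.5, 0.75):
    z = logit(x)
    print(x, round(z, 6), round(inverse_logit(z), 6))
```

Each probability round-trips through its log odds and back, which is why the two representations carry the same information.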

Using Log Odds

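There are many times when you might be required to use log odds, such as when publishing a paper or using logistic regression. Another reason to use log odds is that variables with restricted ranges, such as probabilities (which must lie between 0 and 1), are difficult to model directly: a model can easily predict values outside the allowed range. Log odds get around this problem because they can take any real value, from negative to positive infinity.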
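To illustrate why the unrestricted range helps, the sketch below treats a straight line as log odds and converts each value back to a probability. The intercept (-3.0) and slope (0.5) are hypothetical numbers chosen only for illustration, and `probabilities_from_line` is a made-up helper; this shows the idea, not how any particular logistic regression package fits a model.

```python
import math

def inverse_logit(z: float) -> float:
    """Convert log odds z (any real number) to a probability strictly between 0 and 1."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)  # branch keeps math.exp from overflowing for very negative z
    return e / (1 + e)

def probabilities_from_line(xs, intercept, slope):
    """Treat intercept + slope * x as log odds (unrestricted) and map each to a probability."""
    return [inverse_logit(intercept + slope * x) for x in xs]

# Hypothetical coefficients, chosen only for illustration.
xs = [-10, 0, 6, 20]
straight_line = [-3.0 + 0.5 * x for x in xs]  # -8.0, -3.0, 0.0, 7.0: mostly impossible as probabilities
probs = probabilities_from_line(xs, intercept=-3.0, slope=0.5)

for x, z, p in zip(xs, straight_line, probs):
    print(f"x={x:>3}  line value={z:>5}  probability={p:.4f}")
```

Read directly as probabilities, most of the line values are invalid; read as log odds, every one of them corresponds to a valid probability.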

Sometimes, we prefer using log odds instead of basic probability measures because they are easy to update with new data, a common scenario in Bayesian statistics: each new piece of evidence is incorporated by adding the logarithm of its likelihood ratio.

For instance, suppose there is a 5% chance that a thief will come to your door on any given night. You have a watchdog, but he’s not terribly reliable:

he’ll bark half the time if a thief comes, and
he’ll bark just 1/4 of the time if the person walking by is an honest man.

Now imagine you hear footsteps, and the dog barks. Before your dog barked, the log odds of a thief were ln(.05/.95) = ln(1/19), or -2.9444. We call that your prior log odds for the thief. The likelihood ratio of a bark is the probability of a bark with a thief (1/2) divided by the probability of a bark with no thief (1/4), and the log likelihood ratio is its logarithm: ln((1/2)/(1/4)) = ln(2) = 0.6931.

Now the posterior log odds of the thief (the log odds that there is a thief, given you’ve just heard the dog bark) are -2.9444 + 0.6931, or -2.2513. Since log odds = ln(odds), the odds are e^(log odds).

To turn the -2.2513 above into odds, we calculate e^-2.2513, which is about 0.1053 (odds of 0.1053:1). So the probability that we have a thief is 0.1053/1.1053 ≈ 0.095, or 9.5%. Notice how we converted the odds to a probability by dividing the first part of the ratio by the sum of both parts (the total). Notice also that we came to our final answer without any involved calculations, assuming, of course, we have a calculator to help us with the logarithms. We’ve used natural logs here (base e); you can actually use logs in any base, as long as you are consistent.
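The thief calculation can be reproduced in Python as a sketch. `posterior_probability` is a hypothetical helper written for this example; it adds the prior log odds and the log likelihood ratio, then converts the result back to odds and to a probability. The optional `base` argument checks the claim that any log base gives the same final probability.

```python
import math

def posterior_probability(prior_prob, p_evidence_if_true, p_evidence_if_false, base=math.e):
    """Update a prior probability with one piece of evidence, working in log odds.

    posterior log odds = prior log odds + log likelihood ratio (all in the given base).
    Returns (posterior_log_odds, posterior_odds, posterior_probability).
    """
    prior_log_odds = math.log(prior_prob / (1 - prior_prob), base)
    log_likelihood_ratio = math.log(p_evidence_if_true / p_evidence_if_false, base)
    post_log_odds = prior_log_odds + log_likelihood_ratio
    post_odds = base ** post_log_odds  # undo the log: odds = base^(log odds)
    return post_log_odds, post_odds, post_odds / (1 + post_odds)  # odds a:1 -> a / (a + 1)

# Thief example: 5% prior; the dog barks with probability 1/2 (thief) vs 1/4 (honest passer-by)
log_odds, odds, prob = posterior_probability(0.05, 1/2, 1/4)
print(round(log_odds, 4), round(odds, 4), round(prob, 3))  # -2.2513 0.1053 0.095

# Same calculation with base-2 logs: the log odds differ, the probability does not
_, _, prob_base2 = posterior_probability(0.05, 1/2, 1/4, base=2)
print(round(prob_base2, 3))  # 0.095
```

If you were willing to assume that a second bark is independent evidence with the same likelihood ratio (an assumption not made in the example above), you would simply add another ln(2) to the log odds.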

Advantages and disadvantages of log odds

Conversion to log odds results in symmetry around zero, which makes for easier analysis [2]. In other words, swapping which event is the “numerator” and which is the “denominator” only flips the sign of the log odds; the size stays the same. For example, odds of 4 to 1 and odds of 1 to 4 have log odds of ln(4) ≈ 1.386 and ln(1/4) ≈ -1.386. On the raw odds scale, by contrast, the same pair sits at 4 and 0.25, unequal distances from the even-odds value of 1.
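A quick numerical check of this symmetry, as a sketch: swapping success and failure (p becomes 1 – p) turns odds of 4:1 into 1:4. The helper name `log_odds` is illustrative.

```python
import math

def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))

# p = 0.8 gives odds of 4:1; its complement, p = 0.2, gives odds of 1:4.
# Odds: 4 and 0.25 (unequal distances from 1).
# Log odds: about +1.3863 and -1.3863 (equal distances from 0).
for p in (0.8, 0.2):
    odds = p / (1 - p)
    print(p, round(odds, 4), round(log_odds(p), 4))
```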

Even though log odds transformations are relatively straightforward, they can be difficult to interpret, especially without a solid statistical foundation [2]. This is because log odds are the logarithm of the odds, which are themselves a ratio of two probabilities. Probabilities are sometimes a challenge to understand; expressing them as odds and then taking a logarithm compounds this difficulty.


References

[1] Jaccard, J. (2001). Interaction Effects in Logistic Regression (Quantitative Applications in the Social Sciences, No. 135). SAGE.