Bayesian Information Criterion (BIC) / Schwarz Criterion

The Bayesian Information Criterion (BIC) is an index used in Bayesian statistics to choose between two or more alternative models.

The BIC is also known as the Schwarz information criterion (abbreviated SIC) or the Schwarz-Bayesian information criterion. It was published in a 1978 paper by Gideon E. Schwarz, and is closely related to the Akaike information criterion (AIC), which was formally published in 1974.

Definition of the Bayesian Information Criterion / Schwarz Criterion

The Bayesian Information Criterion (BIC) is defined as

BIC = k log(n) - 2 log(L(θ̂)).

Here n is the sample size, that is, the number of observations or data points you are working with; k is the number of parameters your model estimates; and θ is the set of all parameters. The logarithm is the natural logarithm.

L(θ̂) represents the likelihood of the model tested, given your data, when evaluated at the maximum likelihood values of θ. You could call this the likelihood of the model when every parameter is set to its most favorable value.

Another way of understanding L(θ̂) is that it is the probability (or, for continuous data, the probability density) of obtaining the data you have, supposing the model being tested, with its parameters set to θ̂, is true.
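As a minimal sketch of the definition above, the following Python function computes the BIC from a maximized log-likelihood. The function name bic, its parameter names, and the input values are illustrative assumptions rather than part of any library, and the natural logarithm is assumed throughout.

```python
import math


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Return k*ln(n) - 2*ln(L(theta_hat)).

    log_likelihood: the maximized log-likelihood, ln(L(theta_hat))
    k: number of parameters the model estimates
    n: number of observations
    """
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    return k * math.log(n) - 2.0 * log_likelihood


# Hypothetical inputs, chosen only for illustration.
example = bic(log_likelihood=-120.0, k=3, n=50)
print(round(example, 2))
```

Because the function takes the log-likelihood directly, it never has to form L(θ̂) itself, which is often too small to represent as a floating-point number.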

Comparing Models

Comparing models with the Bayesian information criterion simply involves calculating the BIC for each model. The model with the lowest BIC is considered the best, and its BIC can be written BIC* (or SIC* if you use that name and abbreviation).

We can also calculate ΔBIC, the difference between the BIC of a particular model and the BIC of the ‘best’ model, and use it as a measure of the evidence against the other model. ΔBIC is just BIC_model – BIC*, where BIC* is the BIC of the best model.

If ΔBIC is less than 2, it is considered ‘barely worth mentioning’ as an argument either for the best model or against the alternate one. The edge it gives our best model is too small to be meaningful. But if ΔBIC is between 2 and 6, one can say the evidence against the other model is positive; i.e. we have a good argument in favor of our ‘best model’. If it is between 6 and 10, the evidence for the best model and against the weaker model is strong. A ΔBIC greater than 10 means the evidence favoring our best model over the alternate is very strong indeed.
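The comparison procedure can be sketched in Python as follows. The function names delta_bic and evidence_label, the model names, and the BIC values are hypothetical. How values falling exactly on 2, 6, or 10 are classified is an assumption too, since the text does not say which side they belong to.

```python
def delta_bic(bic_values):
    """Map each model name to its BIC minus the lowest BIC in the set."""
    best = min(bic_values.values())
    return {name: value - best for name, value in bic_values.items()}


def evidence_label(delta):
    """Translate a ΔBIC value into the verbal scale described above."""
    if delta < 0:
        raise ValueError("ΔBIC is measured from the best model, so it cannot be negative")
    if delta < 2:
        return "barely worth mentioning"
    if delta < 6:
        return "positive"
    if delta <= 10:
        return "strong"
    return "very strong"


# Hypothetical BIC values for three candidate models.
candidates = {"model_a": 251.7, "model_b": 255.2, "model_c": 263.9}

for name, delta in delta_bic(candidates).items():
    if delta == 0:
        print(f"{name}: lowest BIC (best model)")
    else:
        print(f"{name}: ΔBIC = {delta:.1f}, evidence against it is {evidence_label(delta)}")
```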

Example

Suppose you have a set of data with 50 observation points. Model 1 estimates 3 parameters and Model 2 estimates 4 parameters. Let’s say the log of your maximum likelihood for Model 1 is a, and for Model 2 it is 2a. Using the formula k log(n) - 2 log(L(θ̂)), where log is the natural logarithm (so log(50) ≈ 3.91), the BIC of each model is:

Model 1: 3 log(50) – 2a ≈ 11.74 – 2a
Model 2: 4 log(50) – 4a ≈ 15.65 – 4a

So ΔBIC, taken as the BIC of Model 2 minus the BIC of Model 1, is approximately 3.91 – 2a.

Model 1 has the lower BIC whenever 3.91 – 2a > 0, that is, when a < 1.96. The evidence the Bayesian Information Criterion gives us for Model 1 is more than ‘barely worth mentioning’ only if 3.91 – 2a > 2, that is, 2a < 1.91, or a < 0.96. If a > 1.96 the difference is negative, and Model 2 is the preferred model instead.
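The example can also be checked numerically. The sketch below assumes natural logarithms, the usual convention for BIC, under which the difference in penalty terms between the two models is log(50) ≈ 3.91. It treats a as a free input, and the function name delta_bic_example and the trial values of a are hypothetical.

```python
import math

N_OBS = 50  # sample size from the example


def bic(log_likelihood, k, n=N_OBS):
    """k*ln(n) - 2*ln(L(theta_hat)), with ln the natural logarithm."""
    return k * math.log(n) - 2.0 * log_likelihood


def delta_bic_example(a):
    """BIC of Model 2 (k=4, log-likelihood 2a) minus BIC of Model 1 (k=3, log-likelihood a)."""
    return bic(2 * a, k=4) - bic(a, k=3)


# Trial values of a, chosen only to show both signs of the difference.
for a in (-1.0, 0.5, 3.0):
    diff = delta_bic_example(a)
    preferred = "Model 1" if diff > 0 else "Model 2"
    print(f"a = {a}: BIC2 - BIC1 = {diff:.2f}, lower BIC: {preferred}")
```

The difference simplifies to log(50) – 2a, so its sign, and therefore the preferred model, depends only on a.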

References

Claeskens, G. & Hjort, N. L. (2008). Model Selection and Model Averaging (Cambridge Series in Statistical and Probabilistic Mathematics), 1st Edition. Cambridge University Press.
Fabozzi, Focardi, Rachev & Arshanapalli. The Basics of Financial Econometrics: Tools, Concepts, and Asset Management Applications. Appendix E: Model Selection Criterion: AIC and BIC. Retrieved from http://onlinelibrary.wiley.com/store/10.1002/9781118856406.app5/asset/app5.pdf;jsessionid=A6726BA5AE1AD2A5AF007FFF78528249.f03t01?v=1&t=je8jr983&s=09eca6efc0573a238457d475d3ac909ec816a699 on March 1, 2018
Wasserman, Larry. STAT 705 Lecture Notes: Model Selection. Retrieved from http://www.stat.cmu.edu/~larry/=stat705/Lecture16.pdf on March 1, 2018