
## What is a Parsimonious Model?

Parsimonious models are simple models with strong explanatory and predictive power. They explain the data with as few parameters (predictor variables) as possible.

The idea behind parsimonious models stems from Occam’s razor, or “the law of parsimony” (*lex parsimoniae* in Latin). The law states that you should use no more “things” than necessary; in the case of parsimonious models, those “things” are parameters. **Parsimonious models have optimal parsimony, or just the right number of predictors needed to explain the data well.**

## Comparing Models

There is generally a tradeoff between goodness of fit and parsimony: low-parsimony models (i.e., models with many parameters) tend to fit the data at hand better than high-parsimony models. A better fit is not necessarily a good thing, though. Adding more parameters usually improves the fit to the data at hand, but the same model is likely to predict other data sets poorly (it overfits).

Finding the right balance between parsimony and goodness of fit can be challenging. Popular methods include Akaike’s Information Criterion (AIC), Bayesian Information Criterion (BIC), Bayes Factors and Minimum Description Length.
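To make the tradeoff concrete before turning to those methods, here is a minimal Python sketch. Everything in it is an assumption chosen for illustration: the toy data are made up (roughly y = 2x + 1 plus noise), and the three candidate models (a mean-only model, a least-squares line, and a polynomial that passes through every point) are just convenient examples of low, moderate and high parameter counts.

```python
# Toy data (made-up numbers for illustration): roughly y = 2x + 1 plus noise.
xs = [0, 1, 2, 3, 4, 5]
ys = [1.2, 2.7, 5.3, 6.8, 9.1, 11.4]
x_new, y_new = 7, 15.2  # hypothetical held-out point, not used for fitting


def fit_mean(xs, ys):
    """1 parameter: predict the average of y everywhere."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """2 parameters: ordinary least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """n parameters: Lagrange polynomial through every data point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def rss(model, xs, ys):
    """Residual sum of squares of a fitted model on the given data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


models = {
    "mean (1 parameter)": fit_mean(xs, ys),
    "line (2 parameters)": fit_line(xs, ys),
    "interpolant (6 parameters)": fit_interpolant(xs, ys),
}

in_sample_rss = {name: rss(m, xs, ys) for name, m in models.items()}
held_out_error = {name: abs(y_new - m(x_new)) for name, m in models.items()}

for name in models:
    print(f"{name}: in-sample RSS = {in_sample_rss[name]:.3f}, "
          f"held-out error = {held_out_error[name]:.3f}")
```

On this toy data the six-parameter polynomial fits the training points perfectly, yet it misses the held-out point by far more than the two-parameter line does. Real data sets will differ in the details, but the pattern is the reason fit alone is a poor guide to model choice.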

- **Akaike’s Information Criterion (AIC)** compares the quality of a set of models, ranking each option from best to worst. The most parsimonious model will be the one that neither under-fits nor over-fits. One downside is that the AIC says nothing about absolute quality: if you input a series of poor models, the AIC will simply choose the best from that poor-quality set.
- **Bayesian Information Criterion (BIC)** is almost the same as the AIC, although it tends to favor models with fewer parameters. The BIC is also known as the **Schwarz information criterion** or Schwarz’s BIC.
- **Bayes Factors** compare models using prior distributions. The approach is similar to the Likelihood Ratio Test, but the models do not have to be nested. Model selection based on Bayes Factors can be approximately equal to BIC model selection. However, the BIC doesn’t require knowledge of priors, so it is often preferred.
- **Minimum Description Length (MDL)** is commonly used in computer and information science. It works on the basis that strings of related data can be compressed, reducing the number of predictor variables.
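As a rough illustration of how the first two criteria are computed, the sketch below uses the standard definitions AIC = 2k − 2 ln(L) and BIC = k ln(n) − 2 ln(L), where k is the number of estimated parameters, n the sample size and L the maximized likelihood. It assumes least-squares models with normally distributed errors, so the likelihood can be obtained from the residual sum of squares (RSS). The three candidate models, their RSS values and the sample size are hypothetical numbers, picked so that the two criteria disagree.

```python
import math


def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood of a least-squares fit with normal errors."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)


def aic(log_lik, k):
    """Akaike's Information Criterion (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian Information Criterion (lower is better)."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical candidates fitted to the same n = 50 observations.
# k counts the intercept, the slopes and the error variance.
n = 50
candidates = {
    "1 predictor": {"k": 3, "rss": 110.0},
    "3 predictors": {"k": 5, "rss": 100.0},
    "10 predictors": {"k": 12, "rss": 95.0},
}

scores = {}
for name, model in candidates.items():
    log_lik = gaussian_log_likelihood(model["rss"], n)
    scores[name] = {
        "aic": aic(log_lik, model["k"]),
        "bic": bic(log_lik, model["k"], n),
    }

best_aic = min(scores, key=lambda name: scores[name]["aic"])
best_bic = min(scores, key=lambda name: scores[name]["bic"])

for name, s in scores.items():
    print(f"{name}: AIC = {s['aic']:.2f}, BIC = {s['bic']:.2f}")
print("Lowest AIC:", best_aic)
print("Lowest BIC:", best_bic)
```

With these made-up inputs the AIC prefers the three-predictor model while the BIC prefers the one-predictor model, and both reject the ten-predictor model even though it has the lowest RSS. Only differences in AIC or BIC between models fitted to the same data are meaningful; the absolute values are not.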
