A **Gaussian mixture model** is a distribution assembled from weighted multivariate Gaussian* distributions. Weighting factors assign each distribution a different level of importance. The resulting model is a superposition (i.e., an overlapping) of bell-shaped curves.
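To make the idea of a weighted superposition concrete, here is a minimal sketch that evaluates a one-dimensional mixture density as a weighted sum of bell curves. The function names and the two-component parameters are hypothetical values chosen for illustration, and the one-dimensional case is a simplification of the multivariate model described above.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, stds):
    """Weighted sum of the component densities at x."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds))


# Hypothetical two-component mixture (illustrative values only).
WEIGHTS = [0.7, 0.3]
MEANS = [0.0, 4.0]
STDS = [1.0, 0.5]


def integrate_mixture(lo=-10.0, hi=10.0, steps=20000):
    """Midpoint-rule integral of the mixture density over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(
        mixture_pdf(lo + (i + 0.5) * width, WEIGHTS, MEANS, STDS) * width
        for i in range(steps)
    )


if __name__ == "__main__":
    for x in (-1.0, 0.0, 2.0, 4.0):
        print(x, mixture_pdf(x, WEIGHTS, MEANS, STDS))
    # Because the weights sum to 1, the mixture is itself a valid density.
    print("area under curve:", integrate_mixture())
```

Because the weights sum to 1, the area under the combined curve is also 1, so the mixture is a proper probability density.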

Gaussian mixture models are **semi-parametric**. *Parametric* implies that the model comes from a known distribution (in this case, a set of normal distributions). It’s *semi*-parametric because more components, possibly from unknown distributions, can be added to the model.

## Uses

GMMs are widely used for **clustering and density estimation in physics**. However, they also have a wide range of applications in other fields, such as modeling weather observations in geoscience (Li, 2011), certain autoregressive models, or noise in some time series.

**If you think your data stems from a set of different normal distributions, then the GMM is an appropriate analysis tool.** The normal distribution is an underlying **assumption**, which means that while it’s *assumed* the distributions are Gaussian, they may not be. In some cases you may not be able to tell, and will have to use logic or prior knowledge to assume your data has a normal distribution. Therefore, models created with GMM methods carry a certain level of uncertainty. However, the GMM is easy to use (most popular software can produce GMMs) and, compared to non-parametric modeling, relatively simple.

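In most cases, you’ll be using software to create Gaussian mixture models. Some clustering algorithms, such as K-means and ISODATA, are closely related to the Gaussian mixture model; K-means, for example, can be viewed as a special case in which every component has the same spherical covariance and each point is assigned to exactly one cluster.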
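Software packages typically fit a GMM with the expectation-maximization (EM) algorithm. As a rough illustration of what such a fit involves, here is a minimal pure-Python sketch for one-dimensional data. The function name `fit_gmm_1d`, the initialisation scheme, and the synthetic data are assumptions made for this example; it is not the implementation used by any particular package.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data, n_components=2, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, stds)."""
    n = len(data)
    ordered = sorted(data)
    lo, hi = ordered[0], ordered[-1]
    # Crude initialisation (an assumption of this sketch): means at evenly
    # spaced quantiles, equal weights, and a common standard deviation.
    means = [ordered[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    stds = [max((hi - lo) / (4.0 * n_components), 1e-3)] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pk / total for pk in p])

        # M-step: re-estimate weights, means, and standard deviations.
        for k in range(n_components):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            stds[k] = math.sqrt(max(var, 1e-6))

    return weights, means, stds


# Synthetic data drawn from two normal distributions (illustrative values).
rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(600)]
data += [rng.gauss(3.0, 0.5) for _ in range(400)]

weights, means, stds = fit_gmm_1d(data)
for w, m, s in zip(weights, means, stds):
    print(f"weight={w:.2f} mean={m:.2f} std={s:.2f}")
```

On data like this, where the two groups are well separated, the estimated parameters should land close to the values used to generate the sample. EM only finds a local optimum, so results on harder data depend on the starting values.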

The basic formula for a GMM with *m* components is:
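A common way to write the density of a GMM with *m* components is given below. The notation is assumed here rather than taken from the original text: $w_k$ are the weighting factors, $\mu_k$ and $\Sigma_k$ are the mean vector and covariance matrix of component $k$, and $d$ is the dimension of $x$.

$$
p(x) = \sum_{k=1}^{m} w_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad w_k \ge 0, \quad \sum_{k=1}^{m} w_k = 1
$$

where each component is a multivariate normal density:

$$
\mathcal{N}(x \mid \mu_k, \Sigma_k) =
\frac{1}{(2\pi)^{d/2} \, |\Sigma_k|^{1/2}}
\exp\!\left( -\tfrac{1}{2} (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) \right)
$$

The constraint that the weights are non-negative and sum to 1 is what keeps the weighted sum a valid probability density.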

\* **Note:** In statistics, the Gaussian distribution is also called the normal distribution or the normal curve. In the social sciences, it’s often called the bell curve.

**References:**

Li, Z. Applications of Gaussian Mixture Model to Weather Observations. IEEE Geoscience and Remote Sensing Letters (Volume 8, Issue 6, Nov. 2011).

McLachlan, G. & Peel, D. Finite Mixture Models.
