A Gaussian mixture model is a distribution assembled from weighted multivariate Gaussian* distributions. Weighting factors assign each distribution different levels of importance. The resulting model is a super-position (i.e. an overlapping) of bell-shaped curves.
Gaussian mixture models are semi-parametric. Parametric implies that the model comes from a known distribution (which is in this case, a set of normal distributions). It’s semi-parametric because more components, possibly from unknown distributions, can be added to the model.
GMMs are widely used for clustering and density estimation in physics. However they do have a wide range of applications in other fields like modeling weather observations in geoscience (Zi, 2011), certain autoregressive models, or noise from some time series.
If you think your data stems from a set of different normal distributions, then the GMM is an appropriate analysis tool. The normal distribution is an underlying assumption, which means that while it’s assumed the distributions are Gaussian, they may not be. In some cases, you may not be able to tell, but use logic or prior knowledge to assume your data has a normal distribution. Therefore, models created from a GMM methods carry with them a certain level of uncertainty. However, it’s easy to use (most popular software has the capability of producing GMMs) and — compared to non-parametric modeling — is relatively simple.
In most cases, you’ll be using software to create Gaussian mixture models. Clustering, K-means and ISODATA are based on the Gaussian mixture model.
*Note: In statistics, the Gaussian distribution is called the normal distribution or the normal curve. In the social sciences, it’s called the bell curve.
Li,Z. Applications of Gaussian Mixture Model to Weather Observations. IEEE Geoscience and Remote Sensing Letters ( Volume: 8, Issue: 6, Nov. 2011 )
McLachlan, G. & Peel, D. Finite Mixture Models.
Need help with a homework or test question? With Chegg Study, you can get step-by-step solutions to your questions from an expert in the field. If you'd rather get 1:1 study help, Chegg Tutors offers 30 minutes of free tutoring to new users, so you can try them out before committing to a subscription.
If you prefer an online interactive environment to learn R and statistics, this free R Tutorial by Datacamp is a great way to get started. If you're are somewhat comfortable with R and are interested in going deeper into Statistics, try this Statistics with R track.
Comments? Need to post a correction? Please post a comment on our Facebook page.