What is Ward’s Method?
Ward’s method (also known as the minimum variance method or Ward’s minimum variance clustering method) is an agglomerative hierarchical clustering method and an alternative to single-link clustering. Popular in fields like linguistics, it is favored because it usually creates compact, even-sized clusters (Szmrecsanyi, 2012).
Like most other clustering methods, Ward’s method is computationally intensive. However, because it only considers merging two existing clusters at each step, rather than evaluating every possible way of grouping the objects, it needs far fewer computations than an exhaustive search. The drawback is that the result is usually not the optimal set of clusters. That said, the resulting clusters are usually good enough for most purposes.
Sum of Squares Index, E
Like other agglomerative clustering methods, Ward’s method starts with n clusters, each containing a single object. These clusters are merged step by step, two at a time, until one cluster contains all objects. At each step, the method makes the merge that gives the smallest value of E, an index of within-cluster variance (also called the sum of squares index).
At each step, the following calculations are made to find E:
1. Find the mean of each cluster.
2. Calculate the distance between each object in a cluster and that cluster’s mean.
3. Square the distances from Step 2.
4. Sum (add up) the squared values from Step 3 within each cluster.
5. Add up the sums of squares from Step 4 across all clusters. This total is E.
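The calculation of E can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes each object is a tuple of numeric coordinates and that "distance" means Euclidean distance, and the function names `cluster_mean` and `sum_of_squares_index` are made up for this example.

```python
def cluster_mean(cluster):
    """Return the coordinate-wise mean of a cluster of equal-length points."""
    n = len(cluster)
    return [sum(coords) / n for coords in zip(*cluster)]


def sum_of_squares_index(clusters):
    """Return E: the squared distances to each cluster mean, summed over all clusters."""
    total = 0.0
    for cluster in clusters:
        mean = cluster_mean(cluster)  # Step 1: mean of the cluster
        for point in cluster:
            # Steps 2 and 3: squared Euclidean distance from the point to the mean
            total += sum((x - m) ** 2 for x, m in zip(point, mean))
    # Steps 4 and 5: 'total' has accumulated every cluster's sum of squares
    return total


# Hypothetical data: one cluster with two objects, one cluster with a single object
clusters = [[(1.0, 2.0), (3.0, 4.0)], [(10.0, 10.0)]]
print(sum_of_squares_index(clusters))  # 4.0
```

In this example the first cluster has mean (2, 3), so each of its two objects contributes a squared distance of 2. The single-object cluster contributes 0, which is why E is always 0 at the very start of Ward’s method.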
To choose which clusters to merge at each step, every possible pair of clusters must be considered. This makes the procedure impractical to perform by hand, so a computer is a necessity for most data sets containing more than a handful of data points. That said, Charles Romesburg’s Cluster Analysis for Researchers includes a comprehensive, easy-to-follow example of calculating E by hand on a small set of data (starting on page 130).
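To see why the procedure gets cumbersome, here is a rough Python sketch of the brute-force version: at each step it tries every pair of clusters, recomputes E for the result, and keeps the merge with the smallest E. It assumes objects are tuples of numbers and uses Euclidean distance; the names `best_merge` and `ward_naive` are invented for this illustration, ties go to the first pair found, and real statistical software uses faster update formulas rather than this direct approach.

```python
from itertools import combinations


def sum_of_squares(cluster):
    """Sum of squared distances from each point in a cluster to the cluster mean."""
    n = len(cluster)
    mean = [sum(coords) / n for coords in zip(*cluster)]
    return sum((x - m) ** 2 for point in cluster for x, m in zip(point, mean))


def total_e(clusters):
    """E for a whole set of clusters."""
    return sum(sum_of_squares(c) for c in clusters)


def best_merge(clusters):
    """Try every pair of clusters; return (i, j, E) for the merge with the smallest E."""
    best = None
    for i, j in combinations(range(len(clusters)), 2):
        # Build the clustering that would result from merging clusters i and j
        trial = [c for k, c in enumerate(clusters) if k not in (i, j)]
        trial.append(clusters[i] + clusters[j])
        e = total_e(trial)
        if best is None or e < best[2]:
            best = (i, j, e)
    return best


def ward_naive(points):
    """Merge clusters one pair at a time; return a list of (merged cluster, E) per step."""
    clusters = [[p] for p in points]  # start with n single-object clusters
    history = []
    while len(clusters) > 1:
        i, j, e = best_merge(clusters)
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append((merged, e))
    return history


# Hypothetical one-dimensional data with two obvious groups
for merged, e in ward_naive([(0.0,), (1.0,), (5.0,), (6.0,)]):
    print(merged, e)
```

For these four points, E goes from 0.5 to 1.0 while the two natural groups are being formed, then jumps to 26.0 when they are forced into a single cluster. A large jump like this is a common hint about where to stop merging.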
Ward’s method is available in many popular programs, including SPSS, SYSTAT and S-PLUS. To run it in SPSS:
- Click “Analyze > Classify > Hierarchical Cluster.”
- Click “Method.”
- Choose “Ward’s method” from the “Cluster Method” drop-down menu.
References
- Romesburg, C. (2004). Cluster Analysis for Researchers. Lulu.com.
- Szmrecsanyi, B. (2012). Grammatical Variation in British English Dialects: A Study in Corpus-Based Dialectometry. Cambridge University Press.