## What is a Histogram?

Histograms are similar to bar charts; both are a way to display counts of data. A bar chart plots actual counts against categories: the height of each bar indicates the number of items in that category. A histogram instead groups numerical data into ranges called “bins” and shows how many values fall into each one.

When you create a histogram, you are creating a bar chart that shows how many data points are within a range (an interval), called a bin. Normally, you choose the range that best fits your data. There are **no set rules about how many bins** you can have, but the rule of thumb is 5-20 bins. Any more than 20 bins and your graph will be hard to read. Fewer than 5 bins and your graph will have little (if any) meaning. Most graphs you’ll create in elementary statistics will have about 5 to 7 bins.

Another **rule of thumb** for bins is that if a value falls into two bins, place it in the upper bin. For example, if you are making a histogram of ages and your bins include 40-42 and 42-44, a participant who is 42 years old should be placed in the 42-44 bin.
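The upper-bin rule can be sketched in a few lines of Python. This is only an illustration: the helper name `bin_label` is made up for this example, and it assumes the bins are given as a sorted list of edges, with the very top edge treated as outside the last bin.

```python
from bisect import bisect_right

def bin_label(value, edges):
    """Return the 'low-high' bin for value; a value on a shared edge goes to the upper bin."""
    # Assumption: the top edge itself is not part of any bin.
    if not edges[0] <= value < edges[-1]:
        raise ValueError(f"{value} is outside the bins")
    # bisect_right skips past an edge equal to value, which selects the upper bin.
    i = bisect_right(edges, value) - 1
    return f"{edges[i]}-{edges[i + 1]}"

age_edges = [40, 42, 44]          # the bins 40-42 and 42-44 from the example
print(bin_label(42, age_edges))   # 42-44
print(bin_label(41, age_edges))   # 40-42
```

Whichever convention you pick, the important thing is to apply it consistently so that no value is counted twice.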

### What does the height of a bar in a histogram represent?

Unlike a bar chart, it is the *area* of a bar in a histogram, not its height, that represents the frequency. The area is the width of the bin multiplied by the height of the bar. The height of a bar can be read directly as the frequency (count) *only if all the bins have the same width.* For example, if you are plotting magnitudes of earthquakes and your bins are 3-5, 5-7 and 7-9, every bin is two units wide, so the bars can be drawn with heights equal to the frequencies. However, histograms don’t always have even bins. When a histogram has uneven bins, the height does not equal the frequency; compare the areas of the bars instead.
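To see why uneven bins break the "height = frequency" shortcut, here is a rough Python sketch. The counts are made up purely for illustration, and `bar_heights` is a hypothetical helper that picks each height so that width times height gives back the count.

```python
def bar_heights(counts, edges):
    """Height of each bar such that bin width * height equals the bin's count."""
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    return [count / width for count, width in zip(counts, widths)]

# Hypothetical earthquake counts, invented for this example only.
counts = [12, 6, 6]

even_edges = [3, 5, 7, 9]      # every bin is 2 units wide
uneven_edges = [3, 5, 7, 11]   # the last bin is 4 units wide

print(bar_heights(counts, even_edges))    # [6.0, 3.0, 3.0]
print(bar_heights(counts, uneven_edges))  # [6.0, 3.0, 1.5]
```

With even bins, the heights keep the same proportions as the counts, so labeling the axis with counts works fine. With the uneven bins, the last two bars hold the same number of items but have different heights, because the wider bin spreads its count over more of the x-axis.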

## Make a histogram: Overview

A histogram is a way of graphing groups of numbers according to how often they appear. This article will show you how to make one by hand, but you’re *much* better off using technology, for example by making a histogram in Excel 2007. Choosing bins in statistics is usually a matter of making an educated guess. When you make a histogram by hand, you’re stuck with your original bin settings. With Excel (or other software), you can change the bins *after* you’ve created the histogram, so you can play around with bin sizes until you have a chart you’re happy with. OK, enough lecturing about technology. Sometimes you might *have* to make a histogram by hand, especially if you’re making a relative frequency histogram; calculators like the TI-83 will only create regular frequency histograms. If you *have* to create a histogram by hand, here’s the easy way.

## Make a Histogram: Steps

**Sample question:** Create a histogram for the following test scores: 99, 97, 94, 88, 84, 81, 80, 77, 71, 25.

Step 1: **Draw and label your x- and y-axes**. For this example, the x-axis would be labeled “score” and the y-axis would be labeled “relative frequency %.”

Step 2: **Choose the number of bins** (see: how to choose bin sizes in statistics) and label your graph. For this sample question, groups of 10 (the x-axis values are the bins) are a good choice: the scores land in four of the bins (20-30, 70-80, 80-90 and 90-100), with between one and four items in each.

Step 3: **Divide 100 by the number of data points** to get an idea of where to place your frequency “ticks”. We have 10 items in our data set, so it makes sense to count by 100/10 = 10% (one item would equal 10% of the total).

Step 4: **Count how many items are in each bin and then sketch a rectangle on the graph** that corresponds to the percentage of the total that bin fills. In this sample data set, the first bin (20-30) has one item and the 70-80 bin has two items. If an item falls on a bin boundary (e.g. 80), place it in the *next* bin up (80 would go in 80-90).

*That’s it!*
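If you want to check your hand-drawn chart, the steps above can be mirrored in a short Python sketch. The function name `relative_frequencies` is just a label chosen for this example, and the sketch assumes bins of width 10 running from 20 to 100, with boundary values going into the next bin up.

```python
scores = [99, 97, 94, 88, 84, 81, 80, 77, 71, 25]
edges = list(range(20, 101, 10))  # bin edges 20, 30, ..., 100

def relative_frequencies(data, edges):
    """Map each 'low-high' bin to the percentage of the data that falls in it."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            # low <= x < high, so a boundary value such as 80 lands in 80-90.
            # Note: a value equal to the top edge (100 here) would not be counted.
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return {
        f"{edges[i]}-{edges[i + 1]}": 100 * count / len(data)
        for i, count in enumerate(counts)
    }

tick = 100 / len(scores)  # Step 3: one item is worth 10% of the total
for label, pct in relative_frequencies(scores, edges).items():
    # One '#' per item, as a rough text version of the bars.
    print(f"{label:>6} | {'#' * round(pct / tick)} {pct:.0f}%")
```

The four non-empty bins come out at 10%, 20%, 40% and 30%, which is what the finished sketch should show.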

**Tip:** If you’re unsure about how many bins to choose, consider making a rough chart online with this Shodor.org tool. Play around with the bins (change the interval size) until you get a chart that you like the look of.

**Tip**: Choosing where to place the frequency ticks is also somewhat of a judgment call and is rarely an exact science. For example, if you had 21 items, you could place your ticks at 5% although each item would be slightly less than 5%. Bear that in mind when you sketch the graph.

**Warning:** Choosing optimal bin sizes gets *very* complex with large data sets (see this article for an example of the ugly math). The larger your data set, the better off you are using technology.
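For a rough starting point, one simple and widely quoted formula is Sturges’ rule, which suggests ceil(log2(n)) + 1 bins for n data points. The sketch below is only an illustration of that one rule; it is not necessarily the method the article linked above uses, and more elaborate rules exist. The function name `sturges_bins` is made up for this example.

```python
import math

def sturges_bins(n):
    """Suggested number of bins for n data points under Sturges' rule."""
    if n < 1:
        raise ValueError("need at least one data point")
    return math.ceil(math.log2(n)) + 1

for n in (10, 100, 1000):
    print(n, "data points ->", sturges_bins(n), "bins")
```

For the 10 test scores above this suggests 5 bins, which sits at the low end of the 5-20 rule of thumb. Treat the result as a first guess to adjust by eye, not as the final answer.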

## The Bihistogram

A bihistogram is a graph made from two histograms (“bi” = two) in opposite directions. One histogram is above the axis and one is below it. The histograms can be back-to-back on opposite sides of either the y-axis or the x-axis. Each half represents a different category.

The bihistogram is a visual alternative to an independent samples t-test. It can be more useful than the t-test because many features are visible on the same plot, including:

- Skewness.
- Location (where on the horizontal axis the graph is centered).
- Scale (whether the graph is stretched or squeezed; see: scale parameter).
- Outliers.
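To get a feel for the layout, here is a minimal text-only sketch in Python that places two histograms back-to-back on either side of a shared set of bins. The two groups of values are invented for illustration, and `bin_counts` and `text_bihistogram` are hypothetical helper names, not functions from any statistics package.

```python
def bin_counts(data, edges):
    """Count how many values fall in each bin (low <= x < high)."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts

def text_bihistogram(group_a, group_b, edges):
    """Return one text line per bin: group A bars grow left, group B bars grow right."""
    a = bin_counts(group_a, edges)
    b = bin_counts(group_b, edges)
    width = max(a + b)  # widest bar, used to right-align the left-hand side
    lines = []
    for i, (count_a, count_b) in enumerate(zip(a, b)):
        label = f"{edges[i]}-{edges[i + 1]}"
        left = ("#" * count_a).rjust(width)
        lines.append(f"{left} | {label:^7} | {'#' * count_b}")
    return lines

# Made-up data for two categories, for illustration only.
group_a = [62, 65, 71, 74, 78, 83]
group_b = [70, 76, 81, 85, 88, 93]
edges = [60, 70, 80, 90, 100]

for line in text_bihistogram(group_a, group_b, edges):
    print(line)
```

Because both halves share the same bins, a shift in location shows up as the two sets of bars peaking in different rows, and a stray bar far from the rest points to a possible outlier.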

## Creating a Bihistogram

The bihistogram is rarely used compared to other statistical techniques, so most popular software doesn’t have the capability of creating one. Two programs that do have options are R and Dataplot. SPSS also provides a means to lay histograms side by side, which effectively gives you the same thing.

### Bihistogram in R

There doesn’t seem to be a simple function in R for creating bihistograms, but *StrictlyStat* suggests overlaying two histograms for the same effect. For the code using either ggplot or base graphics, see this article on the R-Bloggers site. You can also find an online calculator (that uses an R module) at Wessa.net. I tried out the online calculator; be patient, as it can take several minutes for the graph to appear.

### Dataplot

The command in Dataplot is BIHISTOGRAM.

### Bihistogram in SPSS 20

- Click “Graphs” from the menu, then click “Chart Builder.”
- Choose histogram from the gallery at the bottom left. Icons will appear with a series of different histograms.
- Click and drag the back-to-back icon (the fourth from the left) into the chart preview area.
- Select your variables as you would any other chart. (If you’re unfamiliar with choosing variables, check out How to Make a Pie Chart in SPSS).

