What is a Histogram?
Histograms are similar to bar charts; they are a way to display counts of data. A bar graph charts actual counts against categories; the height of the bar indicates the number of items in that category. A histogram displays counts in the same way, but for numerical data grouped into ranges called “bins” rather than for separate categories.
When you create a histogram, you are creating a bar chart that shows how many data points are within a range (an interval), called a bin. Normally, you choose the range that best fits your data. There are no set rules about how many bins you can have, but the rule of thumb is 5-20 bins. Any more than 20 bins and your graph will be hard to read. Fewer than 5 bins and your graph will have little (if any) meaning. Most graphs you’ll create in elementary statistics will have about 5 to 7 bins.
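As a rough illustration of binning, the Python sketch below splits a data set into equal-width bins and counts how many points land in each. The function name `count_in_bins`, the sample ages and the choice of 5 bins are all assumptions made up for this example.

```python
def count_in_bins(data, num_bins):
    """Split the data range into equal-width bins and count the points in each.

    Assumes the data contains at least two different values.
    """
    low, high = min(data), max(data)
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for x in data:
        # Position of the bin; the maximum value is kept in the last bin.
        index = min(int((x - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts

# Hypothetical sample: ages of 12 survey participants.
ages = [21, 22, 25, 29, 31, 34, 35, 38, 40, 44, 47, 51]
edges, counts = count_in_bins(ages, num_bins=5)
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {n}")
```

Changing `num_bins` and rerunning is a quick way to see how the picture changes with more or fewer bins.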
Another rule of thumb for bins is that if a value falls on the boundary between two bins, place it in the upper bin. For example, if you are making a histogram of ages and your bins include 40-42 and 42-44, a participant who is 42 years old should be placed in the 42-44 bin.
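The boundary rule can be sketched in Python with the standard library's `bisect_right`. The helper name `bin_index` and the extra 38-40 bin are assumptions added for illustration; the 40-42 and 42-44 bins come from the example above.

```python
from bisect import bisect_right

def bin_index(value, edges):
    """Return the index of the bin holding value; boundary values go to the upper bin."""
    if not edges[0] <= value < edges[-1]:
        raise ValueError(f"{value} is outside the bins")
    # bisect_right counts the edges that are <= value, so a value equal
    # to an edge is placed in the bin that starts at that edge.
    return bisect_right(edges, value) - 1

edges = [38, 40, 42, 44]  # bins 38-40, 40-42 and 42-44
labels = [f"{a}-{b}" for a, b in zip(edges, edges[1:])]
print(labels[bin_index(42, edges)])    # 42-44
print(labels[bin_index(41.5, edges)])  # 40-42
```

Under this rule the top edge (44 here) belongs to the next bin up, so the sketch treats it as outside the listed bins.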
What does the height of a bar in a histogram represent?
In a bar chart, the height of a bar represents the frequency. In a histogram, it is the area of the bar that represents the frequency: the width of the bin multiplied by the height. When all of the bins are the same width, the heights are proportional to the frequencies, so you can read the frequency (counts) straight from the height of each bar. For example, if you are plotting magnitudes of earthquakes and your bins are 3-5, 5-7 and 7-9, each bin is two units wide and so the height of the bar can be labeled with the frequency. However, histograms don’t always have even bins. When a histogram has uneven bins, the height alone does not equal the frequency; you have to take the bin width into account as well.
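To see how area, width and height fit together, here is a minimal Python sketch that works backwards from counts to bar heights (height = frequency / width). The function name `bar_heights` and the earthquake counts are invented for the example; only the 3-5, 5-7 and 7-9 bins come from the text.

```python
def bar_heights(edges, counts):
    """Height of each bar so that its area (width x height) equals the count."""
    return [count / (right - left)
            for left, right, count in zip(edges, edges[1:], counts)]

# Even bins, each 2 wide: every count is divided by the same width,
# so the heights keep the same proportions as the counts.
even = bar_heights([3, 5, 7, 9], [40, 12, 2])

# Uneven bins: the last bin (7-11) is 4 wide, so its 4 quakes
# are drawn as a bar of height 1, not 4.
uneven = bar_heights([3, 5, 7, 11], [40, 12, 4])

print(even)    # [20.0, 6.0, 1.0]
print(uneven)  # [20.0, 6.0, 1.0]
```

Both charts would have bars of the same heights, yet the last bar stands for 2 earthquakes in the first case and 4 in the second. That is why you can't read counts from height alone when the bins are uneven.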
Make a histogram: Overview
A histogram is a way of graphing groups of numbers according to how often they appear. This article will show you how to make one by hand, but you’re much better off using technology, like making an Excel 2007 histogram. Choosing bins in statistics is usually a matter of an educated guess. When you make a histogram by hand, you’re stuck with your original bin settings. With Excel (or other software), you can change the bins after you’ve created the histogram, giving you the ability to play around with bin sizes until you have a chart you’re happy with. OK, enough lecturing about technology. Sometimes you might have to make a histogram by hand, especially if you’re making a relative frequency histogram; technology like the TI-83 will only create regular frequency histograms. If you have to create a histogram by hand, here’s the easy way.
Make a Histogram: Steps
Sample question: Create a histogram for the following test scores: 99, 97, 94, 88, 84, 81, 80, 77, 71, 25.
Step 1: Draw and label your x- and y-axes. For this example, the x-axis would be labeled “score” and the y-axis would be labeled “relative frequency %.”
Step 2: Choose the number of bins (see: how to choose bin sizes in statistics) and label your graph. For this sample question, bins 10 points wide (the x-axis values are the bin boundaries) are a good choice: the scores fall into four bins of between one and four items each.
Step 3: Divide 100 by the number of data points to get an idea of where to place your frequency “ticks”. We have 10 items in our data set, so it makes sense to count by 100/10 = 10% (one item would equal 10% of the total).
Step 4: Count how many items are in each bin and then sketch a rectangle on the graph that corresponds to the percentage of the total that bin fills. In this sample data set, the first bin (20-30) has one item, 70-80 has two items, 80-90 has four items and 90-100 has three items. If an item falls on a bin boundary (e.g. 80), place it in the next bin up (80 would go in 80-90).
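If you want to check your hand-drawn chart, the steps above can be sketched in a few lines of Python. This is only one way to do it; the variable names and the text “bars” made of `#` characters are choices made for this example.

```python
scores = [99, 97, 94, 88, 84, 81, 80, 77, 71, 25]
bin_width = 10

# Count scores per bin. Floor division puts a boundary score such as 80
# in the bin that starts at 80 (80-90), matching the rule in Step 4.
counts = {}
for score in scores:
    start = (score // bin_width) * bin_width
    counts[start] = counts.get(start, 0) + 1

tick = 100 / len(scores)  # Step 3: each item is 10% of the total

# Print one row per bin from 20-30 up to 90-100, including empty bins.
for start in range(20, 100, bin_width):
    n = counts.get(start, 0)
    print(f"{start}-{start + bin_width}: {n * tick:3.0f}% {'#' * n}")
```

The rows for 20-30, 70-80, 80-90 and 90-100 come out at 10%, 20%, 40% and 30%; the bins in between are empty. Note that a score of exactly 100 would need an extra 100-110 bin under the “next bin up” rule, which this sketch does not print.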
Tip: If you’re unsure about how many bins to choose, consider making a rough chart online. Play around with the bins (change the interval size) until you get a chart that you like the look of.
Tip: Choosing where to place the frequency ticks is also somewhat of a judgment call and is rarely an exact science. For example, if you had 21 items, you could place your ticks at 5% although each item would be slightly less than 5%. Bear that in mind when you sketch the graph.
Warning: Choosing optimal bin sizes gets very complex with large data sets (see this article for an example of the ugly math). The larger your data set, the better off you are using technology.
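To give a flavor of that math, here is a hedged Python sketch of two commonly cited rules of thumb: Sturges’ rule for the number of bins and the Freedman-Diaconis rule for the bin width. These are not necessarily the methods the linked article covers, and the function names are made up for this example.

```python
import math
import statistics

def sturges_bins(n):
    """Sturges' rule: number of bins = 1 + log2(n), rounded up."""
    return math.ceil(1 + math.log2(n))

def freedman_diaconis_width(data):
    """Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)."""
    # statistics.quantiles uses its default ("exclusive") method here;
    # other quartile conventions give slightly different widths.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 2 * (q3 - q1) / len(data) ** (1 / 3)

print(sturges_bins(10))    # 5
print(sturges_bins(1000))  # 11

scores = [99, 97, 94, 88, 84, 81, 80, 77, 71, 25]
print(round(freedman_diaconis_width(scores), 1))
```

Different rules can suggest quite different bins for the same data, so treat the results as starting points rather than answers.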
A bihistogram is a graph made from two histograms (“bi” = two) in opposite directions. One histogram is above the axis and one is below it. The histograms can be back-to-back on opposite sides of either the y-axis or the x-axis. Each half represents a different category. Comparing the two halves shows whether the categories differ in:
- Location (where on the horizontal axis the graph is centered).
- Scale (whether the graph is stretched or squeezed; see: scale parameter).
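As a rough idea of what a bihistogram shows, the Python sketch below counts two groups on the same bins and prints them back-to-back on either side of a vertical axis. The data, the bin settings and the helper name `bin_counts` are all hypothetical, and the text output only stands in for a real chart.

```python
def bin_counts(data, start, width, num_bins):
    """Count values in equal-width bins; boundary values go to the upper bin."""
    counts = [0] * num_bins
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < num_bins:  # values outside the bins are ignored
            counts[index] += 1
    return counts

# Hypothetical measurements for two categories.
group_a = [12, 15, 17, 18, 21, 22, 22, 25, 28, 31]
group_b = [18, 22, 24, 26, 27, 29, 31, 33, 36, 38]

start, width, num_bins = 10, 5, 6  # shared bins: 10-15, 15-20, ..., 35-40
a = bin_counts(group_a, start, width, num_bins)
b = bin_counts(group_b, start, width, num_bins)

# Group A grows to the left of the axis, group B to the right.
for i in range(num_bins):
    left = start + i * width
    print(f"{'#' * a[i]:>6} | {left:>2}-{left + width:<2} | {'#' * b[i]}")
```

Using the same bins for both halves is what makes the comparison fair. In this made-up data, group B’s bars sit in higher bins than group A’s, which is a difference in location.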
Creating a Bihistogram
The bihistogram is rarely used compared to other statistical techniques, so most popular software doesn’t have the capability of creating one. Two programs that do have options are R and Dataplot. SPSS also provides a means to lay histograms side by side, which effectively gives you the same thing.
Bihistogram in R
There doesn’t seem to be a simple function in R for creating bihistograms, but StrictlyStat suggests overlaying two histograms on top of each other, for the same effect. For the code using either ggplot or base graphics, see this article on the R-Bloggers site. You can also find an online calculator (that uses an R module) here at Wessa.net. I did try out the online calculator; be patient, as it can take several minutes for the graph to appear.
The command in Dataplot is BIHISTOGRAM.
Bihistogram in SPSS 20
- Click “Graphs” from the menu, then click “Chart Builder.”
- Choose histogram from the gallery at the bottom left. Icons will appear with a series of different histograms.
- Click and drag the back to back icon (the fourth from the left) into the chart preview area.
- Select your variables as you would any other chart. (If you’re unfamiliar with choosing variables, check out How to Make a Pie Chart in SPSS).