Choose Bin size: Overview
There is no single formula you must use to choose bin sizes in statistics. (A bin is one of the intervals your data is grouped into.) However, there are a few general rules:
- Bins should all be the same size: for example, groups of ten or a hundred.
- Bins should include all of the data, even outliers. If your outliers fall far outside the rest of your data, consider lumping them in with your first or last bin. This creates a rough histogram, so make sure you note where outliers are being included.
- Boundaries for bins should land at whole numbers whenever possible (this makes the chart easier to read).
- Choose between 5 and 20 bins. The larger the data set, the more likely you’ll want a large number of bins. For example, a set of 12 data points might warrant 5 bins, but a set of 1000 data points will probably be more useful with 20 bins. The exact number of bins is usually a judgment call.
- If at all possible, try to make the number of data points evenly divisible by the number of bins. For example, if you have 10 data points, work with 5 bins instead of 6 or 7.
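The outlier rule above can be sketched in a few lines of Python. This is only an illustration, not a standard routine: the function name count_with_outliers_lumped, its parameters, and the sample data are all made up for the example. It assumes equal-width bins between a chosen low and high value, and it counts anything outside that range in the first or last bin.

```python
def count_with_outliers_lumped(data, low, high, num_bins):
    """Count values in equal-width bins over [low, high].

    Values below `low` or above `high` are lumped into the
    first or last bin instead of being dropped.
    """
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for value in data:
        # Clamp outliers to the nearest edge of the binned range.
        clamped = min(max(value, low), high)
        # A value sitting exactly on the top edge belongs to the last bin.
        index = min(int((clamped - low) // width), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical data: 250 is far outside the 0 to 50 range of the rest.
data = [3, 12, 18, 25, 31, 37, 44, 49, 250]
counts = count_with_outliers_lumped(data, low=0, high=50, num_bins=5)
print(counts)  # [1, 2, 1, 2, 3]
```

The last bin here holds 44, 49 and the outlier 250, so a chart built from these counts should label that bin as "40 and above" rather than "40 to 50".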
Choose Bin size: Steps
Step 1: Find the smallest and largest data point. If your smallest and/or largest numbers are not whole numbers, go to Step 2. If they are whole numbers, go to Step 3.
Step 2: Round the minimum down and the maximum up to the nearest whole number. For example, 1.2 as a minimum becomes 1, and 99.9 as a maximum becomes 100.
Step 3: Decide how many bins you need, using your best guess and the general rules listed in the overview above.
Step 4: Divide your range (the largest number minus the smallest number in your data set) by the number of bins you chose in Step 3. For example, if you have numbers that range from 0 to 50, and you chose 5 bins, your bin size is 50/5=10.
Step 5: Create the bin boundaries by starting with your smallest number (from Steps 1 and 2) and repeatedly adding the bin size from Step 4. For example, if your smallest number is 0 and your bin size is 10, you would have bin boundaries of 0, 10, 20…
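The five steps can also be written as a short Python sketch. The function name bin_boundaries and the sample data are invented for illustration, and the number of bins is passed in because Step 3 is a judgment call rather than a calculation.

```python
import math


def bin_boundaries(data, num_bins):
    """Return (bin_size, boundaries) following Steps 1 to 5."""
    # Steps 1 and 2: find the smallest and largest values, then round
    # the minimum down and the maximum up to whole numbers.
    low = math.floor(min(data))
    high = math.ceil(max(data))
    # Step 3 is the caller's choice of num_bins.
    # Step 4: bin size = range / number of bins.
    bin_size = (high - low) / num_bins
    # Step 5: start at the minimum and keep adding the bin size.
    boundaries = [low + i * bin_size for i in range(num_bins + 1)]
    return bin_size, boundaries


# Hypothetical data set of 10 points ranging from 0.4 to 49.6.
data = [0.4, 7, 12.5, 18, 23, 29.1, 35, 41, 46.2, 49.6]
bin_size, boundaries = bin_boundaries(data, num_bins=5)
print(bin_size)    # 10.0
print(boundaries)  # [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
```

With 5 bins there are 6 boundaries, because each bin needs both a lower and an upper edge.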
Tip: If you have a large data set, you may want to use Excel to find the smallest and largest points. Type your data into a single column (for example, column A), then either use the “Sort” function or type =MIN(A:A) in a blank cell in a different column (e.g. column B) to get the smallest number and =MAX(A:A) in another blank cell to get the largest.