
## What is an Open Ended Distribution?


An open ended distribution has at least one class (or bin) that is open-ended. In other words, that class is missing either its lower or its upper boundary. In this frequency distribution table, the first row's class of "57″ or less" has no lower boundary, which makes this an open ended distribution.

This frequency histogram also shows an open ended distribution. The upper bin shows book prices of “31 and up.”

The opposite would be a closed ended distribution. In the following table, the boundaries clearly start at 118 and end at 157. For whatever reason, the researcher has no interest in numbers below or above those points.

## Why are open ended distributions necessary?

Open ended distributions are usually a matter of choice. It depends on the type of research you are doing and what you want to find out from your data. For example, let’s say you are making a frequency distribution table of family size. You poll 100 families and get the following data:

- One child: 28 families.
- Two children: 33 families.
- Three children: 28 families.
- Four children: 6 families.
- Six children: 2 families.
- Nine children: 1 family.
- Ten children: 1 family.
- Twenty children: 1 family.

You could summarize this data in a table like this:

Number of children | Number of families |
---|---|
1 | 28 |
2 | 33 |
3 | 28 |
4 | 6 |
6 | 2 |
9 | 1 |
10 | 1 |
20 | 1 |

But as you can probably guess, if you did a larger poll, you could end up with dozens of categories. In fact, one lucky(?) couple had 69 children. In most cases you don’t really care about exact family size. You might be comparing the socio-economic status of smaller families (two children and under) with larger families (three or more). Then it makes sense to report the data as an open ended distribution. The modified frequency distribution table below is open ended at “4 or more”: the 11 families in that class are the 6 families with four children plus the 5 families with six or more.

Number of children | Number of families |
---|---|
1 | 28 |
2 | 33 |
3 | 28 |
4 or more | 11 |
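The collapsing step can be sketched in a few lines of Python. This is only an illustration: the function name `collapse_open_ended` and the `cutoff` parameter are made up for this example, and the counts are the survey figures listed earlier.

```python
from collections import Counter

# Survey counts from the example: {number of children: number of families}
family_counts = {1: 28, 2: 33, 3: 28, 4: 6, 6: 2, 9: 1, 10: 1, 20: 1}


def collapse_open_ended(counts, cutoff):
    """Merge every class at or above `cutoff` into one open-ended class.

    `collapse_open_ended` and `cutoff` are hypothetical names used only
    for this sketch.
    """
    collapsed = Counter()
    for value, freq in counts.items():
        label = str(value) if value < cutoff else f"{cutoff} or more"
        collapsed[label] += freq
    return dict(collapsed)


table = collapse_open_ended(family_counts, cutoff=4)
for label, freq in table.items():
    print(f"{label:>10} | {freq}")
```

The open-ended class ends up holding 11 families (6 + 2 + 1 + 1 + 1), and the total is still 100, so no family is lost. What is lost is the exact size of those 11 families.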

## Avoiding Open-Ended Classes

Sometimes using open-ended classes is unavoidable, but they can cause problems with calculations and interpretation. For example, if I have two classes:

- < 100
- ≥ 100

Either class could contain values in the thousands (negative in the first class, positive in the second), or values arbitrarily far from 100, and the table gives you no way to tell. You also run the risk of very imbalanced classes if the bulk of your data falls into an open-ended class.
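To make the interpretation problem concrete, here is a small Python sketch. Both data sets are invented purely for illustration, and the helper `bin_two_classes` is a hypothetical name. It also assumes that a value of exactly 100 belongs to the upper class.

```python
from statistics import mean


def bin_two_classes(values, cutoff=100):
    """Count values below `cutoff` and values at or above it.

    Assumption for this sketch: a value equal to `cutoff` goes in the
    upper class.
    """
    below = sum(1 for v in values if v < cutoff)
    return {f"< {cutoff}": below, f">= {cutoff}": len(values) - below}


# Two made-up data sets, chosen only to illustrate the point.
tight = [95, 98, 101, 105]
spread = [-4000, 98, 101, 9000]

print(bin_two_classes(tight))
print(bin_two_classes(spread))
print(mean(tight), mean(spread))
```

Both data sets produce the same two-class table (two values in each class), yet their means are 99.75 and 1299.75. Once the data is reported only as open-ended classes, that difference is invisible.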


thank you i learned a lot

Please discuss the close-ended distribution. It would be nice to have a comparison.

Thanks for your comment. I added a little comparison.

How do we find the mean of open ended distributions?

Well, the median is preferable to the mean for open ended distributions. If you absolutely *must* find the mean, just calculate what your values would be (i.e. expand the open ended part) and find the mean the usual way. For example, if you have # of children 0, 1, 2, 3, 4, and 5 or more, then the values would be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 etc. (I’d probably stop at 20). Then find the mean the usual way (you’ll probably want a weighted mean, which means you also need a frequency for each expanded value).
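Here is a rough Python sketch of why the median is the safer choice. It reuses the family-size table from the article (1, 2, 3, and an open-ended top class of 11 families) rather than the 0 to 5 example in this reply. The function names and the assumed stand-in values for the open class are hypothetical, picked only for illustration.

```python
# Closed classes from the article's table: {number of children: families}
closed_classes = {1: 28, 2: 33, 3: 28}
# Open-ended top class: the exact values are unknown, only the count is.
open_label, open_freq = "4 or more", 11


def median_class(closed, open_label, open_freq):
    """Return the class holding the middle observation.

    For an even total this is the lower of the two middle observations.
    """
    total = sum(closed.values()) + open_freq
    middle = (total + 1) // 2
    cumulative = 0
    for value in sorted(closed):
        cumulative += closed[value]
        if cumulative >= middle:
            return value
    return open_label  # the median falls inside the open-ended class


def mean_assuming(closed, open_freq, assumed_value):
    """Weighted mean if every open-class entry equals `assumed_value`.

    `assumed_value` is a guess, not something the table provides.
    """
    weighted_sum = sum(v * f for v, f in closed.items())
    weighted_sum += assumed_value * open_freq
    return weighted_sum / (sum(closed.values()) + open_freq)


print(median_class(closed_classes, open_label, open_freq))
print(mean_assuming(closed_classes, open_freq, assumed_value=4))
print(mean_assuming(closed_classes, open_freq, assumed_value=6))
```

The median class is 2 no matter what sits in the open-ended class, because the 50th and 51st families both fall in the "2 children" row. The mean moves with the guess: 2.22 if you assume 4 for every open-class family and 2.44 if you assume 6, while the raw data in the article gives 2.53.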

Discuss briefly the reasons why open-ended class intervals should be avoided

See the last section :)