What is a Normal Probability Plot?
When you have a set of data that you think might have a normal distribution (i.e. a bell curve), a graph can help you decide whether the data really is normal. A histogram is one option, but there is a more specialized type of plot for this purpose, called a normal probability plot. A normal probability plot graphs z-scores (normal scores) against the values in your data set.
If the points fall along a straight, diagonal line, your data is approximately normally distributed. If the points bend or curve away from a straight line, your data is probably not normally distributed.
What is a Normal Probability Plot used for?
A histogram makes it easy to see whether the data as a whole fits a normal distribution or is skewed away from it.
With a normal probability plot, it can be easier to see individual data items that don’t quite fit a normal distribution. In the image below, the upper right data item is clearly out of line with the rest of the data, meaning that it doesn’t fit with a normal distribution.
How to Draw a Normal Probability Plot
Note: you may want to watch the Excel video below as it explains many of these steps in more detail:
- Arrange your x-values in ascending order.
- Calculate f_i = (i - 0.375)/(n + 0.25), where i is the position of the data value in the ordered list and n is the number of observations.
- Find the z-score for each f_i, that is, the z-value that has an area of f_i to its left under the standard normal curve.
- Plot your x-values on the horizontal axis and the corresponding z-scores on the vertical axis.
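The steps above can be sketched in a few lines of Python. This is only an illustration, not the method any particular package uses: the function name normal_plot_points and the sample values are made up for the example, and it assumes Python 3.8 or later, where the standard library's NormalDist class provides the inverse normal CDF.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Return (x, z) pairs for a normal probability plot (hypothetical helper)."""
    xs = sorted(data)                      # step 1: ascending order
    n = len(xs)
    std_normal = NormalDist()              # standard normal: mean 0, sd 1
    points = []
    for i, x in enumerate(xs, start=1):    # i is the 1-based position
        f_i = (i - 0.375) / (n + 0.25)     # step 2: plotting position
        z = std_normal.inv_cdf(f_i)        # step 3: z with area f_i to its left
        points.append((x, z))              # step 4: x horizontal, z vertical
    return points

# Made-up sample values, for illustration only.
sample = [4.1, 5.3, 3.8, 6.0, 4.9]
for x, z in normal_plot_points(sample):
    print(f"{x:5.1f}  {z:6.2f}")
```

Each printed row is one point on the plot. If the points lie close to a straight line when you graph them, the data is consistent with a normal distribution.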
Normal probability plots aren’t normally drawn by hand, because finding the normal score for every data point is tedious. That’s why software like Minitab or SPSS is usually used to create these types of graphs. You can also use Excel to create a simple normal probability plot.
Note: It’s best to make a histogram of your data to check that it looks normally distributed before you make a normal probability plot. That’s because it’s easier to see a bell curve on a histogram than it is to gauge whether your data falls on a straight line (or an almost straight line).
Reference: University of Northern Colorado.