Statistical noise is the random irregularity found in any real-life data. It has no pattern: one minute your readings might be too small, and the next they might be too large. These errors are usually unavoidable and unpredictable.
Quantifying Statistical Noise
Statistical noise generally consists of errors and residuals:
- Errors might include measurement errors and sampling errors: the differences between the observed values we’ve actually measured and their ‘true values’. While most errors are unavoidable, systematic errors can usually be avoided. They creep into your data when you make the same mistake over and over again. For example, let’s say you wanted to know something about the general health of the population, but only surveyed patients in doctors’ waiting rooms. That systematic error (polling sick people over and over again) will create a statistic that’s completely off the mark.
- The residual of observed data is the difference between your observed value (again, the data point you’ve measured) and the predicted value. The predicted value is not the ‘true value’ per se, but the point where your theory says the data point should lie. In regression analysis, the residual is the vertical distance between your observed data point and the regression line.
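The difference between an error and a residual is easier to see with simulated data, where the ‘true values’ are known. The sketch below is only an illustration: the true line (slope 2, intercept 1), the noise level, and the helper names `fit_line` and `residuals` are all assumptions made for the example and do not come from any real data set. It fits a least-squares line to noisy points, then computes errors against the true line and residuals against the fitted line.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a straight line."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Observed value minus the value predicted by the fitted line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical "true" relationship, known only because we simulate it.
TRUE_SLOPE, TRUE_INTERCEPT = 2.0, 1.0
rng = random.Random(0)  # fixed seed so the run is repeatable

xs = [float(i) for i in range(20)]
true_ys = [TRUE_SLOPE * x + TRUE_INTERCEPT for x in xs]
# Observed values = true values + random noise (assumed Gaussian, sd 1.5).
ys = [y + rng.gauss(0, 1.5) for y in true_ys]

slope, intercept = fit_line(xs, ys)

# Errors: observed minus true (only computable when the truth is known).
errors = [obs - true for obs, true in zip(ys, true_ys)]
# Residuals: observed minus predicted (always computable after fitting).
resids = residuals(xs, ys, slope, intercept)

print(f"fitted slope={slope:.3f}, intercept={intercept:.3f}")
print(f"spread of residuals (sample sd): {statistics.stdev(resids):.3f}")
```

With real data only the residuals are available, since the true values are unknown. The spread of the residuals is one common way to put a number on the noise around a fitted model.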
The Significance of Noise
Recognizing and quantifying the amount of statistical noise in a data set is an important step in analysis, because it lets us see whether shifts in the data are significant or simply part of the static.
Statistical noise is often expressed as a margin of error.
For instance, if polls tell us that candidate B has moved three percentage points up in public opinion, but the statistical noise (a.k.a. the margin of error) is 10 percentage points, the change falls well within the noise and is not statistically significant.
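The poll comparison can be sketched in a few lines of Python. This is a rough illustration, not a full significance test: `margin_of_error` uses the standard normal approximation for a sample proportion at roughly 95% confidence, and `exceeds_noise` is a hypothetical helper that applies the simple rule of thumb used in the example (is the shift bigger than the margin?). The sample size of 1,000 and the 50% support figure are assumptions chosen for the example.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p from n responses.

    Uses the normal approximation; z=1.96 corresponds to about 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


def exceeds_noise(change, margin):
    """Rule of thumb: is the observed change larger than the margin of error?"""
    return abs(change) > margin


# The example from the text: a 3-point shift against a 10-point margin.
print(exceeds_noise(3.0, 10.0))  # False: the shift is within the noise

# Hypothetical poll: 1,000 respondents, 50% support.
moe_points = 100 * margin_of_error(0.5, 1000)
print(f"margin of error: about {moe_points:.1f} percentage points")
```

Comparing a shift to a single poll's margin of error is a simplification. The difference between two separate polls carries the sampling error of both, so its margin is larger than either poll's alone.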
Smith, S. The Scientist and Engineer’s Guide to Digital Signal Processing, Chapter 2
Retrieved from http://my.fit.edu/~vkepuska/ece3551/DSP-GUIDE/CH2.PDF on February 11, 2018
Goldacre, B. Unemployment is rising… (19 August 2011). The Guardian.
Retrieved from https://www.theguardian.com/commentisfree/2011/aug/19/bad-science-unemployment-statistical-noise on February 11, 2018
University of Granada SCI2S: Noisy Data in Data Mining
Retrieved from http://sci2s.ugr.es/noisydata on February 14, 2018