Axiomatic Probability: Definition, Kolmogorov's Three Axioms

Axiomatic probability is a unifying probability theory. It lays down a set of axioms (rules) that apply to all types of probability, including frequentist probability and classical probability. These rules, based on Kolmogorov’s Three Axioms, set starting points for mathematical probability.

Kolmogorov’s Three Axioms

The three axioms are:

For any event A, P(A) ≥ 0. In English, that’s “For any event A, the probability of A is greater than or equal to 0”.
If S is the sample space of an experiment (that is, the set of all possible outcomes), then P(S) = 1. In English, that’s “The probability that one of the possible outcomes happens is one hundred percent”, or, paraphrasing, “anytime this experiment is performed, something happens”.
If A and B are mutually exclusive events, P(A ∪ B) = P(A) + P(B). Here ∪ stands for ‘union’. In English, that’s “If A and B are mutually exclusive events, the probability of either A or B happening is the probability of A happening plus the probability of B happening”.
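The three axioms can be checked mechanically on a small, finite sample space. The following is a minimal sketch, not a standard library routine: the names powerset, satisfies_axioms, fair and broken are hypothetical and introduced only for illustration, and the sample space is assumed to be a single six-sided die.

```python
from fractions import Fraction
from itertools import combinations


def powerset(outcomes):
    """Return every subset (event) of a finite sample space as a frozenset."""
    items = list(outcomes)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def satisfies_axioms(sample_space, prob):
    """Check the three axioms for prob, a function mapping events to numbers."""
    events = powerset(sample_space)
    # Axiom 1: P(A) >= 0 for every event A.
    if any(prob(a) < 0 for a in events):
        return False
    # Axiom 2: P(S) = 1.
    if prob(frozenset(sample_space)) != 1:
        return False
    # Axiom 3: P(A ∪ B) = P(A) + P(B) whenever A and B are mutually exclusive.
    for a in events:
        for b in events:
            if a.isdisjoint(b) and prob(a | b) != prob(a) + prob(b):
                return False
    return True


die = frozenset(range(1, 7))  # assumed sample space: one six-sided die


def fair(event):
    # Each face gets probability 1/6 (illustrative choice).
    return Fraction(len(event), 6)


def broken(event):
    # Squaring keeps axioms 1 and 2 but destroys additivity (axiom 3).
    return Fraction(len(event), 6) ** 2


print(satisfies_axioms(die, fair))    # True
print(satisfies_axioms(die, broken))  # False
```

Exact fractions are used instead of floats so that the equality tests in axioms 2 and 3 are not affected by rounding.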

Many important laws are derived from Kolmogorov’s three axioms. For example, the Law of Large Numbers can be deduced from the axioms by logical reasoning (Tijms, 2004).
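As a much simpler illustration of how a law follows from the axioms, here is a sketch of the complement rule. The notation A^c for "A does not happen" is an assumption added here, not something the sources above define, and the align* environment assumes the amsmath package.

```latex
\begin{align*}
A \cap A^{c} &= \emptyset, \qquad A \cup A^{c} = S
  && \text{($A$ and $A^{c}$ are mutually exclusive and cover $S$)} \\
P(A) + P(A^{c}) &= P(A \cup A^{c}) && \text{(axiom 3)} \\
                &= P(S) = 1        && \text{(axiom 2)} \\
P(A^{c}) &= 1 - P(A) \\
P(A) &= 1 - P(A^{c}) \le 1         && \text{(axiom 1 applied to $A^{c}$)}
\end{align*}
```

The last line shows that no probability can exceed 1, even though none of the three axioms says so directly.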

The fact that these axioms are universal doesn’t mean they provide all the answers. For example, any function that satisfies all three axioms is called a probability function. However, the axioms don’t tell you which function to choose; they merely state that the probability function you choose must satisfy the rules.
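To make this concrete, the sketch below builds two different probability functions for the same coin toss. Both obey the axioms, yet they disagree about the probability of heads, and nothing in the axioms picks one over the other. The helper make_probability and the weights used are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def make_probability(weights):
    """Build P(event) by summing outcome weights.

    Non-negative weights that sum to 1 give axioms 1 and 2; defining
    P(event) as a sum over outcomes gives additivity (axiom 3).
    """
    assert all(w >= 0 for w in weights.values())
    assert sum(weights.values()) == 1

    def prob(event):
        return sum((weights[o] for o in event), Fraction(0))

    return prob


# Two assumed models of the same experiment (one coin toss).
fair_coin = make_probability({"H": Fraction(1, 2), "T": Fraction(1, 2)})
loaded_coin = make_probability({"H": Fraction(9, 10), "T": Fraction(1, 10)})

heads = {"H"}
print(fair_coin(heads))    # 1/2
print(loaded_coin(heads))  # 9/10
```

Choosing between the two models takes information from outside the axioms, such as data on how the coin actually behaves.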

Fine (2014) goes so far as to say the axioms lack “essential content”. What these three axioms don’t do:

Tell us where and when to apply the rules,
Give us guidelines or procedures for calculating probabilities,
Give us any insight into the nature of random processes.

The Development of Axiomatic Probability

The oldest type of probability is classical probability; it is usually applied to easy-to-analyze situations like gambling games.

Let’s say a random experiment (such as the throw of a die) results in a finite number, n, of equally likely outcomes. If m of those outcomes have a certain attribute, the probability of that attribute is the fraction m/n. This is useful for analyzing dice throws and card picks, but is less applicable to more complicated situations of daily life.
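The m/n rule translates directly into a few lines of code. This is a minimal sketch that assumes the outcomes really are equally likely; the function name classical_probability is hypothetical.

```python
from fractions import Fraction


def classical_probability(outcomes, has_attribute):
    """Return m/n: outcomes with the attribute over all equally likely outcomes."""
    outcomes = list(outcomes)
    n = len(outcomes)                                  # all possible outcomes
    m = sum(1 for o in outcomes if has_attribute(o))   # outcomes with the attribute
    return Fraction(m, n)


die_faces = range(1, 7)
p_even = classical_probability(die_faces, lambda face: face % 2 == 0)
print(p_even)  # 1/2
```

Three of the six faces are even, so m/n = 3/6, which Fraction reduces to 1/2.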

Frequentist probability was, historically, the next type of probability to be developed. Frequentist statistics uses rigid frameworks, the kind you learn in basic statistics, built around tools like p-values and confidence intervals.

For every stats problem, there’s data.
There’s a test for each set of data.
Every test has its own rigid rules.

Tests are based on the assumption that every experiment can be repeated indefinitely. Deviation from a test’s rules is not allowed, and if you do deviate, your methods will be criticized as statistically unsound.
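The idea of repeating an experiment over and over can be imitated with a simulation. The sketch below assumes a fair six-sided die and uses a seeded pseudo-random generator as a stand-in for real repetitions; the function name relative_frequency is hypothetical.

```python
import random


def relative_frequency(trials, seed=0):
    """Fraction of simulated fair-die rolls that show a six."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials


# The relative frequency should settle near 1/6 (about 0.1667) as trials grow.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

With only 100 rolls the result can sit noticeably away from 1/6; with 100,000 rolls it is usually very close, which is the long-run behavior the frequentist view relies on.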

Frequentist probability has more applicability than the classical model, but is still very limited.

References

Fine, T. (2014). Theories of Probability: An Examination of Foundations. Academic Press.
Morey, E. 3 Basic Definitions of Probability Theory. Retrieved from https://tinyurl.com/y49bcrsa on April 15, 2018.
Myers, D. CS 547 Lecture 6: Axioms of Probability. Retrieved from https://tinyurl.com/y3u922z5 on April 15, 2018.
Tijms, H. (2004). Understanding Probability: Chance Rules in Everyday Life. Cambridge University Press.
Universität Zürich. Axiomatic Probability. Retrieved from https://www.math.uzh.ch/index.php?file&key1=45741 on April 15, 2018.