Cantor Distribution - Statistics How To


The Cantor distribution is a probability distribution whose cumulative distribution function (CDF) is the Cantor function. The Cantor function is a classic example of a pathological function: it is flat (its derivative is zero) almost everywhere, yet it climbs continuously from 0 to 1, which earned it the nickname “the devil’s staircase.”

The Cantor function [1]. Despite its appearance, there are no jumps; the function is continuous for all real numbers.

This distribution has neither a probability density function (PDF) nor a probability mass function (PMF). Its CDF is continuous, so it has no point masses and therefore no PMF; but the CDF is not absolutely continuous with respect to Lebesgue measure, so it has no PDF either. It is an example of a singular continuous distribution [2]: a distribution whose CDF is continuous and not constant, yet has a first derivative that exists and is equal to zero almost everywhere [3].
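As a rough illustration of this CDF, the sketch below evaluates the Cantor function by reading off the ternary (base-3) digits of x. The function name `cantor_cdf` and the `depth` parameter are illustrative choices, not something taken from the references, and the result is only an approximation to the stated number of digits.

```python
def cantor_cdf(x: float, depth: int = 40) -> float:
    """Approximate the Cantor function at x using `depth` ternary digits."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value = 0.0
    weight = 0.5  # binary place values: 1/2, 1/4, 1/8, ...
    for _ in range(depth):
        x *= 3.0
        digit = int(x)  # next ternary digit of x: 0, 1, or 2
        x -= digit
        if digit == 1:
            # x sits in a removed middle third, where the CDF is flat.
            return value + weight
        value += weight * (digit // 2)  # ternary digit 2 becomes binary digit 1
        weight /= 2.0
    return value


# The CDF is constant across the whole middle third of [0, 1] ...
print(cantor_cdf(0.4), cantor_cdf(0.5), cantor_cdf(0.6))
# ... yet it still runs from 0 at x = 0 to 1 at x = 1.
print(cantor_cdf(0.0), cantor_cdf(1.0))
```

The flat stretches are the steps of the staircase: the function only increases on the Cantor set, which has total length zero.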

The Cantor function is named after the Russian-born German mathematician Georg Cantor, who introduced it in 1884. Its value at a point can be read off from the base-3 (ternary) expansion of that point.

Cantor distribution defined

The Cantor distribution can be built from an infinite sequence of fair coin flips. Each flip chooses the left or the right third of the current interval, producing a sequence of nested intervals that shrink to a single point. All of the probability ends up on the Cantor set, which is uncountable but has Lebesgue measure zero.

It can be defined as the random series [4]

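$$X = \frac{1 - \upsilon}{\upsilon} \sum_{i \ge 1} X_i \, \upsilon^i$$

Where

υ = a parameter (υ = 1/3 gives the classical Cantor distribution on the unit interval),
X_i = independent random variables that take on values 0 and 1, each with probability ½.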

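A Cantor distributed random variable Y on the unit interval can also be represented by the infinite sum [5]

$$Y = \sum_{n \ge 1} \frac{X_n}{3^n}$$

Where X_1, X_2, … is a sequence of independent, identically distributed (iid) random variables such that P(X_n = 0) = P(X_n = 2) = ½. In other words, the X_n are the ternary digits of Y, and the digit 1 never appears.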
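As a short worked example, the mean and variance follow directly from this kind of representation. The derivation below assumes the sum is Y = Σ X_n / 3^n with independent digits X_n taking the values 0 and 2 with probability ½ each, so that E[X_n] = 1 and Var(X_n) = 1; the symbol Y is used here only for illustration.

```latex
\begin{align*}
E[Y] &= \sum_{n \ge 1} \frac{E[X_n]}{3^n}
      = \sum_{n \ge 1} \frac{1}{3^n}
      = \frac{1/3}{1 - 1/3}
      = \frac{1}{2}, \\
\operatorname{Var}(Y) &= \sum_{n \ge 1} \frac{\operatorname{Var}(X_n)}{9^n}
      = \sum_{n \ge 1} \frac{1}{9^n}
      = \frac{1/9}{1 - 1/9}
      = \frac{1}{8}.
\end{align*}
```

The variance step uses independence of the X_n, which lets the variance of the sum be taken term by term.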

The CDF fails to be differentiable at every point of the Cantor set, which is the support of the Cantor distribution. On the set’s complement, however, the derivative is zero. As a result, it’s not possible to calculate probabilities by integrating the derivative of the CDF: that integral is zero over every interval.

The Cantor set is created by an infinite process [3]:

Start with the closed interval [0,1].
During the nth step of the process, remove the open middle third of each of the 2ⁿ⁻¹ closed intervals left over from the previous step; each removed interval has length 1/3ⁿ.
What remains after the nth step is 2ⁿ disjoint closed intervals, each of length 1/3ⁿ. The Cantor set is the intersection of these closed sets over all n. The lengths of the deleted intervals sum to one (1/3 + 2/9 + 4/27 + … = 1), so the Cantor set has Lebesgue measure zero.
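The construction above can be sketched in a few lines of code. This is only an illustration under the middle-thirds rule just described; the helper names `cantor_step` and `cantor_intervals` are made up for this example, and exact fractions are used so that the lengths carry no rounding error.

```python
from fractions import Fraction


def cantor_step(intervals):
    """Remove the open middle third of each closed interval (a, b)."""
    result = []
    for a, b in intervals:
        third = (b - a) / 3
        result.append((a, a + third))  # left closed third
        result.append((b - third, b))  # right closed third
    return result


def cantor_intervals(n):
    """Closed intervals that remain after n steps, starting from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = cantor_step(intervals)
    return intervals


for n in range(1, 6):
    kept = cantor_intervals(n)
    remaining = sum(b - a for a, b in kept)
    # Columns: step, number of intervals, length kept, length removed so far
    print(n, len(kept), remaining, 1 - remaining)
```

The length kept after n steps is (2/3)ⁿ, which shrinks toward zero, so the total length removed approaches one.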

[Video: the first seven steps of the construction, hosted on YouTube.]

Why does it matter?

The main reason the Cantor distribution matters is that it shows a random variable does not have to be discrete, continuous with a density, or a mixture of the two. Familiar distributions such as the Bernoulli, Poisson, and Laplace each have a PMF or a PDF; the Cantor distribution has neither, so any method that assumes one of them exists cannot be applied to it. Knowing that such cases exist helps you check the assumptions behind a model before relying on it to make predictions.

References

[1] Image: CantorEscalier.svg by Theon; derivative work: Amirki. CC BY-SA 3.0, https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons.

[2] Arnold, B. (2011). The generalized Cantor distribution and its corresponding inverse distribution. Statistics & Probability Letters, 81(8), 1098-1103. doi:10.1016/j.spl.2011.03.003

[3] Marengo, J. and Farnsworth, D. (2022). Probability Models with Discrete and Continuous Parts. Open Journal of Statistics, 12, 82-97. doi:10.4236/ojs.2022.121006

[4] Prodinger, H. (1996). Some Properties of the Cantor Distribution.

[5] Gut, A. (2006). Probability: A Graduate Course. Springer Science & Business Media.
