Canonical Statistic (Natural Statistic): Definition


A canonical statistic (sometimes called a natural statistic) is the function of the data that appears in the exponent when a distribution is written in exponential family form. All exponential families of distributions over x have the general form (Creager, 2018):

p(x| η) = h(x) g(η) exp{η^T u(x)}

Where:

u(x) is the canonical (natural) statistic, which is a function of x,
η is the natural parameter,
h(x) is the base measure, which is often constant,
g(η) is the normalizer.
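
As a rough illustration of how the four pieces fit together, the sketch below writes the Poisson distribution in the general form above and checks it numerically against the textbook formula. The Poisson example, the rate value, and all function names are assumptions chosen for illustration; they do not come from the cited sources. Here the base measure h(x) = 1/x! is not constant, the natural parameter is η = log λ, and the canonical statistic is u(x) = x.

```python
import math

def poisson_pmf(x, lam):
    """Textbook Poisson probability mass function."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def h(x):
    """Base measure: depends on x only."""
    return 1.0 / math.factorial(x)

def u(x):
    """Canonical (natural) statistic: for the Poisson it is x itself."""
    return x

def g(eta):
    """Normalizer: depends on the natural parameter only."""
    return math.exp(-math.exp(eta))

def expfam_pmf(x, eta):
    """General form p(x | eta) = h(x) g(eta) exp(eta * u(x))."""
    return h(x) * g(eta) * math.exp(eta * u(x))

lam = 3.5             # arbitrary rate, chosen only for illustration
eta = math.log(lam)   # natural parameter of the Poisson

# Largest disagreement between the two formulas over x = 0, ..., 19
max_abs_diff = max(abs(poisson_pmf(x, lam) - expfam_pmf(x, eta)) for x in range(20))
print(max_abs_diff)
```

The two formulas should agree up to floating-point rounding, which is one way to confirm that a proposed choice of h, g, η and u really does describe the intended distribution.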

The canonical statistic is a sufficient statistic for the family, and it is usually minimal sufficient (this holds when u(x) has no linearly redundant components).

Example:
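A sequence of n independent Bernoulli trials with success probability π has the following probability function for the outcome sequence y = (y₁, …, y_n) (Sundberg, 2019):

f(y; π) = Π π^{y_i} (1 − π)^{1 − y_i} = (1 − π)^n exp{log(π / (1 − π)) Σ y_i}

The canonical statistic here is t(y) = Σ y_i, the number of successes (Σ is summation notation, which means “add them up”; Π is the corresponding product notation). Matching terms with the general form, the natural parameter is η = log(π / (1 − π)), the normalizer is (1 − π)^n, and h(y) = 1.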
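
The Bernoulli example can be checked with a short sketch. It assumes independent trials with a common success probability (called pi in the code, a name introduced here for illustration) and compares the direct product of per-trial probabilities with the exponential family form. It also shows that two sequences with the same sum Σ y_i receive the same probability, which is what makes the sum a sufficient statistic.

```python
import math

def sequence_prob(y, pi):
    """Probability of one specific 0/1 sequence under independent Bernoulli(pi) trials."""
    prob = 1.0
    for y_i in y:
        prob *= pi if y_i == 1 else 1.0 - pi
    return prob

def canonical_statistic(y):
    """Canonical statistic of the sequence: the number of successes."""
    return sum(y)

def expfam_prob(y, pi):
    """Same probability written as g * exp(eta * t(y)), with base measure h = 1."""
    n = len(y)
    eta = math.log(pi / (1.0 - pi))   # natural parameter (the log-odds)
    g = (1.0 - pi) ** n               # normalizer, expressed through pi
    return g * math.exp(eta * canonical_statistic(y))

pi = 0.3                  # assumed success probability, for illustration only
y_a = [1, 0, 0, 1, 0]
y_b = [0, 0, 1, 0, 1]     # different order, same number of successes

print(sequence_prob(y_a, pi), expfam_prob(y_a, pi))
print(sequence_prob(y_b, pi), expfam_prob(y_b, pi))
```

Both lines should print the same pair of values up to rounding: the probability depends on the sequence only through its sum, not through the order of the outcomes.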

Non-Uniqueness of a Canonical Statistic

In mathematics, the word “canonical” usually indicates a standard choice among several possible conventions, which suggests that the choice is unique. However, a canonical parameter and a canonical statistic are not unique (Geyer, 2020):

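Any one-to-one affine function of a canonical parameter is again a canonical parameter, and any one-to-one affine function of a canonical statistic is again a canonical statistic. Either change forces a matching change in the other member of the pair and in the cumulant function (the negative log of the normalizer, −log g(η) in the notation above).
Any scalar-valued affine function of the canonical parameter can be added to the cumulant function. This also changes the canonical statistic.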
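
A minimal sketch of the first point, for a single Bernoulli trial: replacing the canonical statistic u(x) = x with the affine function u2(x) = A·x + B describes exactly the same distribution, provided the parameter and the normalizer are adjusted to match. The constants A and B and all function names are hypothetical, chosen only for illustration.

```python
import math

A, B = 2.0, 3.0   # hypothetical one-to-one affine map u2(x) = A*x + B (A must be nonzero)

# Original parametrization: statistic u, parameter eta, normalizer g
def u(x):
    return x

def g(eta):
    return 1.0 / (1.0 + math.exp(eta))

def p(x, eta):
    return g(eta) * math.exp(eta * u(x))

# Transformed parametrization: the statistic changes, so the parameter
# becomes eta2 = eta / A and the normalizer absorbs the extra factor exp(B * eta2)
def u2(x):
    return A * x + B

def g2(eta2):
    return g(A * eta2) * math.exp(-B * eta2)

def p2(x, eta2):
    return g2(eta2) * math.exp(eta2 * u2(x))

eta = 0.8          # arbitrary natural parameter for the original form
eta2 = eta / A     # matching parameter for the transformed form

for x in (0, 1):
    print(x, p(x, eta), p2(x, eta2))
```

The probabilities agree for both outcomes, while the cumulant function changes from log(1 + e^η) to log(1 + e^{A·η2}) + B·η2. That is the sense in which the statistic, the parameter, and the cumulant function have to change together.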

Although there are many possibilities, the ambiguity is resolved by making a choice: “the” canonical statistic is the one statistic that has been fixed, by convention, out of all the different possibilities.


References

Creager, E. (2018). Introduction to Advanced Probability for Graphical Models. Retrieved January 15, 2021 from: http://www.cs.toronto.edu/~jessebett/CSC412/content/week1/tutorial1-probability-412–ec-edit.pdf
Geyer, C. (2020). Stat 5421 Lecture Notes. Exponential Families, Part I.
Sundberg, R. (2019). Statistical Modelling by Exponential Families. Cambridge University Press.