What is a Hazard Function?
The hazard function (also called the force of mortality, instantaneous failure rate, instantaneous death rate, or age-specific failure rate) is a way to model data distribution in survival analysis. The most common use of the function is to model a participant’s chance of death as a function of their age. However, it can be used to model any other time-dependent event of interest. More specifically, the hazard function models which periods have the highest or lowest chances of an event.
The function is defined as the instantaneous risk that the event of interest happens within a very narrow time frame. More specifically, the hazard function describes the ‘intensity of death’ at time t, given that the individual has survived up to time t. Formally, it is the probability that the event occurs in a short interval just after t, conditional on survival to t, divided by the width of that interval as the width shrinks to zero. It is therefore a rate per unit of time rather than a probability, and it can be understood as the derivative of the cumulative hazard at a specific point.
Hazard functions and survival functions are alternatives to traditional probability density functions (PDFs). They are better suited than PDFs for modeling the types of data found in survival analysis.
Conditional failure rate and variations
The hazard function is a conditional failure rate, in that it is conditional on a person having survived until time t. In other words, the function at year 10 only applies to those who were actually alive in year 10; it doesn’t count those who died in previous periods. There are also variations in how the function is expressed. The Kaplan-Meier (KM) method uses rates, has no upper limit, and is preferred for clinical trials [1]. Conversely, with the actuarial method, the hazard function is a proportion, with values between 0 and 1.
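To make the conditioning concrete, here is a small sketch in Python. The cohort size and yearly death counts are hypothetical, invented only for illustration. It compares the conditional proportion failing in each year (counted among those still alive at the start of that year) with the unconditional proportion (counted against everyone who started).

```python
# Hypothetical cohort: 1000 people at the start, with the number of deaths
# in each one-year interval. These counts are invented for illustration.
initial_cohort = 1000
deaths_per_year = [100, 90, 81, 200, 300]


def conditional_and_unconditional(n0, deaths):
    """Return (conditional, unconditional) proportions failing in each interval."""
    conditional = []
    unconditional = []
    at_risk = n0
    for d in deaths:
        conditional.append(d / at_risk)  # only those still alive are counted
        unconditional.append(d / n0)     # everyone who started is counted
        at_risk -= d                     # those who died leave the risk set
    return conditional, unconditional


cond, uncond = conditional_and_unconditional(initial_cohort, deaths_per_year)
for year, (c, u) in enumerate(zip(cond, uncond), start=1):
    print(f"year {year}: conditional={c:.3f}, unconditional={u:.3f}")
```

In this made-up cohort the conditional proportion is the same in each of the first three years, while the unconditional proportion falls, simply because fewer people remain at risk.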
Formula for the hazard function
The hazard function formula is:
$$h_Y(y) = \frac{f_Y(y)}{S_Y(y)}$$
Where:
- f_Y(y) = the probability density function of survival time Y,
- S_Y(y) = the survivor function (the probability of surviving beyond time y).
The exact formula and shape of the hazard function depend on the underlying probability density function of the survival time.
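As a minimal sketch of how the shape follows from the distribution, the Python snippet below computes the hazard as the density divided by the survivor function for two textbook distributions. The function names and parameter values are illustrative assumptions, not part of the original text.

```python
import math


def exponential_hazard(y, rate):
    """Hazard of an exponential distribution, computed as pdf / survivor function."""
    pdf = rate * math.exp(-rate * y)
    survivor = math.exp(-rate * y)
    return pdf / survivor


def weibull_hazard(y, shape, scale):
    """Hazard of a Weibull distribution, computed as pdf / survivor function."""
    z = y / scale
    pdf = (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))
    survivor = math.exp(-(z ** shape))
    return pdf / survivor


# Assumed parameter values, chosen only for illustration.
for y in (0.5, 1.0, 2.0, 4.0):
    print(y, exponential_hazard(y, rate=0.3), weibull_hazard(y, shape=2.0, scale=3.0))
```

The exponential hazard stays constant at the chosen rate, whereas the Weibull hazard with shape 2 grows in proportion to y, so the same ratio produces very different risk profiles.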
Cumulative hazard function
The cumulative hazard function, or cumulative hazard rate, represents the total accumulated risk of experiencing an event up to time t. Loosely speaking, it adds up the (small) instantaneous risks over time; it is not itself a probability and can exceed 1. Unlike the instantaneous hazard rate, which can rise or fall over time, the cumulative hazard rate can only increase or stay the same. This is because the instantaneous hazard rate must be greater than or equal to zero.
It is defined as the integral of the hazard function, h(u), from 0 to t [2]:
$$H(t) = \int_0^t h(u)\,du$$
It can also be defined in terms of the survival function S(t):
$$S(t) = e^{-H(t)} \quad \text{or, equivalently,} \quad H(t) = -\ln S(t)$$
The exact equation depends on the distribution of the survival times. For example, the cumulative hazard for the exponential distribution with rate α is H(t) = αt.
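The relationship between the hazard, the cumulative hazard, and the survival function can be checked numerically. The sketch below integrates a hazard from 0 to t with the trapezoidal rule and then converts the result to a survival probability; the helper name, the step count, and the value of alpha are assumptions made for illustration.

```python
import math


def cumulative_hazard(hazard, t, steps=10_000):
    """Approximate the integral of hazard(u) from 0 to t with the trapezoidal rule."""
    if t == 0:
        return 0.0
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * width)
    return total * width


alpha = 0.25  # assumed constant hazard, i.e. an exponential distribution


def constant_hazard(u):
    return alpha


t = 4.0
H = cumulative_hazard(constant_hazard, t)
S = math.exp(-H)
print(H)  # close to alpha * t
print(S)  # close to exp(-alpha * t)
```

Any non-negative hazard function can be passed in place of `constant_hazard`; the result is non-decreasing in t because the integrand is never negative.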
Variations
The hazard function represents a conditional failure rate, which means it describes the risk of an event, given that the event has not already happened. Thus, it applies only to individuals who have not yet experienced the event by a specific time (e.g., year 10) and does not include those who experienced the event in previous periods. For example, a cancer patient in remission might be more interested in knowing the answer to the question “What is the probability I will have a recurrence in the next year, given that I haven’t had one yet?” [2].
Variations include the Kaplan-Meier (KM) method, which uses rates and has no upper limit, so it can be used to estimate survival functions for any length of time. It is often preferred for clinical trials [1], perhaps because it is easy to understand and interpret. The basic steps are:
- Sort data by survival time.
- Calculate the number of individuals still alive at each point in time.
- Calculate the probability of survival at each point in time.
- Repeat the previous two steps for every time point.
The KM method treats the hazard function as a rate, while the actuarial method treats the hazard function as a proportion. The actuarial method is more challenging to use, but it is often preferred when the probability distribution of the data is known.
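The KM steps listed above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name, the example data, and the handling of censored observations (coded as event = 0, a detail the steps above do not spell out) are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- observed follow-up time for each individual
    events -- 1 if the event occurred at that time, 0 if the observation was censored
    """
    data = sorted(zip(times, events))  # sort by survival time
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group everyone who shares this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # update the number still at risk
    return curve


# Hypothetical follow-up times (months) and event indicators.
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 1, 0, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Individuals censored at a given time are still counted as at risk at that time, which is the usual convention; the estimate only changes at times where at least one event occurs.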
Is the hazard function the failure rate?
The hazard function and the failure rate are closely related, but they are not quite the same thing:
- The hazard function gives us the probability of an event happening within a short period of time, given the condition that the event hasn’t happened yet.
- The failure rate gives us the probability of an event happening at a specific time. It is not a conditional probability.
As an example, the hazard function for death may be lowest during early years of a disease and increase over time. The failure rate would be the probability of death at a certain age.
Another important difference is that, while the hazard function can give us the risk of failure for any type of event, not just death, the failure rate is usually used exclusively for calculating the risk of death.
Survival function vs. hazard function
The hazard and survival functions are both used to explain the risk of an event happening. However, they differ in how they go about it:
- Survival function: the probability of surviving until a specific time.
- Hazard function: the probability of an event occurring in a small time interval, given that the event has not already happened.
The hazard function can be used to calculate the survival function, and vice versa:
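- The survival function is obtained by exponentiating the negative cumulative hazard: S(t) = exp(-H(t)), where H(t) is the integral of the hazard function from 0 to t.
- The hazard function is the negative derivative of the logarithm of the survival function: h(t) = -S'(t)/S(t).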
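The conversion in each direction can be written out explicitly. The short derivation below is a sketch that assumes a continuous survival time with density f(t) = -S'(t) and S(0) = 1, where H(t) is the cumulative hazard:

$$
h(t) = \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)} = -\frac{d}{dt}\ln S(t)
$$

$$
S(t) = \exp\left(-\int_0^t h(u)\,du\right) = e^{-H(t)}
$$

Integrating the first line from 0 to t and using ln S(0) = 0 gives the second line.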
References
- [1] Fink, S., Brown, R. (2006). Survival Analysis. Gastroenterol Hepatol (N Y). May; 2(5): 380–383. Retrieved May 28, 2018 from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5338193/
- [2] Survival Distributions, Hazard Functions, Cumulative Hazards. Retrieved September 4, 2024 from: https://web.stanford.edu/~lutian/coursepdf/unit1.pdf