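Survival Analysis
Survival analysis is a set of statistical methods for analyzing the time until an event of interest occurs. Examples of such survival times include: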
- Time the average person lives, from birth.
- Time after cancer treatment until death.
- Time from first heart attack to the second.
- Time from HIV diagnosis to AIDS development.
- Time from manufacture of a component to component failure.
Survival time can be measured in years, months, days, or even fractions of a second.
As well as estimating the time it takes to reach a certain event, survival analysis can also be used to compare time-to-event for multiple groups. For example, two production lines for light bulbs could be compared to see if there is a difference in lifetimes. In medicine, two groups with different attributes (like normal/overweight, diabetic/non-diabetic, high/low cholesterol) could be compared to see how those factors contribute to survival time in patients with heart disease, cancer or another diagnosis.
Survival analysis is used to compare groups when time is an important factor. Other tests, like the independent samples t-test or simple linear regression, can compare groups, but those methods do not factor in time. In addition, survival times are positive numbers; many other methods would need the data to be transformed in some way in order to keep the results positive.
Typically, survival data isn’t completely observed. Instead, some of the data is censored. Censoring means that a subject’s exact time to event is unknown, for example because the subject dropped out of the trial or the data was otherwise lost. These are called “right censored” subjects: they are known to have been alive (or disease free) up to the time they were last observed, and they are counted that way for purposes of data analysis. Right censoring also happens when a subject simply doesn’t experience the event in question during the study. It doesn’t necessarily mean that the patient will never experience the event, just that the event didn’t happen under observation. In other words, the time to event is incomplete.
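One common way to record censored data is as a pair for each subject: how long the subject was followed, and whether the event was actually observed. The sketch below illustrates that idea only; the names `Observation` and `observe`, and the choice to pass `None` for "did not happen", are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    time: float   # how long the subject was followed
    event: bool   # True if the event was observed, False if right censored


def observe(event_time: Optional[float],
            dropout_time: Optional[float],
            study_length: float) -> Observation:
    """Hypothetical helper: turn a subject's history into an Observation.

    event_time   -- when the event happened (None if it never happened)
    dropout_time -- when the subject left the study early (None if they stayed)
    study_length -- total length of the study
    """
    # The subject is under observation until they drop out or the study ends.
    last_seen = study_length if dropout_time is None else min(dropout_time, study_length)
    if event_time is not None and event_time <= last_seen:
        return Observation(event_time, True)
    # Event not seen under observation: right censored at the last time seen.
    return Observation(last_seen, False)
```

With a 10-month study, `observe(None, 4, 10)` gives a subject censored at 4 months (dropped out), and `observe(12, None, 10)` gives a subject censored at 10 months (event happened after the study ended).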
Kaplan–Meier analysis measures the survival time from a certain date to time of death, failure or other significant event. For example, it can be used to calculate:
- How long people remain unemployed after a job loss.
- How long it takes for couples undergoing fertility treatment to get pregnant.
- Time-to-failure of machine parts.
In medicine, Kaplan-Meier analysis is the simplest way to calculate survival time after treatment.
A graph of the Kaplan-Meier estimator is a series of decreasing horizontal steps which, given a large enough sample size, approaches the true survival function for that population. For example, a graph of two groups of patients, one with gene profile A and one with gene profile B, can show that people with gene B die at a faster rate than those with gene A.
If every study participant is followed for the same length of time until their death, calculating the survival time is as easy as figuring out the fraction of surviving participants at any point in time. However, in the real world, complicating factors often make this task impossible. For example, calculating survival time can become complicated in clinical trials with factors like:
- Patients who drop out of the study, either on purpose or because they lose touch with the researcher.
- Patients who are still alive at the end of the study, but who are expected to die (or do die) at a later date.
- Patients who enter the study at a later date than other patients.
Kaplan-Meier analysis is an effective tool for calculating survival time despite these factors. Participants whose time of death is not observed for reasons like these are called “censored”. Two outcomes are possible: either the study participant has the event outcome (i.e. they die), or they do not (i.e. they are censored).
Performing Kaplan Meier Analysis
For each time interval, the survival probability is calculated by:
Survival probability = number of participants surviving / number of participants at risk
Participants are not counted in the denominator (participants at risk) if they have dropped out, died, or not reached that time yet. The probability of survival to any point is the cumulative probability of surviving all of the preceding time intervals, that is, the product of the survival probabilities for those intervals.
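The calculation above can be sketched in a few lines of Python. This is a minimal illustration and not a replacement for statistical software: the function name `kaplan_meier`, its arguments, and the sample data are all made up for this example. It assumes every participant is measured from their own start time, and that a participant censored at a given time is still counted as at risk at that time.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] for each time at which an event occurred.

    durations -- follow-up time for each participant
    events    -- True if the event was observed at that time, False if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)   # everyone leaves the risk set at their own time
    at_risk = len(durations)     # participants still under observation
    survival = 1.0               # running product of interval survival probabilities
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Survival probability for this interval: survivors / at risk.
            survival *= (at_risk - d) / at_risk
            curve.append((t, survival))
        # Remove both deaths and censored participants before the next interval.
        at_risk -= exits[t]
    return curve


# Made-up data: six participants, two of them censored (at times 3 and 6).
durations = [2, 3, 3, 5, 6, 8]
events = [True, True, False, True, False, True]
for time, prob in kaplan_meier(durations, events):
    print(time, round(prob, 3))
```

For this sample the estimate steps down at times 2, 3, 5 and 8 to roughly 0.83, 0.67, 0.44 and 0. The censored participants never cause a step themselves; they only shrink the number at risk for later intervals.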
In practice, Kaplan-Meier analysis is usually performed with statistical software.