# Determining and Comparing Times to a Discrete Event: the Kaplan-Meier Survival Plot

Sometimes the variable we need to compare is the time until a discrete event occurs. If we take a sample of subjects and follow them until the event has occurred in every one, we can simply compare mean times using a two-sample t-test or z-test. But what if the study ends before the event has occurred in all individuals? Do we just set their event time to the end of the study? That would bias the mean towards a shorter time, because each of those individuals has some unknown extra time still to run. Do we treat them as missing data? That would bias the mean in the same direction, because the missing data are not random: the study design selects out precisely the subjects with the longest times to event.
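To make the direction of both biases concrete, here is a small sketch in Python. The six event times and the ten-month cut-off are invented purely for illustration; in a real study the true times of the unfinished subjects would of course be unknown.

```python
from statistics import mean

# Hypothetical true times to event (months) for six subjects; illustrative only.
true_times = [2, 4, 6, 8, 15, 20]
study_end = 10  # assumed follow-up cut-off, also hypothetical

# Naive option 1: record the study end as the event time for unfinished subjects.
capped = [min(t, study_end) for t in true_times]

# Naive option 2: discard the subjects whose event was not observed.
observed_only = [t for t in true_times if t <= study_end]

true_mean = mean(true_times)
capped_mean = mean(capped)
observed_mean = mean(observed_only)

print("true mean:         ", round(true_mean, 2))
print("capped at end:     ", round(capped_mean, 2))
print("unfinished dropped:", round(observed_mean, 2))
```

With these numbers both shortcuts give a mean below the true one, and dropping the unfinished subjects is the worse of the two.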

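The Kaplan-Meier survival plot is a way to deal with a study of the occurrence of an event when not all subjects have yet experienced it. At each time an event occurs, it takes the proportion of subjects still under observation who do not have the event, and it multiplies these proportions together to estimate the probability of remaining event-free up to that time. Subjects who have not yet had the event still contribute information for as long as they were followed.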

The event does not have to be death, as implied by the rather morbid choice of name. It could be death, as that is obviously a final end point for the subject, but it could just as easily be time to requiring an intervention, such as respiratory support for patients with motor neurone disease, re-operation for patients with back pain, or end of battery life for patients with pacemakers. In fact I think that death is often a poor choice of end point, because unless the life expectancy of the subjects is appallingly short, many of the deaths could be from causes unrelated to the factor under study. An example would be looking at the potential benefit of statins by measuring time from their introduction until death. Both the subjects started on statins and their matched controls may live for many years afterwards and die of causes unrelated to stroke or heart disease.

The Kaplan-Meier survival plot accounts for subjects who leave the study before having the event, whether because the end of the trial was reached or for any other reason. On the plot, subjects reaching the end point are represented as small steps down in the proportion still “surviving”, while drop-outs (also called censored subjects) are represented as vertical ticks without any step down.

Kaplan-Meier plot comparing survival of two patient populations, one with gene A and the other with gene B (Photo credit: Wikipedia).
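The steps and ticks described above can be reproduced with a few lines of Python. This is a minimal sketch of the usual product-limit calculation, not a replacement for a statistics package; the function name `kaplan_meier` and the eight follow-up times are hypothetical. It assumes the common convention that a subject censored at the same time as an event is still counted as at risk for that event.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(time, n_at_risk, n_events, survival)] for each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if the subject was censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every subject leaves the risk set at their own time
    n_at_risk = len(times)
    survival = 1.0
    table = []
    for t in sorted(leaving):
        d = event_counts.get(t, 0)
        if d > 0:
            # An event: the curve steps down by the fraction of the risk set lost.
            survival *= (n_at_risk - d) / n_at_risk
            table.append((t, n_at_risk, d, survival))
        # Events and censored subjects both leave the risk set,
        # but only events produce a step (censoring is just a tick).
        n_at_risk -= leaving[t]
    return table

# Hypothetical data: 8 subjects, 4 events, 4 censored.
times = [2, 3, 3, 5, 6, 8, 9, 10]
events = [1, 1, 0, 1, 0, 1, 0, 0]

km_table = kaplan_meier(times, events)
for t, n, d, s in km_table:
    print(f"t={t}: at risk={n}, events={d}, S(t)={s:.3f}")
```

Note that the censored subjects never lower the curve directly; they only shrink the number at risk, which makes each later step larger.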

The variance of the Kaplan-Meier estimate at time $t$, which one would use, for example, to put a confidence interval around the estimated probability of surviving to that time, is given by Greenwood's formula:

$$\widehat{\operatorname{Var}}\big[\hat{S}(t)\big] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{n_i\,(n_i - d_i)}$$

where $\hat{S}(t)$ is the estimated probability that a given subject will survive until time $t$, $d_i$ is the number that died at event time $t_i$, and $n_i$ is the number still at risk just before $t_i$. The number still at risk is the number who started the study minus those who have already died and those who have dropped out for other reasons.
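As a rough illustration of the variance formula above (commonly known as Greenwood's formula), the sketch below computes it from a table of numbers at risk and numbers of events, then forms a plain 95% interval of the estimate plus or minus 1.96 standard errors. The function name `greenwood_variance` and the risk table are hypothetical, and the untransformed interval is only one of several options; statistics packages often use a log or log-log transform instead.

```python
import math

def greenwood_variance(risk_table):
    """risk_table: (time, n_at_risk, n_events) per distinct event time, in time order.

    Returns a list of (time, survival, variance) tuples.
    Assumes n_at_risk > n_events in every row; the formula is undefined otherwise.
    """
    survival = 1.0
    running_sum = 0.0
    results = []
    for t, n, d in risk_table:
        if d >= n:
            raise ValueError("formula undefined when everyone at risk has the event")
        survival *= (n - d) / n
        running_sum += d / (n * (n - d))
        results.append((t, survival, survival ** 2 * running_sum))
    return results

# Hypothetical risk table: 8 subjects, 4 events, 4 censored along the way.
risk_table = [(2, 8, 1), (3, 7, 1), (5, 5, 1), (8, 3, 1)]
greenwood_rows = greenwood_variance(risk_table)

for t, s, var in greenwood_rows:
    se = math.sqrt(var)
    # Plain (untransformed) 95% interval, clipped to the valid range [0, 1].
    lower = max(0.0, s - 1.96 * se)
    upper = min(1.0, s + 1.96 * se)
    print(f"t={t}: S={s:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```

The interval widens quickly at later times, because few subjects remain at risk and each term of the sum grows.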

When comparing the plots of two different subject groups, one can use the log-rank test, which is non-parametric and so suits time data that are unlikely to be normally distributed, or the Cox proportional hazards model. The latter assumes that the factor under study multiplies the risk of death or other end point by a constant ratio throughout follow-up, so it suits a factor that imparts its risk persistently upon the subject population, e.g. cumulative risk of stroke, rather than one where the risk is all at once, e.g. thrombolysis for stroke.
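For readers who want to see what the log-rank test does under the bonnet, here is a minimal two-group sketch. At each event time it compares the events observed in group A with the number expected if both groups shared the same risk, then refers the standardised difference to a chi-squared distribution with one degree of freedom. The function name `logrank` and both data sets are hypothetical, and the sketch assumes both groups still have subjects at risk at some event time so that the variance is not zero.

```python
import math

def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi_squared, p_value)."""
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers still at risk just before time t in each group.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Events occurring exactly at time t.
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b

        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the events in group A at this time.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no event time has subjects at risk in both groups")
    chi_squared = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_squared / 2))
    return chi_squared, p_value

# Hypothetical follow-up data for two groups (1 = event, 0 = censored).
times_a = [2, 3, 3, 5, 6, 8, 9, 10]
events_a = [1, 1, 0, 1, 0, 1, 0, 0]
times_b = [4, 6, 7, 9, 10, 10, 10, 10]
events_b = [1, 0, 1, 0, 0, 0, 0, 0]

chi_squared, p_value = logrank(times_a, events_a, times_b, events_b)
print(f"chi-squared = {chi_squared:.3f}, p = {p_value:.3f}")
```

With groups this small the result is only a demonstration of the arithmetic; a real analysis would use an established statistics package.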