Survival Models

Survival Analysis

The survival function, also known as a survivor function or reliability function, is a property of any random variable that maps a set of events, usually associated with mortality or failure of some system, onto time.

It captures the probability that the system will survive beyond a specified time.

Reliability function

  • The term reliability function is common in engineering while the term survival function is used in a broader range of applications, including human mortality.
  • Another name for the survival function is the complementary cumulative distribution function.

Definition

Let \(T\) be a continuous random variable with cumulative distribution function \(F(t)\) on the interval \([0,\infty)\).

Its survival function or reliability function is:

\[R(t) = P(T > t)\]

\[R(t) = \int_t^{\infty} f(u)\,du.\]

\[R(t) = 1-F(t).\]

Properties of the Survival Function
  • Every survival function \(R(t)\) is monotonically decreasing, i.e. \(R(u) \le R(t)\) for all \(u > t\).
  • The time, t = 0, represents some origin, typically the beginning of a study or the start of operation of some system.
  • \(R(0)\) is commonly unity but can be less to represent the probability that the system fails immediately upon operation.
  • The survivor function is right-continuous.
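As a quick illustration of the definition \(R(t) = 1 - F(t)\), here is a minimal Python sketch, assuming SciPy and an exponential lifetime chosen purely for illustration:

```python
# A minimal sketch: the survival (reliability) function R(t) = 1 - F(t).
# The exponential lifetime with mean 10 is an illustrative assumption.
from scipy.stats import expon

lifetime = expon(scale=10.0)      # F(t) = 1 - exp(-t/10)

for t in (0.0, 5.0, 10.0, 20.0):
    # scipy's sf() is exactly the survival function 1 - cdf()
    print(t, lifetime.sf(t), 1.0 - lifetime.cdf(t))
```

Note that the printed values start at \(R(0) = 1\) and decrease monotonically, matching the properties listed above.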

The Cox Proportional-Hazards Model

The Cox model is the most common model used to determine the effects of covariates on survival:

\[ h_i(t)=h_0(t)\exp(\beta_{1}x_{i1} + \beta_{2}x_{i2} + \cdots + \beta_{k}x_{ik}) \]

It is a semi-parametric model:

  • The baseline hazard function is unspecified.
  • The effects of the covariates are multiplicative.
  • It makes no arbitrary assumptions about the shape/form of the baseline hazard function.

The Proportional Hazards Assumption

  • Covariates multiply the hazard by some constant, e.g. a drug may halve a subject's risk of death at any time \(t\).
  • The effect is the same at any time point, as the numeric sketch below illustrates.
  • Violating the PH assumption can seriously invalidate your model!
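The following numeric sketch makes the multiplicative effect concrete; the baseline hazard and coefficient are invented values, not taken from any study:

```python
# A numeric sketch of the proportional-hazards assumption. The baseline
# hazard h0 and the coefficient below are invented for illustration only.
import math

def h0(t):
    return 0.03 + 0.001 * t      # hypothetical baseline hazard

beta_drug = math.log(0.5)        # a drug that halves the risk of death

for t in (1.0, 10.0, 50.0):
    h_treated = h0(t) * math.exp(beta_drug * 1)  # x = 1: on the drug
    h_control = h0(t) * math.exp(beta_drug * 0)  # x = 0: control
    print(t, h_treated / h_control)              # always 0.5, at every time t
```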

The Survival Function (Survival curve)

\[ S(t)=Pr(T>t) \]

The survival function \(S\) is the probability that the time of death \(T\) is greater than some specified time \(t\).

It is composed of:

  • The underlying hazard function (how the risk of death per unit time changes over time at baseline covariates).
  • The effect parameters (how the hazard varies in response to the covariates).

Hazard Probability

  • In the medical world, doctors often want to understand which treatments help patients survive longer and which have no effect at all (or worse). In the business world, the equivalent concern is when customers stop doing business, i.e. churn.
  • This is particularly true of businesses that have a well-defined beginning and end to the customer relationship, i.e. subscription-based relationships. These relationships are found in a wide range of industries, such as insurance, communication, cable television, newspaper/magazine subscriptions, banking, and electricity providers in competitive markets.
  • The basis of survival data mining is the hazard probability: the chance that someone who has survived for a certain length of time (called the customer tenure) is going to stop, cancel, or expire before the next unit of time (see the sketch after this list).
  • This definition assumes that time is discrete, and such discrete time intervals, whether days, weeks, or months, often fit business needs. By contrast, traditional survival analysis in statistics usually assumes that time is continuous.
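A minimal sketch of this computation, using invented tenure data:

```python
# A sketch of the discrete hazard probability for customer tenure data.
# `tenures` and `stopped` are invented values, purely for illustration:
# hazard at tenure t = (customers stopping at t) / (customers active entering t).
from collections import Counter

tenures = [1, 2, 2, 3, 3, 3, 4, 5, 5, 6]   # tenure in months at last observation
stopped = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]   # 1 = stopped, 0 = still active (censored)

stops = Counter(t for t, s in zip(tenures, stopped) if s)
for t in sorted(set(tenures)):
    at_risk = sum(1 for u in tenures if u >= t)    # survived to start of period t
    print(f"tenure {t}: hazard = {stops.get(t, 0)} / {at_risk} = {stops.get(t, 0) / at_risk:.3f}")
```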

Censoring

Censoring in Survival Analysis

Types of Censoring
  • [Left censoring] a data point is below a certain value but it is unknown by how much.
  • [Interval censoring] a data point is somewhere on an interval between two values.
  • [Right censoring] a data point is above a certain value but it is unknown by how much.
Type I and II Censoring
  • [Type I] censoring occurs if an experiment has a set number of subjects or items and stops the experiment at a predetermined time, at which point any subjects remaining are right-censored.
  • [Type II] censoring occurs if an experiment has a set number of subjects or items and stops the experiment when a predetermined number are observed to have failed; the remaining subjects are then right-censored.
Survival Analysis : Censoring
  • Random (or non-informative) censoring is when each subject has a censoring time that is statistically independent of their failure time. The observed value is the minimum of the censoring and failure times; subjects whose failure time is greater than their censoring time are right-censored.
  • Interval censoring can occur when observing a value requires follow-ups or inspections. Left and right censoring are special cases of interval censoring, with the beginning of the interval at zero or the end at infinity, respectively.
  • Estimation methods for using left-censored data vary, and not all methods of estimation may be applicable to, or the most reliable, for all data sets.
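The mechanics of random right censoring described above can be sketched in a few lines; the exponential failure and uniform censoring distributions below are illustrative assumptions only:

```python
# A sketch of how right-censored observations arise under random censoring:
# we observe the minimum of the true failure time T and a censoring time C.
# The distributions below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
T = rng.exponential(scale=10.0, size=8)   # true failure times
C = rng.uniform(0.0, 15.0, size=8)        # independent censoring times

observed = np.minimum(T, C)               # what the study records
event = (T <= C).astype(int)              # 1 = failure seen, 0 = right-censored
print(np.column_stack([observed.round(2), event]))
```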

Kaplan-Meier Estimators

Kaplan-Meier survival estimate

The Kaplan-Meier (KM) method is a non-parametric method used to estimate the survival probability from observed survival times (Kaplan and Meier, 1958).

The survival probability at time \(t_i\), \(S(t_i)\), is calculated as follows:

\[ S(t_i) = S(t_{i-1})\left(1 - \frac{d_i}{n_i}\right) \]

Where,

  • \(S(t_{i-1})\) = the probability of being alive at \(t_{i-1}\)
  • \(n_i\) = the number of patients alive just before \(t_i\)
  • \(d_i\) = the number of events at \(t_i\)
  • \(t_0 = 0\), \(S(0) = 1\)

The estimated probability \(S(t)\) is a step function that changes value only at the time of each event. It's also possible to compute confidence intervals for the survival probability.

The KM survival curve, a plot of the KM survival probability against time, provides a useful summary of the data that can be used to estimate measures such as median survival time.

Estimator

The estimator of the survival function \(\normalsize S(t)\) (the probability that life is longer than \(\normalsize t\)) is given by:

\[{\normalsize {\widehat {S}}(t)=\prod \limits _{i:\ t_{i}\leq t}\left(1-{\frac {d_{i}}{n_{i}}}\right) =\prod \limits _{i:\ t_{i}\leq t}\left(1-\lambda_i \right),} \]

with \(\normalsize t_{i}\) a time when at least one event happened, \(\normalsize d_i\) the number of events (e.g., deaths) that happened at time \(\normalsize t_{i}\), and \(\normalsize n_{i}\) the individuals known to have survived (have not yet had an event or been censored) up to time \(\normalsize t_{i}\).
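The estimator translates directly into code. Below is a from-scratch Python sketch of it; the toy durations and event indicators are invented for illustration:

```python
# A from-scratch sketch of the Kaplan-Meier product-limit estimator,
# following the formula above; the input format is an assumption.
def kaplan_meier(times, events):
    """times: observed durations; events: 1 = event observed, 0 = censored."""
    S, curve = 1.0, []
    for t in sorted(set(t for t, e in zip(times, events) if e)):
        n_i = sum(1 for u in times if u >= t)                    # at risk just before t
        d_i = sum(1 for u, e in zip(times, events) if u == t and e)  # events at t
        S *= 1.0 - d_i / n_i                                     # product-limit update
        curve.append((t, S))
    return curve

# Invented toy data: durations and censoring flags.
print(kaplan_meier([2, 3, 3, 5, 7, 8], [1, 1, 0, 1, 0, 1]))
```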

Kaplan-Meier Product-Limit Estimator

Rather than classifying the observed survival times into a life table, we can estimate the survival function directly from the continuous survival or failure times. Intuitively, imagine that we create a life table so that each time interval contains exactly one case. Multiplying out the survival probabilities across the “intervals” (i.e., for each single observation) we would get for the survival function:

\[S(t) = \prod_{j:\ t_j \leq t} \left[\frac{n-j}{n-j+1}\right]^{\delta(j)}\]

In this equation, \(S(t)\) is the estimated survival function, \(n\) is the total number of cases, and \(\prod\) denotes the product across all cases with \(t_j \leq t\); \(\delta(j)\) is a constant that is 1 if the \(j\)th case is uncensored (complete) and 0 if it is censored. This estimate of the survival function is also called the product-limit estimator, and was first proposed by Kaplan and Meier (1958).

The advantage of the Kaplan-Meier Product-Limit method over the life table method for analyzing survival and failure time data is that the resulting estimates do not depend on the grouping of the data (into a certain number of time intervals). Actually, the Product-Limit method and the life table method are identical if the intervals of the life table contain at most one observation.

Kaplan-Meier Estimates of the Survival Function

Worked Examples

Worked Example

A study of the mortality of 12 laboratory-bred insects was undertaken. The insects were observed from birth until either they died or the period of study ended, at which point those insects still alive were treated as censored.

The following table shows the Kaplan-Meier estimate of the survival function, based on data from the 12 insects.

| t (weeks)          | \(S(t)\) |
|--------------------|----------|
| \(0 \leq t < 1\)   | 1.0000   |
| \(1 \leq t < 3\)   | 0.9167   |
| \(3 \leq t < 6\)   | 0.7130   |
| \(t \geq 6\)       | 0.4278   |

Exercises

  1. Calculate the number of insects dying at durations 3 and 6 weeks.
  2. Calculate the number of insects whose history was censored.

Solution

Nelson-Aalen Estimators

Nelson-Aalen estimator of the cumulative hazard function

The Nelson-Aalen estimator is a non-parametric estimator of the cumulative hazard rate function in the case of censored or incomplete data.

It is used in survival theory, reliability engineering and life insurance to estimate the cumulative number of expected events.

An “event” can be the failure of a non-repairable component, the death of a human being, or any occurrence after which the experimental unit remains in the “failed” state.

Formula

The Nelson-Aalen estimator is a non-parametric estimator of the cumulative hazard function and is given by: \[\LARGE \tilde{H}(t)=\sum_{t_i\leq t}\frac{d_i}{n_i},\] with \(\LARGE d_i\) the number of events (e.g., failures) at \(\LARGE t_i\) and \(\LARGE n_i\) the total number of individuals at risk at \(\LARGE t_i\).


The curvature of the Nelson-Aalen estimator gives an idea of the shape of the hazard rate. A concave shape is an indicator of infant mortality, while a convex shape indicates wear-out mortality. It can also be used, for example, when testing the homogeneity of Poisson processes.


Because of its simple relationship with the survival function, \(\LARGE S(t)=e^{-H(t)}\), the cumulative hazard function can be used to estimate the survival function.

The estimator is calculated, then, by summing the proportion of those at risk who failed in each interval up to time \(t\).
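A from-scratch Python sketch of this sum, mirroring the Kaplan-Meier sketch earlier; the toy data are invented and the input format is assumed:

```python
# A from-scratch sketch of the Nelson-Aalen estimator of the cumulative hazard.
import math

def nelson_aalen(times, events):
    """Returns (t, H(t)) pairs; times: durations, events: 1 = event, 0 = censored."""
    H, curve = 0.0, []
    for t in sorted(set(t for t, e in zip(times, events) if e)):
        n_i = sum(1 for u in times if u >= t)                        # at risk just before t
        d_i = sum(1 for u, e in zip(times, events) if u == t and e)  # events at t
        H += d_i / n_i                                               # cumulative hazard increment
        curve.append((t, H))
    return curve

# The relation S(t) = exp(-H(t)) then gives an alternative survival estimate.
for t, H in nelson_aalen([2, 3, 3, 5, 7, 8], [1, 1, 0, 1, 0, 1]):
    print(t, H, math.exp(-H))
```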

Worked Examples

Worked Example

The Shining Light company has developed a new type of light bulb which it recently tested. 1,000 bulbs were switched on and observed until they failed, or until 500 hours had elapsed. For each bulb that failed, the duration in hours until failure was noted. Due to an earth tremor after 200 hours, 200 bulbs shattered and had to be removed from the test before failure. The results showed that:

  • 10 bulbs failed after 50 hours,
  • 20 bulbs failed after 100 hours,
  • 50 bulbs failed after 250 hours,
  • 300 bulbs failed after 400 hours,
  • 50 bulbs failed after 450 hours.

Exercises

  1. Calculate the Kaplan-Meier estimate of the survival function, \(\LARGE S(t)\), for the light bulbs in the test.
  2. Sketch the Kaplan-Meier estimate calculated in part 1.
  3. Estimate the probability that a bulb will not have failed after each of the following durations: 300 hours, 400 hours and 600 hours. If it is not possible to obtain an estimate for any of the durations without additional assumptions, explain why.

Solution

Cox Regression

Cox’s Proportional Hazard Model

The proportional hazard model is the most general of the regression models because it is not based on any assumptions concerning the nature or shape of the underlying survival distribution. The model assumes that the underlying hazard rate (rather than survival time) is a function of the independent variables (covariates); no assumptions are made about the nature or shape of the hazard function. Thus, in a sense, Cox’s regression model may be considered to be a nonparametric method. The model may be written as:

\[\LARGE h(t;\, z_1, z_2, \ldots, z_m) = h_0(t)\exp(b_1 z_1 + \ldots + b_m z_m)\]

where \(h(t;\, z_1, \ldots, z_m)\) denotes the resultant hazard, given the values of the \(m\) covariates for the respective case \((z_1, z_2, \ldots, z_m)\) and the respective survival time \(t\). The term \(h_0(t)\) is called the baseline hazard; it is the hazard for the respective individual when all independent variable values are equal to zero. We can linearize this model by dividing both sides of the equation by \(h_0(t)\) and then taking the natural logarithm of both sides:

\[\LARGE \log\left[\frac{h(t;\, z_1, \ldots, z_m)}{h_0(t)}\right] = b_1 z_1 + \ldots + b_m z_m\]

We now have a fairly “simple” linear model that can be readily estimated.
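As an illustration of fitting this model in practice, here is a minimal sketch assuming the Python lifelines library; the data frame, column names, and values are entirely invented:

```python
# A minimal sketch of fitting a Cox proportional-hazards model, assuming
# the `lifelines` library and the hypothetical columns below.
import pandas as pd
from lifelines import CoxPHFitter

# Invented data: survival time, event indicator, and two covariates.
df = pd.DataFrame({
    "time":  [5, 8, 12, 3, 9, 15, 7, 11],
    "event": [1, 0, 1, 1, 0, 1, 1, 0],   # 1 = failure observed, 0 = censored
    "age":   [52, 60, 45, 70, 58, 40, 65, 50],
    "drug":  [0, 1, 0, 1, 1, 0, 0, 1],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()   # coefficients b_j and hazard ratios exp(b_j)
```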

Assumptions.

While no assumptions are made about the shape of the underlying hazard function, the model equations shown above do imply two assumptions. First, they specify a multiplicative relationship between the underlying hazard function and the log-linear function of the covariates. This assumption is also called the proportionality assumption.

In practical terms, it is assumed that, given two observations with different values for the independent variables, the ratio of the hazard functions for those two observations does not depend on time. The second assumption of course, is that there is a log-linear relationship between the independent variables and the underlying hazard function.

Time-Dependent Covariates

An assumption of the proportional hazard model is that the hazard function for an individual (i.e., observation in the analysis) depends on the values of the covariates and the value of the baseline hazard. Given two individuals with particular values for the covariates, the ratio of the estimated hazards over time will be constant – hence the name of the method: the proportional hazard model. The validity of this assumption may often be questionable.

For example, age is often included in studies of physical health. Suppose you studied survival after surgery. It is likely that age is a more important predictor of risk immediately after surgery than some time after the surgery (after initial recovery). In accelerated life testing one sometimes uses a stress covariate (e.g., amount of voltage) that is slowly increased over time until failure occurs (e.g., until the electrical insulation fails). In this case, the impact of the covariate is clearly dependent on time. Most implementations let the user specify arithmetic expressions to define covariates as functions of several variables and survival time.

Testing the Proportionality Assumption

As indicated by the previous examples, there are many applications where it is likely that the proportionality assumption does not hold. In that case, one can explicitly define covariates as functions of time.

For example, the analysis of a data set presented by Pike (1966) consists of survival times for two groups of rats that had been exposed to a carcinogen (see also Lawless, 1982, page 393, for a similar example). Suppose that z is a grouping variable with codes 1 and 0 to denote whether or not the respective rat was exposed. One could then fit the proportional hazard model:

\[h(t,z) = h_0(t)\exp\{b_1 z + b_2 z[\log(t)-5.4]\}\]

Thus, in this model the conditional hazard at time \(t\) is a function of (1) the baseline hazard \(h_0\), (2) the covariate \(z\), and (3) \(z\) times the logarithm of time. Note that the constant 5.4 is used here for scaling purposes only: the mean of the logarithm of the survival times in this data set is equal to 5.4. In other words, the conditional hazard at each point in time is a function of the covariate and time; thus, the effect of the covariate on survival is dependent on time; hence the name time-dependent covariate. This model allows one to specifically test the proportionality assumption: if the parameter \(b_2\) is statistically significant (e.g., if it is at least twice as large as its standard error), then one can conclude that the effect of the covariate \(z\) on survival is indeed dependent on time, and therefore that the proportionality assumption does not hold.
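One way to run such a check in code is sketched below, assuming the lifelines library and the fitted `cph` model and `df` from the earlier Cox sketch. Note that lifelines' test is based on scaled Schoenfeld residuals, a standard alternative to the explicit time-interaction covariate described above, not the same construction:

```python
# A sketch of testing the proportional-hazards assumption, assuming the
# `lifelines` library and the `cph` and `df` objects from the earlier
# Cox example. This uses scaled Schoenfeld residuals rather than an
# explicit z*log(t) covariate.
from lifelines.statistics import proportional_hazard_test

results = proportional_hazard_test(cph, df, time_transform="log")
results.print_summary()  # a small p-value suggests a covariate's effect varies with time
```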