InterPreT Cancer Survival: Further Info

Notation

Symbol	Description
$ S(t) $	All-cause survival
$ S^{*}(t) $	Expected survival
$ R(t) $	Relative survival
$ F_{all}(t) $	All-cause crude probability of death
$ F_{cancer}(t) $	Crude probability of death due to cancer
$ F_{other}(t) $	Crude probability of death due to other causes
$ \lambda(t) $	Excess hazard rate
$ h(t) $	All-cause hazard rate
$ h^{*}(t) $	Expected hazard rate
$ \Lambda(t) $	Cumulative excess hazard rate
$ H(t) $	Cumulative all-cause hazard rate
$ H^{*}(t) $	Cumulative expected hazard rate
$ s\left(\ln(t) \mid \gamma, \mathbf{k}\right) $	Spline function on log-time with a vector of $K$ knots, $\mathbf{k}$, and $ K - 1$ parameters, $\gamma$
$ \mathbf{x} $	Vector of covariates

Net survival is the probability of surviving in a hypothetical world where the cancer of interest is the only possible cause of death. It can be estimated within either a cause-specific or a "relative survival" framework.

Net Survival within the Cause-specific Framework

Net survival could also be estimated within the cause-specific framework, but here we concentrate on estimation within the relative survival framework.

Relative Survival

Net survival can also be estimated within the relative survival framework, which is more popular in large epidemiological studies. Estimation under this framework does not require cause of death information. However, interpreting the resulting estimates is not straightforward, and the assumptions behind them need to be considered.

Assumptions and Interpretation

To estimate net survival, one can use a model-based approach to calculate relative survival separately by sex and as a function of continuous age. Relative survival can be interpreted either:

1. As the ratio of the all-cause survival of the $i^{th}$ cancer patient to the expected survival of a comparable individual in the general population, in most cases matched on age, sex and calendar year:
\begin{equation} R_{i}(t) = \frac{S_{i}(t)}{S_{i}^{*}(t)} \end{equation}
2. Or as net survival, under certain assumptions.

Interpreting relative survival as net survival requires two important assumptions:

1. The expected survival estimates are appropriate, i.e. the non-cancer mortality of cancer patients is accurately reflected by the mortality rates in the population life table, stratified by appropriate covariates.
2. Cancer and non-cancer mortality are conditionally independent, i.e. apart from the factors adjusted for in the estimation, no other factors are related to both cancer and non-cancer mortality.

To obtain the cancer survival statistics used in InterPreT, a model-based approach was adopted using a flexible parametric relative survival model to estimate net survival.

We use the class of flexible parametric survival models introduced by Royston and Parmar, which model the log-cumulative hazard using restricted cubic splines (also known as natural cubic splines) [Royston and Parmar, 2002]. A more detailed description of these models can be found in "Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model" by Royston and Lambert (2011). An interactive learning tool on Restricted Cubic Splines is also available. These models have also been extended to relative survival by incorporating expected mortality rates [Nelson et al., 2007].
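To illustrate the spline functions mentioned above, the sketch below builds a restricted cubic spline basis in the Royston-Parmar form. It has one linear term plus one term per interior knot, each constrained to be linear beyond the boundary knots, so $K$ knots give $K - 1$ basis variables. This is a minimal sketch that assumes NumPy. The function name `rcs_basis` and the example knots are hypothetical. It also omits the orthogonalisation that Stata's `stpm2` applies by default, so its coefficients would not match Stata output directly.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline basis (Royston-Parmar form); hypothetical helper.

    x     : values at which to evaluate the basis (here, log-time)
    knots : sorted knot locations; the first and last are the boundary knots
    Returns an array of shape (len(x), len(knots) - 1).
    """
    x = np.asarray(x, dtype=float)
    k = np.asarray(knots, dtype=float)
    k_min, k_max = k[0], k[-1]

    def pos3(v):
        # truncated cubic term (v)_+^3
        return np.maximum(v, 0.0) ** 3

    cols = [x]  # first basis variable is linear in x
    for kj in k[1:-1]:  # one extra variable per interior knot
        lam = (k_max - kj) / (k_max - k_min)
        cols.append(pos3(x - kj) - lam * pos3(x - k_min) - (1.0 - lam) * pos3(x - k_max))
    return np.column_stack(cols)


# Hypothetical example: 4 knots on the log-time scale give 3 basis variables
knots = np.log([0.1, 1.0, 3.0, 10.0])
Z = rcs_basis(np.log([0.5, 1.0, 5.0]), knots)
print(Z.shape)  # (3, 3)
```

Evaluated at $\ln(t)$, a weighted sum of these columns with weights $\gamma$ gives a spline function $s\left(\ln(t) \mid \gamma, \mathbf{k}\right)$. The restriction to linearity outside the boundary knots keeps the curve stable in the tails, where data are sparse.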

In the current version of the tool, separate flexible parametric relative survival models were fitted for each sex, with age at diagnosis as the only covariate:

$$ \eta_{i} = \ln\left[\Lambda_{i}(t \mid \mathbf{x}_{i})\right] = \overbrace{s\left(\ln(t) \mid \gamma, \mathbf{k}_{0}\right)}^{\text{baseline spline function}} + \mathbf{x}_{i}\beta + \underbrace{\sum_{j=1}^{D}{s\left(\ln(t) \mid \delta_{j}, \mathbf{k}_{j}\right)x_{ij}}}_{\text{time-dependent spline functions for } \mathbf{x}_{i}} $$

Here, $\mathbf{x}_{i}$ contains the restricted cubic spline variables for age at diagnosis, with 4 degrees of freedom (3 interior knots). The baseline spline function used 5 degrees of freedom, and the time-dependent effect of continuous age was modelled with a spline of 3 degrees of freedom. These models were fitted for both the Swedish and English datasets using a period analysis to obtain up-to-date estimates; the period window used for each country is given in the "Data" section below. A more detailed technical description of flexible parametric models and their parameters is given in "Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model" by Royston and Lambert (2011). Relative survival for each sex is obtained through the standard transformation of the log-cumulative excess hazard:

$$ R_{i}(t) = \exp\left[-\exp\left(\eta_{i}\right)\right] $$
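To make the structure of $\eta_{i}$ concrete, the following sketch evaluates the linear predictor for a single individual and then applies the transformation above. The spline bases are assumed to have already been computed at $\ln(t)$. This is a minimal sketch that assumes NumPy, and all function and argument names are hypothetical. For simplicity it assumes every covariate in $\mathbf{x}_{i}$ has a time-dependent effect, so $D$ equals the number of covariates. That matches the age-only model described here but not flexible parametric models in general.

```python
import numpy as np


def linear_predictor(Z0, gamma, x, beta, Z_td, delta):
    """eta(t) = s(ln t | gamma, k0) + x beta + sum_j s(ln t | delta_j, k_j) x_j.

    Z0    : (n_times, p0) baseline spline basis at ln(t); a column of ones
            can be included to carry the intercept
    gamma : (p0,) baseline spline coefficients
    x     : (D,) covariate values for one individual (e.g. age spline variables)
    beta  : (D,) time-fixed coefficients
    Z_td  : list of D arrays; Z_td[j] has shape (n_times, p_j) and is the spline
            basis for the j-th time-dependent effect
    delta : list of D arrays; delta[j] has shape (p_j,)
    """
    eta = Z0 @ gamma + x @ beta
    for j in range(len(x)):
        # time-dependent effect: spline in ln(t) multiplied by the covariate
        eta = eta + (Z_td[j] @ delta[j]) * x[j]
    return eta


def relative_survival(eta):
    # eta is the log cumulative excess hazard, so R(t) = exp(-exp(eta))
    return np.exp(-np.exp(eta))


# Hypothetical toy inputs: two time points, one covariate
Z0 = np.array([[1.0, 0.0], [1.0, 1.0]])
eta = linear_predictor(Z0, np.array([-1.0, 0.5]), np.array([2.0]), np.array([0.1]),
                       [np.array([[0.0], [1.0]])], [np.array([0.2])])
print(relative_survival(eta))
```

A larger $\eta$ means a larger cumulative excess hazard, so relative survival decreases as $\eta$ increases over time.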

We then calculate expected survival for each individual, matched on sex, age and calendar year, from the latest population mortality life tables provided by the Cancer Survival Group at the London School of Hygiene and Tropical Medicine. Finally, using the relationship in Equation (1), we obtain all-cause survival:

$$ S_{i}(t) = R_{i}(t)S_{i}^{*}(t) $$
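Expected survival comes from population mortality rates rather than from the model. The sketch below computes $S^{*}(t) = \exp\left[-H^{*}(t)\right]$ by treating each life-table rate as a constant hazard within a single year of attained age, and then combines the result with relative survival. It is a minimal sketch that assumes NumPy, and the function names are hypothetical. The life table is also hypothetical and indexed by age only. The tables actually used are stratified by sex and calendar year as well, so a full implementation would also move across calendar years as follow-up progresses.

```python
import numpy as np


def expected_survival(t, age_at_diagnosis, rates):
    """S*(t) = exp(-H*(t)) from a single-year-of-age life table (hypothetical).

    t                : follow-up times in years
    age_at_diagnosis : age in years (may be fractional)
    rates            : rates[a] is the annual mortality rate at attained age a;
                       the last entry is reused for older ages
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    H = np.zeros_like(t)
    for i, ti in enumerate(t):
        a, end = float(age_at_diagnosis), float(age_at_diagnosis) + ti
        while a < end:
            # the rate is treated as constant until the next birthday
            nxt = min(np.floor(a) + 1.0, end)
            band = min(int(np.floor(a)), len(rates) - 1)
            H[i] += rates[band] * (nxt - a)
            a = nxt
    return np.exp(-H)


def all_cause_survival(R, S_star):
    # Equation (1) rearranged: S(t) = R(t) * S*(t)
    return np.asarray(R, dtype=float) * np.asarray(S_star, dtype=float)


# Hypothetical example: constant 2% annual mortality at every age
rates = np.full(111, 0.02)
S_star = expected_survival([1.0, 5.0], 67.4, rates)
S = all_cause_survival([0.9, 0.7], S_star)
print(S_star, S)
```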

For a model that truly reflects the impact of cancer on survival, other important disease characteristics need to be included (e.g. stage at diagnosis). This will be incorporated into a future prognostic version of the tool.

To calculate the crude probabilities of death (known as the cumulative incidence function in the cause-specific framework) after fitting a relative survival model, we use the formulation defined by Lambert (2010). The crude probabilities of death due to cancer, due to other causes and due to all causes are obtained by numerically integrating the all-cause survival function multiplied by the appropriate hazard function. Each is defined below:

$$ F_{cancer, i}(t) = \int_0^t{S_{i}(u)\lambda_{i}(u)} du $$

$$ F_{other, i}(t) = \int_0^t{S_{i}(u)h_{i}^{*}(u)} du $$

$$ F_{all, i}(t) = F_{cancer, i}(t) + F_{other, i}(t) = \int_0^t{S_{i}(u)h_{i}(u)} du $$
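The integrals above are generally evaluated numerically on a grid of time points. The sketch below uses the cumulative trapezoidal rule and assumes NumPy. The function names are hypothetical. In practice, $S(u)$, $\lambda(u)$ and $h^{*}(u)$ would come from the fitted model and the life table, with $\lambda(t)$ being the time derivative of $\Lambda(t)$. The integration rule and grid spacing are illustrative choices, not necessarily those used in InterPreT.

```python
import numpy as np


def cumulative_trapezoid(y, x):
    """Cumulative integral of y over x by the trapezoidal rule, starting from 0."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(y)
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    return out


def crude_probabilities(u, S, excess_hazard, expected_hazard):
    """Return F_cancer, F_other and F_all at each time in u.

    u               : increasing time grid starting at 0
    S               : all-cause survival S(u) on the grid
    excess_hazard   : lambda(u) on the grid
    expected_hazard : h*(u) on the grid
    """
    S = np.asarray(S, dtype=float)
    F_cancer = cumulative_trapezoid(S * np.asarray(excess_hazard, dtype=float), u)
    F_other = cumulative_trapezoid(S * np.asarray(expected_hazard, dtype=float), u)
    return F_cancer, F_other, F_cancer + F_other


# Check against constant hazards, where F_cancer has a closed form
u = np.linspace(0.0, 10.0, 10001)
lam, h_star = 0.10, 0.03                   # hypothetical constant hazards
S = np.exp(-(lam + h_star) * u)            # all-cause survival with h = lam + h_star
F_cancer, F_other, F_all = crude_probabilities(
    u, S, np.full_like(u, lam), np.full_like(u, h_star))
exact = lam / (lam + h_star) * (1.0 - np.exp(-(lam + h_star) * u))
print(np.max(np.abs(F_cancer - exact)))    # small discretisation error
```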

Why are these not reported as cause-specific crude survival probabilities? Crude probabilities expressed on the survival scale have an awkward interpretation. For example, the cancer-specific crude survival probability, $1 - F_{cancer}(t)$, would be interpreted as the probability of not dying from cancer by time $t$. This combines the probability of dying from other causes before time $t$ with the probability of being alive at time $t$.
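This decomposition follows directly from the definitions above. Because $S(u)h(u) = -\mathrm{d}S(u)/\mathrm{d}u$, the all-cause crude probability of death equals one minus all-cause survival, and therefore:

$$ F_{all}(t) = \int_0^t{S(u)h(u)} du = 1 - S(t) $$

$$ 1 - F_{cancer}(t) = 1 - F_{all}(t) + F_{other}(t) = S(t) + F_{other}(t) $$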

Data

English Data

The English data were obtained from the Public Health England (PHE) National Cancer Registration and Analysis Service (NCRAS). This large cancer registry dataset contains information on patients diagnosed with 6 different cancers between 1995 and 2013. Up-to-date estimates were obtained using a 3-year period window from 01 Jan 2013 to 31 Dec 2015. The size of the dataset for each cancer site is summarised below:

Cancer Site	Male	Female
Melanoma	64,551	76,239
Lung	316,936	223,523
Colon	166,323	160,522
Rectum	116,312	74,389
Breast	-	660,538
Prostate	521,518	-
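Both datasets were analysed using a period window. In a period analysis, only follow-up time that falls inside the window contributes. Patients diagnosed before the window therefore enter the risk set late (left truncation), at their time since diagnosis on the window start date. The sketch below shows this idea for a single patient using Python's standard `datetime` module. The function name and arguments are hypothetical, and years are approximated as 365.25 days. It is not necessarily the exact data preparation used for InterPreT.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_YEAR = 365.25  # approximation used to convert days to years


def period_window_times(
    diagnosis: date,
    exit_date: date,
    died: bool,
    window_start: date,
    window_end: date,
) -> Optional[Tuple[float, float, bool]]:
    """Restrict one patient's follow-up to a period window (hypothetical helper).

    Returns (entry, exit, event) in years since diagnosis, or None when the
    patient has no follow-up time inside the window.
    """
    start = max(diagnosis, window_start)  # late entry if diagnosed before the window
    stop = min(exit_date, window_end)     # censor at the end of the window
    if stop <= start:
        return None
    entry = (start - diagnosis).days / DAYS_PER_YEAR
    exit_time = (stop - diagnosis).days / DAYS_PER_YEAR
    # a death only counts as an event if it occurs within the window
    event = died and exit_date <= window_end
    return entry, exit_time, event


# Example with the English window (01 Jan 2013 to 31 Dec 2015)
w0, w1 = date(2013, 1, 1), date(2015, 12, 31)
print(period_window_times(date(2010, 6, 1), date(2017, 3, 1), False, w0, w1))
print(period_window_times(date(2011, 2, 1), date(2012, 8, 1), True, w0, w1))  # None: died before the window
```

Using late entry rather than restricting the sample to recently diagnosed patients lets long-term survival estimates draw on the most recent follow-up experience of patients diagnosed years earlier.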

Swedish Data

The Swedish data were obtained from the Swedish Cancer Registry, which was established in 1958 and covers the entire Swedish population. In Sweden, health care providers are required by law to report newly diagnosed cancers to one of six regional cancer centres. The centres then pass the information to the National Board of Health and Welfare (Socialstyrelsen), where the Swedish Cancer Registry is compiled. The registry contains information on patients diagnosed with cancer, including death information obtained from the Swedish Cause of Death Register. Up-to-date estimates were obtained using a 5-year period window from 01 Jan 2013 to 31 Dec 2017. The size of the dataset for each cancer site is summarised below:

Cancer Site	Male	Female
Melanoma	13,319	13,169
Lung	9,162	9,967
Colon	15,383	16,349
Rectum	10,148	7,815
Breast	-	72,601
Prostate	103,508	-

No individual patient data are stored on the servers used to calculate the predictions in InterPreT. The models were fitted to the data remotely in Stata 15, and only the estimated model parameters were exported. These parameters are used to obtain the cancer statistics within the online tool.
