Here, a brief description of the measures used in InterPreT, along with the data and model, is provided. When a patient is diagnosed with cancer, there are a number of measures that can be reported to understand prognosis. These are based on an aggregated average of people in England. Typically, people are categorised by age, gender, year and sometimes, deprivation groups. These are described below.

#### All-Cause Survival

For a person diagnosed with cancer, the all-cause survival measure gives the chance of being alive at different points in time after diagnosis. Those who die in this period may die from cancer or from other causes. In other words, it represents the proportion of people who are still alive at, say, 5 years after diagnosis. This is not often reported.

#### Expected Survival

For a person diagnosed with cancer the expected survival gives the chance of being alive for a person of the same calendar year, age and sex in the general population (who are assumed to be cancer free). This measure can be used as a benchmark to see how much of an “extra” impact a diagnosis of cancer has on the prospects of survival over the coming years. So, the difference between the all-cause survival and expected survival shows the extra impact of cancer.

#### Net Survival

Net Survival is not useful to express the survival experience of an individual cancer patient. It is used at a population level to enable fair comparisons between different groups.

Net survival is a measure that tries to make fair comparisons across groups where the chance of dying of something other than cancer is different; this could be people of different ages, people in different geographic areas or countries, or men compared with women. Net survival tries to portray the survival experience of people if the cancer of interest were the only possible cause of death. We can then make a fair comparison of whether cancer has a greater impact on survival chances, without the confusion caused by differences in the chance of dying of something else.

Example: Take an 85 year old and a 30 year old. They would have a different chance of reaching the milestone of 10 year all-cause survival because of the fact that the chance of dying from most other causes would generally be higher for older people. We therefore couldn’t simply count the number of 85 and 30 year olds with cancer who do not survive for 10 years and fairly conclude that the impact of cancer is higher for 85 year olds than 30 year olds. First we must discount the differences in other cause survival and then compare solely by the impact of cancer. This is what the net measures attempt to do.

For a more technically detailed description, including assumptions associated with Net Survival, please refer to the “Net Survival” section under the “Technical Description” tab.

#### Crude Probability of Death

We can break the chances of no longer being alive at a certain point in time (e.g. 5 years) down into how many would likely die of cancer and how many would die of other causes. These are known as the crude probabilities of death.

This measure is useful for making treatment-related decisions or for the planning and provision of future health-care services.

This page is still under development.

#### Notation

Abbreviation | Description |
---|---|
\( S(t) \) | All-cause survival |
\( S^{*}(t) \) | Expected survival |
\( R(t) \) | Relative survival |
\( F_{all}(t) \) | All-cause crude probability of death |
\( F_{cancer}(t) \) | Crude probability of death due to cancer |
\( F_{other}(t) \) | Crude probability of death due to other causes |
\( \lambda(t) \) | Excess hazard rate |
\( h(t) \) | All-cause hazard rate |
\( h^{*}(t) \) | Expected hazard rate |
\( \Lambda(t) \) | Cumulative excess hazard rate |
\( H(t) \) | Cumulative all-cause hazard rate |
\( H^{*}(t) \) | Cumulative expected hazard rate |
\( s\left(\ln(t) \mid \gamma, \mathbf{k}\right) \) | Spline function on log-time with a vector of \(K\) knots, \(\mathbf{k}\), and \( K - 1\) parameters, \(\gamma\) |
\( \mathbf{x} \) | Vector of covariates |

#### Net/Relative Survival

Net survival interpreted directly is the survival probability in the hypothetical world where you can only die from the cancer of interest and not anything else. This can be estimated within a cause-specific or "relative survival" framework.

##### Net Survival within the Cause-specific Framework

Estimating net survival within the cause-specific framework relies on cause of death information. Here we concentrate on estimation within the relative survival framework.

##### Relative Survival

Net survival can be estimated within the relative survival framework, which is more popular in large epidemiological studies. Estimation under this framework does not require cause of death information. However, there are a number of complications in interpreting this quantity, and the assumptions behind it also need to be considered.

###### Assumptions and Interpretation

To estimate net survival, one can use a model-based approach to calculate relative survival separately by sex and continuous age. This can be interpreted as either:

- A ratio of the overall survival for the \(i^{th}\) cancer patient to the expected survival of a comparable individual in the general population, matched in most cases for age, sex and calendar year:

\begin{equation} R_{i}(t) = \frac{S_{i}(t)}{S_{i}^{*}(t)} \end{equation}

- Or, as net survival (with some assumptions).

When interpreting relative survival as net survival we require two important assumptions:

1. The estimates of expected survival are appropriate, i.e. the non-cancer mortality of cancer patients is accurately reflected by the mortality rates in the population life table, given that they are stratified by appropriate covariates.
2. There is conditional independence between cancer related and non-cancer related mortality, i.e. other than the factors adjusted for in estimation, no other factors will be related to both cancer and non-cancer mortality.
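The ratio interpretation above can be illustrated with a minimal Python sketch. The function name and the input values are hypothetical and chosen only for illustration; they are not estimates from InterPreT.

```python
def relative_survival(all_cause: float, expected: float) -> float:
    """Relative survival R(t) = S(t) / S*(t) at a single time point."""
    if not 0.0 <= all_cause <= 1.0:
        raise ValueError("all-cause survival must lie in [0, 1]")
    if not 0.0 < expected <= 1.0:
        raise ValueError("expected survival must lie in (0, 1]")
    return all_cause / expected


# Hypothetical 5-year values: 60% of patients alive, compared with 80% of
# comparable people in the general population.
r = relative_survival(0.60, 0.80)
print(round(r, 2))  # 0.75
```

Under the two assumptions listed above, the same number would be read as net survival.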

#### Flexible Parametric Relative Survival Models

To obtain the cancer survival statistics used in InterPreT, a model-based approach was adopted using a flexible parametric relative survival model to estimate net survival.

We use the class of flexible parametric survival models introduced by Royston and Parmar, which uses restricted cubic splines (also known as natural cubic splines) on the log-cumulative hazard scale [Royston and Parmar, 2002]. A more detailed description of these models can be found in "Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model" by Royston and Lambert (2011). An interactive learning tool on Restricted Cubic Splines is also available. These models have also been extended to relative survival, which makes use of expected mortality rates [Nelson et al., 2007].
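As a rough illustration of a restricted cubic spline on log-time, the sketch below builds a basis with \(K\) knots and \(K - 1\) terms, in line with the notation table. It assumes the usual truncated-power form of the basis, which is linear beyond the boundary knots. The function names and knot positions are hypothetical and are not the ones used to fit the InterPreT models.

```python
import math
from typing import List, Sequence


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Restricted cubic spline basis at x for knots k_1 < ... < k_K.

    Returns K - 1 values: the linear term x, followed by K - 2 cubic terms
    that are constrained to be linear beyond the boundary knots.
    """
    if len(knots) < 2 or list(knots) != sorted(set(knots)):
        raise ValueError("knots must be at least two strictly increasing values")
    k_min, k_max = knots[0], knots[-1]

    def cube_plus(u: float) -> float:
        # Truncated cubic: u^3 when u > 0, otherwise 0.
        return max(u, 0.0) ** 3

    basis = [x]
    for k_j in knots[1:-1]:
        lam = (k_max - k_j) / (k_max - k_min)
        basis.append(
            cube_plus(x - k_j)
            - lam * cube_plus(x - k_min)
            - (1.0 - lam) * cube_plus(x - k_max)
        )
    return basis


def spline(log_t: float, gamma0: float, gamma: Sequence[float],
           knots: Sequence[float]) -> float:
    """Intercept plus the weighted sum of the basis terms."""
    return gamma0 + sum(g * v for g, v in zip(gamma, rcs_basis(log_t, knots)))


# Hypothetical knots (in years), placed on the log-time scale.
example_knots = [math.log(t) for t in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0)]
print(len(rcs_basis(math.log(3.0), example_knots)))  # 5 terms from 6 knots
```

Six knots give five basis terms, which corresponds to a spline with 5 degrees of freedom.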

In the current version of the tool, separate flexible parametric relative survival models have been fitted for each sex with age at diagnosis as the only included covariate such that:

$$ \eta_{i} = \ln\left[\Lambda_{i}(t \mid \mathbf{x}_{i})\right] = \overbrace{s\left(\ln(t_{i}) \mid \gamma_{i}, \mathbf{k}_{0i}\right)}^{\text{baseline spline function}} + \mathbf{x}_{i}\beta + \underbrace{\sum_{j=1}^{D}{s\left(\ln(t_{i}) \mid \delta_{ij}, \mathbf{k}_{ij}\right)\mathbf{x}_{ij}}}_{\text{time-dependent spline function for } \mathbf{x}_{i}} $$

Here, \(\mathbf{x}_{i}\) are restricted cubic spline (RCS) derived variables for age with 4 degrees of freedom (3 internal knots). The baseline RCS used 5 degrees of freedom, and the time-dependent RCS for the non-linear effect of continuous age used 3 degrees of freedom. These models were fitted to both the Swedish and English datasets under a period analysis to obtain up-to-date estimates. For specific details on the period window used for each country, please refer to the "Data" section below. For a more detailed technical description of flexible parametric models and their associated parameters, please refer to "Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model" by Royston and Lambert (2011). Relative survival was obtained for each sex through the standard transformation of the log-cumulative excess hazard:

$$ R_{i}(t) = \exp\left[-\exp\left(\eta_{i}\right)\right] $$
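The transformation above, and its inverse, can be sketched in a few lines of Python. The function names are hypothetical; the value 0.75 is an arbitrary illustration rather than a model estimate.

```python
import math


def relative_survival_from_eta(eta: float) -> float:
    """R(t) = exp(-exp(eta)), where eta is the log-cumulative excess hazard."""
    return math.exp(-math.exp(eta))


def eta_from_relative_survival(r: float) -> float:
    """Inverse transformation: eta = ln(-ln(R(t))), defined for 0 < R(t) < 1."""
    if not 0.0 < r < 1.0:
        raise ValueError("relative survival must lie strictly between 0 and 1")
    return math.log(-math.log(r))


# Round trip on an arbitrary value.
eta = eta_from_relative_survival(0.75)
r_back = relative_survival_from_eta(eta)
```

Because \(\exp(\eta)\) is always positive, the transformed value always lies strictly between 0 and 1.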

We then calculate expected survival directly for each individual matched by gender, age and year from the latest population mortality life tables provided by the Cancer Survival Group at the London School of Hygiene and Tropical Medicine. Finally, using the relationship in Equation (1), we can obtain the all-cause survival:

$$ S_{i}(t) = R_{i}(t)S_{i}^{*}(t) $$
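A minimal sketch of these two steps is given below. It assumes a life table stored as annual probabilities of death keyed by sex, age and calendar year, and follow-up measured in whole years. The table layout, the function names and every number are hypothetical and are not taken from the life tables used in InterPreT.

```python
from typing import Dict, Tuple

# Hypothetical layout: (sex, age, calendar year) -> annual probability of death.
LifeTable = Dict[Tuple[str, int, int], float]


def expected_survival(life_table: LifeTable, sex: str, age: int,
                      year: int, years: int) -> float:
    """Expected survival over a whole number of years.

    Multiplies the annual survival probabilities while ageing the person
    and advancing the calendar year by one at each step.
    """
    surv = 1.0
    for j in range(years):
        surv *= 1.0 - life_table[(sex, age + j, year + j)]
    return surv


def all_cause_survival(relative: float, expected: float) -> float:
    """S(t) = R(t) * S*(t)."""
    return relative * expected


# Made-up annual death probabilities for illustration only.
toy_table: LifeTable = {
    ("female", 70, 2013): 0.015,
    ("female", 71, 2014): 0.017,
    ("female", 72, 2015): 0.019,
}
s_star = expected_survival(toy_table, "female", 70, 2013, years=3)
s_all = all_cause_survival(0.80, s_star)  # 0.80 is an arbitrary R(3)
```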

*For a model that truly reflects the impact of cancer on survival, other important disease characteristics need to be included (e.g. stage at diagnosis). This will be incorporated into a future prognostic version of the tool.*

#### Crude Probability of Death

To calculate the crude probabilities of death (otherwise known as the cumulative incidence function in the cause-specific framework) after fitting a relative survival model, we use the formulation defined by Lambert (2010). The crude probabilities of death due to cancer, other causes and all causes can be obtained by numerical integration using the all-cause survival function along with the appropriate hazard function. Each of these is defined below:

$$ F_{cancer, i}(t) = \int_0^t{S_{i}(u)\lambda_{i}(u)} du $$

$$ F_{other, i}(t) = \int_0^t{S_{i}(u)h_{i}^{*}(u)} du $$

$$ F_{all, i}(t) = F_{cancer, i}(t) + F_{other, i}(t) = \int_0^t{S_{i}(u)h_{i}(u)} du $$
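The integrals above can be approximated numerically. The sketch below uses a simple trapezoidal rule and, purely for illustration, constant hazards, so that the result can be checked against a closed form. The function name, the hazard values and the step count are assumptions; this is not the integration routine used in InterPreT.

```python
import math
from typing import Callable, Tuple


def crude_probabilities(
    excess_hazard: Callable[[float], float],
    expected_hazard: Callable[[float], float],
    t: float,
    steps: int = 2000,
) -> Tuple[float, float, float]:
    """Trapezoidal approximation of F_cancer(t), F_other(t) and F_all(t).

    The all-cause hazard is taken as the expected hazard plus the excess
    hazard, and S(u) = exp(-H(u)) is accumulated along the same grid.
    """
    du = t / steps
    cum_hazard = 0.0
    f_cancer = 0.0
    f_other = 0.0
    prev_lam = excess_hazard(0.0)
    prev_exp = expected_hazard(0.0)
    prev_surv = 1.0
    for k in range(1, steps + 1):
        u = k * du
        lam = excess_hazard(u)
        h_exp = expected_hazard(u)
        # Cumulative all-cause hazard H(u), then all-cause survival S(u).
        cum_hazard += 0.5 * du * ((prev_lam + prev_exp) + (lam + h_exp))
        surv = math.exp(-cum_hazard)
        # Integrands S(u) * lambda(u) and S(u) * h*(u).
        f_cancer += 0.5 * du * (prev_surv * prev_lam + surv * lam)
        f_other += 0.5 * du * (prev_surv * prev_exp + surv * h_exp)
        prev_lam, prev_exp, prev_surv = lam, h_exp, surv
    return f_cancer, f_other, f_cancer + f_other


# Hypothetical constant hazards per year, evaluated at 5 years.
f_cancer, f_other, f_all = crude_probabilities(
    lambda u: 0.10, lambda u: 0.05, t=5.0
)
```

With constant hazards, \(F_{cancer}(t) = \frac{0.10}{0.15}\left(1 - e^{-0.15t}\right)\), which gives a convenient check on the approximation. Hazards from a fitted model may not be finite at \(t = 0\), in which case the grid would need to start slightly above zero.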

*Why are these not reported as cause-specific crude survival probabilities?* The reason for this is down to the awkward formulation of crude probabilities on the survival scale which leads to an awkward interpretation. For example, the cancer-specific crude survival probability, \(1 - F_{cancer}(t)\), would be interpreted as the probability of not dying from cancer which includes the probability of dying from other causes before time t and the probability of being alive at t.
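The mixture described above can be written out explicitly. Since \(F_{all}(t) = 1 - S(t)\) and \(F_{all}(t) = F_{cancer}(t) + F_{other}(t)\):

$$ 1 - F_{cancer}(t) = 1 - F_{all}(t) + F_{other}(t) = S(t) + F_{other}(t) $$

The right-hand side combines the probability of being alive at \(t\) with the probability of having died of other causes before \(t\), which is why this quantity is not a survival probability in the usual sense.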

#### Data

##### English Data

The English data were obtained from the PHE National Cancer Registration and Analysis Service (NCRAS). This is a large cancer registry dataset that contains information on patients diagnosed between 1995 and 2013 for 6 different cancers. Up-to-date estimates were obtained using a 3-year period window from 01 Jan 2013 to 31 Dec 2015. The size of the dataset by each cancer site is summarised below:

Cancer Site | Male | Female |
---|---|---|
Melanoma | 64,551 | 76,239 |
Lung | 316,936 | 223,523 |
Colon | 166,323 | 160,522 |
Rectum | 116,312 | 74,389 |
Breast | - | 660,538 |
Prostate | 521,518 | - |

##### Swedish Data

The Swedish data were obtained from the Swedish Cancer Registry, which covers the entire Swedish population and was established in 1958. In Sweden, it is mandatory by law for health care providers to report newly detected cancer to one of six regional cancer centers, which then pass the information to the National Board of Health and Welfare (Socialstyrelsen), who compile the Swedish Cancer Registry. The Swedish Cancer Registry contains information on patients diagnosed with cancer, including death information obtained from the Swedish Cause of Death Register. Up-to-date estimates were obtained using a 5-year period window from 01 Jan 2013 to 31 Dec 2017. The size of the dataset by each cancer site is summarised below:

Cancer Site | Male | Female |
---|---|---|
Melanoma | 13,319 | 13,169 |
Lung | 9,162 | 9,967 |
Colon | 15,383 | 16,349 |
Rectum | 10,148 | 7,815 |
Breast | - | 72,601 |
Prostate | 103,508 | - |

No individual patient data have been stored on the servers to calculate the predictions in InterPreT. The model was fitted to the data remotely in Stata 15 and only the corresponding model parameters have been exported which have been used to obtain the cancer statistics within the online tool.

If you have any questions regarding InterPreT, please contact Sarwar Islam [si113@le.ac.uk]. The most frequently asked questions, with answers, are detailed below.

#### How were these statistics obtained?

Net Survival, All-cause Survival and Crude Probability of Death were all predicted after fitting a relatively complex model to an English cancer registry dataset. Information on the cancers included in the dataset and the number of observations for each cancer is summarised in the table below:

This particular model is known as a “Flexible Parametric Relative Survival Model” which allows more flexibility to accurately capture the information contained in the data and makes it easy to obtain useful measures that facilitate the communication of risk. Further technical details are provided under the “Technical Description” tab.

#### What do these numbers mean for me as a cancer patient?

At present, the statistics are adjusted only for the gender and age of the cancer patient and reflect aggregated national-level statistics. Therefore, these predictions do not represent the survival or mortality of any particular patient. Each case will differ on several other important factors, such as stage at diagnosis, grade and tumor size. For a clearer picture of prognosis on an individual basis, please contact your physician; for guidance, please refer to the links provided in the Support section of the website.