Mathematical methods for `baselinenowcast`
Source:vignettes/model_definition.Rmd
model_definition.Rmd
1 Overview
The baselinenowcast
model, initially developed as a reference for the COVID-19 hospital admissions nowcasting challenge in Germany (2021-2022, Wolffram et al, utilises reporting triangles of preliminary case counts and their delays. It applies a multiplicative approach, using empirically observed historical delay distributions to estimate yet-to-be-observed cases. Users can specify whether delay distributions are estimated from the latest data, fully observed past data, or data from different strata. The model produces point estimates of nowcasts by “filling in” the reporting triangle. Probabilistic nowcasts are generated using a negative binomial model with means from the point nowcast and dispersion parameters estimated from past nowcast errors. The dispersion parameters can also be flexibly sourced from current data, historical data, or alternative settings.
1.1 Notation
We denote \(X_{t,d}, d = 0, .., D\) as the number of cases occurring on time \(t\) which appear in the dataset with a delay of \(d\). For example, a delay \(d = 0\) means that a case occurring on day \(t\) arrived in the dataset on day \(t\), or on what is considered to be the first possible report date in practice. We only consider cases reporting within a maximum delay \(D\). The number of cases reporting for time \(t\) with a delay of at most \(d\) can be written as:
\[X_{t, \le d} = \sum_{i=0}^d X_{t,i} \]
Such that \(X_t = X_{t, \le D}\) is the “final” number of reported cases on time \(t\). Conversely, for \(d < D\)
\[X_{t,>d} = \sum_{i = d+1} ^{D} X_{t,i}\]
is the number of cases still missing after \(d\) delay. We refer to \(X_t\) to describe a random variable, \(x_t\) for the corresponding observation, and \(\hat{x}_t\) for an estimated/imputed value. The matrix of \(x_{t,d}\) available at a given time \(t^*\) is referred to as a reporting matrix. In the case where all \(t+d>t^*\) have yet to be observed (e.g. \(t^*\) is the current time), this reporting matrix is referred to as the reporting triangle, with all values in the bottom right corner of the triangle being missing, except for the first entry at \(x_{t=t*, d = 0}\).
\(d = 0\) | \(d = 1\) | \(d=2\) | \(...\) | \(d= D-1\) | \(d= D\) | |
---|---|---|---|---|---|---|
\(t=1\) | \(x_{1,0}\) | \(x_{1,1}\) | \(x_{1,2}\) | \(...\) | \(x_{1,D-1}\) | \(x_{1, D}\) |
\(t=2\) | \(x_{2,0}\) | \(x_{2,1}\) | \(x_{2,2}\) | \(...\) | \(x_{2,D-1}\) | \(x_{2, D}\) |
\(t=3\) | \(x_{3,0}\) | \(x_{3,1}\) | \(x_{3,2}\) | \(...\) | \(x_{3,D-1}\) | \(x_{3, D}\) |
\(...\) | \(...\) | \(...\) | \(...\) | \(...\) | \(...\) | \(...\) |
\(t=t^*-1\) | \(x_{t^*-1,0}\) | \(x_{t^*-1,1}\) | \(x_{t^*-1,,2}\) | \(...\) | \(x_{t^*-1,,D-1}\) | \(x_{t^*-1,D}\) |
\(t=t^*\) | \(x_{t^*,0}\) | \(x_{t^*,1}\) | \(x_{t^*,2}\) | \(...\) | \(x_{t^*,D-1}\) | \(x_{t^*, D}\) |
Throughout this document and package, we will refer to these matrices, as well as their corresponding vectors, using the following table. In this table, “point” refers to a point estimate. When not indicated, we are referring to a probabilistic draw from an observation model.
Data Structure | Observations Only | Mixed (Obs + Predictions) | Pure Predictions Only |
---|---|---|---|
Matrix Format |
reporting_matrix (complete with observations)reporting_triangle (contains NAs for missing observations) |
point_nowcast_matrix nowcast_matrix
|
point_pred_matrix pred_matrix (contains NAs in elements where there were observations) |
List Format |
reporting_matrix_list (complete)reporting_triangle_list (contains NAs for missing observations) |
nowcast_matrix_list |
pred_matrix_list |
Vector Format (summed across delays) | observed_cases |
point_nowcast nowcast
|
point_pred pred
|
DataFrame Format | - | nowcast_df |
pred_df |
For example, we refer to the matrix with imputed point estimates for all \(t+d>t^*\) as a point nowcast matrix, a matrix with a complete set of observations for all elements as a reporting matrix, and a matrix with only the predictions as a point prediction matrix.
In package documentation, we will generally use a reporting triangle to refer to a diagonal reporting triangle, which is the special case of a reporting triangle where the reporting delays and reference times are indexed on the same scale. An example of a diagonal reporting triangle is one with daily reporting of daily data. The package also supports what we refer to as a ragged reporting triangle, an example of which would be daily reporting of data referenced only once per week.
We will use the following to abbreviations to shorten names in the code:
Long Name | Code Abbreviation |
---|---|
reporting | rep |
observed | obs |
incomplete | inc |
matrix | mat |
point | pt |
error | err |
2 Point estimate of the delay distribution
2.1 Estimate of the delay distribution from a reporting matrix
We can use a reporting matrix to compute an empirical estimate of the delay distribution, \(\pi(d)\). The empirical delay distribution, \(\pi(d)\) can be computed directly from the reporting matrix \(X\)
\[ \pi(d)= \frac{\sum_{t=1}^{t=t^*} X_{t,d}}{\sum_{d'=0}^{D} \sum_{t=1}^{t=t^*} X_{t,d'}} \]
Where the numerator is the sum of all the observations across reference times \(t\) for a particular delay \(d\) of interest, and the denominator is the sum across all reference times \(t\) and indexed delays in the reporting matrix \(d'\).
2.2 Estimate of the delay distribution from a reporting triangle
In the case where we have missing values in the bottom right (i.e. we have a reporting triangle), we need to use the multiplicative model to generate a point nowcast matrix containing a mixture of observed and imputed values. Then we can compute the delay distribution as described above for the reporting matrix case.
The multiplicative model works by iteratively “filling in” the reporting triangle starting from the bottom left, and moving column by column from left to right until the bottom right of the triangle is filled in.

Figure 2.1: Visual description of the iterative “completing” of the reporting triangle, moving from left to right and bottom to top. In this cases, we are imputing \(x_{t=6, d = 2}\) and \(x_{t=5, d= 2}\) assuming that the ratio between \(x_{t=1:4, d = 2}\) (block top), and \(x_{t=1:4, d=0:1}\) (block top left) holds for for \(x_{t=5:6, d = 2}\) (block bottom) and \(x_{t=5:6, d = 0:1}\) (block bottom left). In this example, \(\hat{x}_{t=6, d = 1}\) has already been imputed using the same approach, and we treat it as known going forward. This process is repeated across the reporting triangle to estimate all values outlined in the dashed lines.
The method requires at least one observation, at delay \(d=0\) for the most recent reference time, located at the bottom left of the reporting triangle in Figure 2.1 above. The method assumes that the values at each delay \(d\) for the recent times, \(t\), will consist of the same proportion of the values previously reported for earlier times \(t\). To fill in the missing values for each column \(d\), we sum over the rectangle of completely observed reference dates for all \(d-1\) columns (block top left) and sum over the column of completely observed reference dates for all of the entries in column \(d\) (block left). The ratio of these two sums is assumed to be the same in the missing entries in column \(d\), so we use the entries observed up to \(d-1\) for each incomplete reference date (block bottom left), and scale by this ratio to get the missing entries in column \(d\). This process is repeated for each delay from \(d\) up to the maximum delay \(D\). At each iteration an additional reference time entry is computed as the delay \(d\) increases.
The delay distribution is then estimated from the filled in reporting matrix, using the same algorithm as described above for the case of the complete reporting square.
The get_delay_estimate()
function ingests either a reporting matrix, an incomplete reporting matrix, or a reporting triangle and uses the last n
rows to compute an empirical delay probability mass function (returning a simplex vector indexed starting at delay 0).
3 Generation of a point nowcast matrix from a delay distribution and a reporting triangle
To “fill in” the reporting triangle from the delay distribution, we need to estimate the expected total number of eventual observed cases \(\hat{x}_t\), for each reference time \(t\). Let \(z\) be the sum over all delays \(d\) that have already been observed (up until \(t^*-t\)), such that \(z =\sum_{d=1}^{d=t^*-t} x_{t,d}\) and \(\delta\) be the cumulative sum of the delay distribution, \(\pi(d)\) up until \(d = t^*-t\) such that \(\delta = \sum_{d=1}^{d=t^*-t} \pi(d)\). By assuming that \(z \sim Bin(x_t, \delta)\) and \(x_t \sim Unif(0, \inf)\), it can be shown that the expected value of \(x_t\), the total number of reported cases on reference time \(t\), can be written as:
\[ E(x_t | z, \delta) = \hat{x}_t = \frac{z + 1 - \delta}{\delta} \]
Then we can compute \(\hat{x}_{t,d}\) directly using the \(d\)th element of \(\pi(d)\)
\[ \hat{x}_{t,d} = \pi(d) \times \hat{x}_t \]
Where the number of reports at timepoint \(t\) with delay \(d\) is the product of the the expected total reports, \(\hat{x}_t\) and the proportion expected at that particular delay \(d\), \(\pi(d)\).
The apply_delay()
function ingests a reporting triangle or an incomplete reporting matrix and a delay PMF and returns a point nowcast matrix.
4 Estimate of dispersion
To estimate the uncertainty in the nowcasts, we use past nowcast errors and assume a negative binomial observation model.
4.1 Generation of retrospective reporting triangles
We describe a method which generates retrospective reporting triangles to replicate what would have been available as of time \(t^*=s^*\), where \(s^* = t^*-m\) for \(m = 1, 2, ... M\) to generate \(M\) retrospective reporting triangles.
To generate the set of \(M\) reporting triangles, we work backwards from most recent to oldest, removing the last \(m\) rows of the existing reporting triangle, to generate \(M\) truncated reporting triangles. These can be generated from a reporting triangle with the function truncate_triangles()
which ingests a single reporting triangle and returns a list of n
truncated reporting containing only observations and missing values if present. We then replace the bottom right triangle with NAs, assuming these would not have been observed as of \(s^*\), using the function generate_triangles()
which returns a list of n
retrospective reporting triangles, again in order from most recent to oldest.
The method uses each retrospective reporting triangle to re-estimate a delay distribution using the \(N\) preceding reference times of the retrospective reporting triangle before \(s^*\), and recomputes a retrospective point nowcast matrix, for \(M\) realizations of the retrospective reporting triangle (so \(M\) different \(s^*\) values).
Thus in order to estimate uncertainty using \(N\) reference times, the total training volume must meet or exceed \(N+M\) so that the oldest retrospective nowcast dataset at \(s^* = t^*-M\) can use \(N\) reference times for its point nowcast.
4.2 Generation of retrospective point nowcast matrices
From the \(M\) reporting triangles, we apply the method described above to estimate a delay distribution from a reporting triangle and generate a point nowcast for each reporting triangle, to generate \(M\) point nowcasts. The function generate_point_nowcasts()
ingests the list of reporting triangles, estimates a delay distribution for each, and generates a list of point nowcast matrices.
4.3 Fit predicted point nowcast vectors and observed counts to a negative binomial observation model at each forecast horizon
We quantify the uncertainty at the target quantity level, which is assumed, by default, to be the final count at each reference time, summed across reporting delays.
At each retrospective nowcast time \(s^*\), we compute the predicted and corresponding observed nowcast at each forecast horizon \(j = 1, ..., D\) by summing across the reporting delays for all delays \(d\) that have been observed as of time \(t^*\), as indicated below by the indicator function. We define the predicted nowcast for forecast horizon \(j\) at retrospective nowcast time \(s^*\) \[ \hat{X}_{t-j}(s^*) = \sum_{d=0}^{D} \hat{X}_{t-j,d}(s^*) \times I(t-j+d \leq t^*) \] and likewise the observed nowcast for forecast horizon \(j\) at retrospective nowcast time \(s^*\) \[ X_{t-j}(s^*) = \sum_{d=0}^{D} X_{t-j,d}(s^*) \times I(t-j+d \leq t^*) \]
This generates \(M\) pairs of predicted nowcasts and observed nowcasts for each forecast horizon \(j = 1, ..., D\). We assume that the observed nowcasts \(X_{t-j}\) follow a negative binomial observation model with a mean of \(\hat{X}_{t-j}(t^*-j)\)
\[ X_{t-j} | \hat{X}_{t-j}(t^* - j) \sim NegBin(\mu = \hat{X}_{t-j}(t^*-j) + 0.1, \phi = \phi_{t^*-t}) \] for all \(s^* = 1, ..., M\).
We add a small number (0.1) to the mean to avoid an ill-defined negative binomial.
The function estimate_dispersion()
ingests a list of reporting triangles representing the data available for each reference time, a list of point nowcast matrices representing the nowcasts for each reference time, and a list of truncated reporting matrices representing the observations for each reference time. It returns a vector of negative binomial dispersion parameters indexed starting at forecast horizon \(j = 1\).
5 Generate probabilistic nowcasts
Using the dispersion parameters for each forecast horizon, \(\phi(j),\) for \(j = 1,...D\), we can generate probabilistic nowcast matrices by drawing samples from the negative binomial:
\[ X_{t^*-j} \sim NegBin(\mu = \hat{X}_{t^*-j}, \phi = \phi(j)) \]
We can sample for any number of draws, and then use the draws to compute any desired quantiles to summarize the outputs.
The function get_nowcast_pred_draws()
ingests a point nowcast prediction matrix (containing only the predicted elements, not the observations in the reporting triangle), the dispersion parameters, and the number of draws to sample and generates a dataframe of predicted point nowcasts for each reference time. We can then join this to the original data in long form by reference date and reporting date in order to generate the probabilistic nowcasts as the sum of the predicted total count from each draw and the observed count for each reference time as of the final reference time.