Mathematical methods for `baselinenowcast`
Source:vignettes/model_definition.Rmd
model_definition.Rmd
1 Overview
The baselinenowcast
model, initially developed as a reference for the COVID-19 hospital admissions nowcasting challenge in Germany (2021-2022, Wolffram et al, utilises reporting triangles of preliminary case counts and their delays. It applies a multiplicative approach, using empirically observed historical delay distributions to estimate yet-to-be-observed cases. Users can specify whether delay distributions are estimated from the latest data, fully observed past data, or data from different strata. The model produces point estimates of nowcasts by “filling in” the reporting triangle. Probabilistic nowcasts are generated using a negative binomial model with means from the point nowcast and dispersion parameters estimated from past nowcast errors. The dispersion parameters can also be flexibly sourced from current data, historical data, or alternative settings.
1.1 Notation
We denote \(X_{t,d}, d = 0, .., D\) as the number of cases occurring on time \(t\) which appear in the dataset with a delay of \(d\). For example, a delay \(d = 0\) means that a case occurring on day \(t\) arrived in the dataset on day \(t\), or on what is considered to be the first possible report date in practice. We only consider cases reporting within a maximum delay \(D\). The number of cases reporting for time \(t\) with a delay of at most \(d\) can be written as:
\[X_{t, \le d} = \sum_{i=0}^d X_{t,i} \]
Such that \(X_t = X_{t, \le D}\) is the “final” number of reported cases on time \(t\). Conversely, for \(d < D\)
\[X_{t,>d} = \sum_{i = d+1} ^{D} X_{t,i}\]
is the number of cases still missing after \(d\) delay. We refer to \(X_t\) to describe a random variable, \(x_t\) for the corresponding observation, and \(\hat{x}_t\) for an estimated/imputed value. The matrix of \(x_{t,d}\) available at a given time \(t^*\) is referred to as a reporting matrix. In the case where all \(t+d>t^*\) have yet to be observed (e.g. \(t^*\) is the current time), this reporting matrix is referred to as the reporting triangle, with all values in the bottom right corner of the triangle being missing, except for the first entry at \(x_{t=t*, d = 0}\).
\(d = 0\) | \(d = 1\) | \(d=2\) | \(...\) | \(d= D-1\) | \(d= D\) | |
---|---|---|---|---|---|---|
\(t=1\) | \(x_{1,0}\) | \(x_{1,1}\) | \(x_{1,2}\) | \(...\) | \(x_{1,D-1}\) | \(x_{1, D}\) |
\(t=2\) | \(x_{2,0}\) | \(x_{2,1}\) | \(x_{2,2}\) | \(...\) | \(x_{2,D-1}\) | \(x_{2, D}\) |
\(t=3\) | \(x_{3,0}\) | \(x_{3,1}\) | \(x_{3,2}\) | \(...\) | \(x_{3,D-1}\) | \(x_{3, D}\) |
\(...\) | \(...\) | \(...\) | \(...\) | \(...\) | \(...\) | \(...\) |
\(t=t^*-1\) | \(x_{t^*-1,0}\) | \(x_{t^*-1,1}\) | \(x_{t^*-1,,2}\) | \(...\) | \(x_{t^*-1,,D-1}\) | \(x_{t^*-1,D}\) |
\(t=t^*\) | \(x_{t^*,0}\) | \(x_{t^*,1}\) | \(x_{t^*,2}\) | \(...\) | \(x_{t^*,D-1}\) | \(x_{t^*, D}\) |
Throughout this document and package, we will refer to these matrices, as well as their corresponding vectors, using the following table. In this table, “point” refers to a point estimate. When not indicated, we are referring to a probabilistic draw from an observation model.
Data Structure | Observations Only | Mixed (Obs + Predictions) | Pure Predictions Only |
---|---|---|---|
Matrix Format |
reporting_matrix (complete)reporting_triangle (incomplete)incomplete_reporting_matrix
|
point_nowcast_matrix nowcast_matrix
|
point_pred_matrix pred_matrix
|
List Format |
reporting_matrix_list (complete)reporting_triangle_list (incomplete)incomplete_reporting_matrix_list
|
nowcast_matrix_list |
pred_matrix_list |
Vector Format | observed_cases |
point_nowcast nowcast
|
point_pred pred
|
DataFrame Format | - | nowcast_df |
pred_df |
For example, we refer to the matrix with imputed point estimates for all \(t+d>t^*\) as a point nowcast matrix, a matrix with a complete set of observations for all elements as a reporting matrix, and a matrix with only the predictions as a point prediction matrix.
We will use the following to abbreviations to shorten names in the code:
Long Name | Code Abbreviation |
---|---|
reporting | rep |
observed | obs |
incomplete | inc |
matrix | mat |
point | pt |
error | err |
2 Point estimate of the delay distribution
2.1 Estimate of the delay distribution from a reporting matrix
We can use a reporting matrix to compute an empirical estimate of the delay distribution, \(\pi(d)\). The empirical delay distribution, \(\pi(d)\) can be computed directly from the reporting matrix \(X\)
\[ \pi(d)= \frac{\sum_{t=1}^{t=t^*} X_{t,d}}{\sum_{d=0}^{D} \sum_{t=1}^{t=t^*} X_{t,d}} \]
Where the numerator is the sum of all the observations across reference times \(t\) for a particular delay \(d\), and the denominator is the sum across all reference times \(t\) and delays \(d\).
2.2 Estimate of the delay distribution from a reporting triangle
In the case where we have missing values in the bottom right (i.e. we have a reporting triangle), we need to use the multiplicative model to generate a point nowcast matrix containing a mixture of observed and imputed values. Then we can compute the delay distribution as described above for the reporting matrix case.
The multiplicative model works by iteratively “filling in” the reporting triangle starting from the bottom left, and moving column by column from left to right until the bottom right of the triangle is filled in.

The method requires at least one observation, at delay \(d=0\) for the most recent reference time, located at the bottom left of the reporting triangle in Figure ?? above. The method assumes that the values at each delay \(d\) for the recent times, \(t\), will consist of the same proportion of the values previously reported for earlier times \(t\). To fill in the missing values for each column \(d\), we sum over the rectangle of completely observed reference times for all \(d-1\) columns (block top left) and sum over the column of completely observed reference times for all of the entries in column \(d\) (block left). The ratio of these two sums is assumed to be the same in the missing entries in column \(d\), so we use the entries observed up to \(d-1\) for each incomplete reference time (block bottom left), and scale by this ratio to get the missing entries in column \(d\). This process is repeated for each delay from \(d\) up to the maximum delay \(D\). At each iteration an additional reference time entry is computed as the delay \(d\) increases.
The delay distribution is then estimated from the filled in reporting matrix, using the same algorithm as described above for the case of the complete reporting square.
The get_delay_estimate()
function ingests either a reporting matrix, an incomplete reporting matrix, or a reporting triangle and uses the last n
rows to compute an empirical delay probability mass function (returning a simplex vector indexed starting at delay 0).
3 Generation of a point nowcast matrix from a delay distribution and a reporting triangle
To “fill in” the reporting triangle from the delay distribution, we need to estimate the expected total number of eventual observed cases \(\hat{x}_t\), for each reference time \(t\). Let \(z\) be the sum over all delays \(d\) that have already been observed (up until \(t^*-t\)), such that \(z =\sum_{d=1}^{d=t^*-t} x_{t,d}\) and \(\delta\) be the cumulative sum of the delay distribution, \(\pi(d)\) up until \(d = t^*-t\) such that \(\delta = \sum_{d=1}^{d=t^*-t} \pi(d)\). By assuming that \(z \sim Bin(x_t, \delta)\) and \(x_t \sim Unif(0, \inf)\), it can be shown that the expected value of \(x_t\), the total number of reported cases on reference time \(t\), can be written as:
\[ E(x_t | z, \delta) = \hat{x}_t = \frac{z + 1 - \delta}{\delta} \]
Then we can compute \(\hat{x}_{t,d}\) directly using the \(d\)th element of \(\pi(d)\)
\[ \hat{x}_{t,d} = \pi(d) \times \hat{x}_t \]
Where the number of reports at timepoint \(t\) with delay \(d\) is the product of the the expected total reports, \(\hat{x}_t\) and the proportion expected at that particular delay \(d\), \(\pi(d)\).
The apply_delay()
function ingests a reporting triangle or an incomplete reporting matrix and a delay PMF and returns a point nowcast matrix.
4 Estimate of dispersion
To estimate the uncertainty in the nowcasts, we use past nowcast errors and assume a negative binomial observation model.
4.1 Generation of retrospective reporting triangles
We describe a method which generates retrospective reporting triangles to replicate what would have been available as of time \(t^*=s^*\), where \(s^* = t^*-m\) for \(m = 1, 2, ... M\) to generate \(M\) retrospective reporting triangles.
To generate the set of \(M\) reporting triangles, we simply remove the last \(m\) rows of the existing reporting triangle (or reporting matrix), to generate \(M\) truncated incomplete reporting matrices. These can be generated from a reporting triangle or matrix with the function truncate_triangles()
which ingests a single reporting triangle (or matrix) and returns a list of n
truncated (potentially incomplete) reporting matrices containing only observations and missing values if present. We then replace the bottom right triangle of each matrix with NAs, assuming these would not have been observed as of \(s^*\), using the function generate_triangles()
which returns a list of n
retrospective reporting triangles. The method uses each retrospective reporting triangle to re-estimate a delay distribution using the \(N\) preceding rows of the reporting triangle before \(s^*\), and recomputes a retrospective nowcast, for \(M\) realizations of the retrospective reporting triangle (so \(M\) different \(s^*\) values).
4.2 Generation of retrospective point nowcast matrices
From the \(M\) reporting triangles, we apply the method described above to estimate a delay distribution from a reporting triangle and generate a point nowcast for each reporting triangle, to generate \(M\) point nowcasts. The function generate_point_nowcasts()
ingests the list of reporting triangles, estimates a delay distribution for each, and generates a list of point nowcast matrices.
4.3 Fit point nowcast matrices and observed values to a negative binomial observation model at each delay
We then take the list of point nowcast matrices, and the list of truncated incomplete reporting matrices (these do not necessarily contains NAs on the bottom right, the bottom right could be entirely or partially observed). For each delay \(d\) we identify the overlapping set of matrix elements that were imputed retrospectively and matrix elements that had been observed as of the most recent time point.
For each delay \(d = 1, ..., D\) we assume that the observed values, \(X_{s^*-d, >d}\) follow a negative binomial observation model with a mean of \(\hat{x}_{s^*-d}\):
\[ X_{s^*-d,>d} | \hat{x}_{s^*-d, >d}(s*) \sim NegBin(\mu = \hat{x}_{s^*-d} + 0.1, \phi = \phi_d) \]
We add a small number (0.1) to the mean to avoid an ill-defined negative binomial. We note that to perform all these computations, data snapshots from at least \(N + M\) past observations, or rows of the original reporting triangle (or matrix), are needed. This estimate of the uncertainty accounts for the empirical uncertainty in the point estimate of the delay distribution over time.
The function estimate_dispersion()
ingests a list of truncated reporting matrices representing the observations, and the list of point nowcast matrices, and returns a vector of negative binomial dispersion parameters indexed starting at delay \(d = 1\).
5 Generate probabilistic nowcast matrices
Using the dispersion parameters for each delay, \(\phi(d),\) for \(d = 1,...D\), we can generate probabilistic nowcast matrices by drawing samples from the negative binomial:
\[ X_{t,d} \sim NegBin(\mu = \hat{x}_{t,d}, \phi = \phi(d)) \]
We can sample for any number of draws, and then use the draws to compute any desired quantiles to summarize the outputs.
The function add_obs_errs_to_pt_nowcast_mat()
ingests a point nowcast matrix, the dispersion parameters, and the number of draws to sample and generates a list of point nowcast matrices. This can then be summarized across delays by passing the list to nowcast_matrix_list_to_df()
and summarised by reference time using aggregate_by_ref_time()
.