Using covariates as filters

HankHerr-NOAA edited this page Jul 26, 2024 · 7 revisions

What is a covariate?

A covariate is a separate dataset that varies alongside the main evaluation datasets (observed, predicted and baseline) and is used to filter the evaluation pairs, including the observed and predicted pairs and the observed and baseline pairs. For example, observations of precipitation may be used to evaluate temperature forecasts under wet conditions only, i.e., when the precipitation is greater than zero (or greater than an instrument threshold) at the same valid times, locations and time scale as the temperature forecasts. In this example, the “covariate” dataset is the observed precipitation.

What covariates can I declare?

As of WRES v6.23, one or more covariate datasets may be declared, subject to the following conditions:

  • Each covariate has a unique variable name (two or more covariates with the same name are inadmissible); and
  • Each covariate can be cross-referenced with (compared to) the main evaluation datasets, meaning at the same locations, valid times and time scales, possibly after rescaling; and
  • The covariate time-series are all “observation-like”. In other words, the time-series do not contain any forecast reference times. For example, observations, model simulations and model analyses are all admissible.

The name of each variable is required when:

  • Declaring two or more covariates; or
  • Using data sources that contain two or more variables.

Otherwise, the variable name can be inferred from the data.

How are covariate datasets cross-referenced with the evaluation pairs?

When cross-referencing the covariate datasets with the evaluation pairs by location or geographic feature, each covariate must use the same feature authority as one or more of the observed, predicted and baseline datasets. For example, the United States Geological Survey (USGS) uses numeric site codes (the USGS is a feature authority). A covariate cannot use a novel feature authority because the covariate feature names are not declared explicitly. By default, it is assumed that the covariate feature names use the same authority as the observed data. If this assumption is incorrect, then the feature_authority of the covariate must be declared explicitly (and, likewise, the feature_authority of the corresponding observed, predicted or baseline dataset must be declared explicitly).

When cross-referencing the covariate datasets with the evaluation pairs by valid time, each covariate must have the same time scale period as the evaluation pairs. When the time scale period is different, the software will rescale (upscale) the covariate (and any other datasets) to the desired evaluation time_scale (with the same assumptions and constraints as upscaling more generally: see Time Scale and Rescaling Time Series). However, the time scale function may be declared separately for each covariate using the rescale_function. For example, when evaluating average daily streamflow conditionally upon the total daily precipitation (the covariate) exceeding 0 MM, the rescale_function can be declared as a total to ensure that the precipitation covariate is properly rescaled when required (e.g., if the time-series data of observed precipitation contains 6-hourly totals).
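To illustrate why the rescale function matters for a covariate, the following is a minimal conceptual sketch in Python. It is not the WRES implementation: the `upscale` helper and the sample values are hypothetical, and it assumes a complete, regularly spaced series of 6-hourly precipitation totals covering two days.

```python
from statistics import mean


def upscale(values, factor, function):
    """Aggregate consecutive groups of `factor` values with `function`.

    Hypothetical helper for illustration only; it assumes a complete,
    regularly spaced series whose length is a multiple of `factor`.
    """
    if len(values) % factor != 0:
        raise ValueError("The series does not divide evenly into groups.")
    return [
        function(values[i:i + factor])
        for i in range(0, len(values), factor)
    ]


# Two days of 6-hourly precipitation totals (made-up values)
six_hourly_precip = [0.0, 1.5, 2.5, 0.0, 0.0, 0.0, 0.0, 0.0]

# Upscaling with a total preserves the daily accumulation
daily_total = upscale(six_hourly_precip, 4, sum)

# Upscaling with a mean would understate the daily accumulation
daily_mean = upscale(six_hourly_precip, 4, mean)
```

In this sketch, the first day has a daily total of 4.0 but a daily mean of only 1.0, so a filter threshold applied to the covariate would behave differently depending on which function was used to rescale it.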

What filters can I impose on covariates?

Covariates are used to filter the evaluation pairs, including the pairs of observed and predicted values and, where applicable, the pairs of observed and baseline values. In the absence of an explicit filter, a covariate will select evaluation pairs for only those valid times at which the covariate is also defined. As of version 6.23, the following explicit filters are also supported for each covariate (one or both may be used):

  • The minimum value of the covariate; and
  • The maximum value of the covariate.

An explicit filter will additionally select only those evaluation pairs where the filter condition is met. In other words, the evaluation will include only those pairs where every covariate is defined (at the corresponding valid time of the pair) and the value of each covariate meets the minimum and/or maximum constraints imposed upon it.
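The selection rule described above can be sketched conceptually in Python. This is an illustration only, not the WRES implementation: the `filter_pairs` function, the data layout and the sample values are all hypothetical, and the sketch assumes that both the minimum and the maximum are inclusive bounds.

```python
from datetime import datetime


def filter_pairs(pairs, covariates):
    """Keep only the pairs whose valid time passes every covariate filter.

    pairs: dict mapping valid time -> (observed, predicted)
    covariates: list of (series, minimum, maximum) tuples, where series
        maps valid time -> covariate value and either bound may be None.
    Hypothetical helper; bounds are assumed to be inclusive.
    """
    kept = {}
    for valid_time, pair in pairs.items():
        for series, minimum, maximum in covariates:
            value = series.get(valid_time)
            if value is None:
                break  # Covariate is undefined at this valid time
            if minimum is not None and value < minimum:
                break  # Below the declared minimum
            if maximum is not None and value > maximum:
                break  # Above the declared maximum
        else:
            # No covariate rejected the pair, so it is retained
            kept[valid_time] = pair
    return kept


times = [datetime(2024, 1, day) for day in (1, 2, 3, 4)]

# Made-up streamflow pairs of (observed, predicted)
pairs = {
    times[0]: (10.0, 11.0),
    times[1]: (12.0, 11.5),
    times[2]: (9.0, 9.5),
    times[3]: (8.0, 8.2),
}

# Made-up air temperature covariate, undefined at the last valid time
temperature = {times[0]: -2.0, times[1]: 0.0, times[2]: 3.0}

# Explicit filter: a maximum of 0.0 in the covariate's native unit
freezing_pairs = filter_pairs(pairs, [(temperature, None, 0.0)])

# No explicit filter: pairs are kept wherever the covariate is defined
defined_pairs = filter_pairs(pairs, [(temperature, None, None)])
```

Here, `freezing_pairs` retains only the first two valid times, while `defined_pairs` retains the first three, because the covariate has no value at the fourth valid time.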

For example, to evaluate streamflow when the observed air temperature is at or below freezing, a temperature covariate should be declared with a maximum value of 0°C or 32°F. The unit of each filter corresponds to the native unit in which the covariate is supplied. As of v6.23, it is not possible to transform the unit of a covariate prior to filtering. For the same reason, it is not possible to declare time-series sources for a single covariate with a mixture of measurement units. It is also not possible to declare a minimum or maximum value that varies with location. Future iterations of the software may relax these constraints.

How do I declare covariates?

See How do I declare covariate datasets? in the Declaration Language wiki.
