
Time Scale and Rescaling Time Series

HankHerr-NOAA edited this page Jul 26, 2024 · 7 revisions

What is a time scale?

The time scale of a measurement is the interval or “control volume” over which the measurement is valid or representative. For example, a tipping bucket rain gauge records the time at which a control volume or bucket of rainfall (e.g., 0.254 mm) empties or tips. In this case, the measurement is the total precipitation (0.254 mm) and its time scale is the period between the last reset and the current tip, during which that precipitation occurred.

Every value or measurement within a time-series has a time scale, including time-series that contain forecasts, simulations, observations or any other data type.

An “instantaneous” time scale is a special type of time scale that represents a measurement over a small, but unspecified, duration, known as an “instant”.

What is rescaling?

The rescaling of a time-series is concerned with changing the time scale of the measurements within it. The time scale may increase (become larger or more aggregated), which is known as “upscaling”, or decrease (become smaller or less aggregated), which is known as “downscaling”.

In general, downscaling is less straightforward than upscaling because downscaling relies on a model of the behavior of the time-series at the smaller scale of interest, in order to determine how the values at the larger scale should be distributed (over time) at the smaller scale.

In many cases, upscaling simply involves the application of an arithmetic function to existing measurements. For example, the accumulated precipitation within a 24-hour period is easily (and exactly) calculated from the accumulated precipitation amounts within four 6-hour periods that are adjacent to each other (and that do not overlap). However, some upscaling problems are much more complicated. For example, if a sequence of precipitation measurements has control volumes that overlap in time, then an accumulation should only consider the non-overlapping periods. In order to apportion the correct precipitation amount to each non-overlapping period, a model is needed for the distribution of measurements over these periods (e.g., a constant precipitation rate within each period).

As of WRES version 5.17, only “simple upscaling” is supported. Downscaling and “complex” upscaling are not supported. Further details are provided in What assumptions and limitations apply when rescaling?

How does the WRES represent a time scale?

The time scale of a measurement is typically described by three parameters:

  1. The period or duration over which a measurement is made or to which the measurement applies. For example, a daily-average streamflow has a period of “one day”;
  2. The unit associated with the period. For example, hours; and
  3. The function or statistic that is represented by the measurement. For example, a daily-average streamflow has a function that is a “mean average”.

The period associated with the time scale spans a “right closed” interval, which means that the interval begins at, but does not include, the instant that is one period before the valid time of the measurement and ends at, and includes, the valid time of the measurement. For example, a measurement that has a valid time of 2003-01-01T12:00:00Z and a time scale period of 3 hours spans the right-closed interval, (2003-01-01T09:00:00Z, 2003-01-01T12:00:00Z]. In other words, the measurement begins at the instant immediately after 2003-01-01T09:00:00Z and ends at precisely 2003-01-01T12:00:00Z.
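
The right-closed convention can be illustrated with a short Python sketch. This is not WRES code: the function names `scale_interval` and `covers` are hypothetical and exist only to show which instants fall inside the interval.

```python
from datetime import datetime, timedelta, timezone


def scale_interval(valid_time: datetime, period: timedelta) -> tuple[datetime, datetime]:
    """Return (start, end) of the right-closed interval (start, end]."""
    return valid_time - period, valid_time


def covers(valid_time: datetime, period: timedelta, instant: datetime) -> bool:
    """True if `instant` falls inside (valid_time - period, valid_time]."""
    start, end = scale_interval(valid_time, period)
    # Right-closed: the start is excluded and the end is included.
    return start < instant <= end


valid = datetime(2003, 1, 1, 12, tzinfo=timezone.utc)
three_hours = timedelta(hours=3)

print(covers(valid, three_hours, datetime(2003, 1, 1, 9, tzinfo=timezone.utc)))   # False
print(covers(valid, three_hours, datetime(2003, 1, 1, 12, tzinfo=timezone.utc)))  # True
```

The lower bound of 09:00Z is excluded, so it belongs to the preceding 3-hour measurement rather than this one.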

For an example of how to declare a time scale, see: How do I declare time scale information?

Optionally, a time scale may be described with a fixed time interval, such as 1 April through 31 July or 90 days after 1 April. This may include:

  1. The minimum_month (an integer between 1 and 12) and minimum_day (an integer between 1 and 31). For example, a minimum_month of 4 and a minimum_day of 1 denotes 1 April; and
  2. The maximum_month (an integer between 1 and 12) and maximum_day (an integer between 1 and 31).

In order to describe a fixed duration that is bounded by a month-day, one of the above can be used together with the period.

For an example of how to declare a fixed interval, see: How do I declare an evaluation time scale that spans a fixed interval?

When declaring a minimum_month and minimum_day, the day is assumed to begin at (and include) 0Z. When declaring a maximum_month and maximum_day, the day is assumed to end at one instant before 0Z on the following day. Thus, when describing a time interval of 1 April through 31 July, the interval will begin at 0Z on 1 April and will end just before 0Z on 1 August.
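
As a rough illustration of these bounds, the following Python sketch computes the start and the exclusive end of a month-day interval. The function name `fixed_interval` and the `year` argument are assumptions made for the example (the declaration itself carries no year), and the sketch assumes that the interval falls within a single calendar year.

```python
from datetime import datetime, timedelta, timezone


def fixed_interval(year, minimum_month, minimum_day, maximum_month, maximum_day):
    """Return (start, end_exclusive) in UTC for a month-day bounded interval.

    The start is 0Z on the minimum day and is included. The end is 0Z on the
    day after the maximum day and is excluded, so the last instant included
    is the one just before it.
    """
    start = datetime(year, minimum_month, minimum_day, tzinfo=timezone.utc)
    last_day = datetime(year, maximum_month, maximum_day, tzinfo=timezone.utc)
    end_exclusive = last_day + timedelta(days=1)
    return start, end_exclusive


start, end_exclusive = fixed_interval(2021, 4, 1, 7, 31)
print(start.isoformat())             # 2021-04-01T00:00:00+00:00
print(end_exclusive.isoformat())     # 2021-08-01T00:00:00+00:00
print((end_exclusive - start).days)  # 122
```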

How do I declare time scale information?

The time scale information may be declared in two separate contexts:

  1. To describe a time-series dataset that will be ingested from a data source; and
  2. To express the time scale at which the evaluation should be conducted; in other words, the time scale of the paired data.

The declaration is the same for both cases, but the context differs.

An evaluation may contain up to one time_scale for each side of data (observed, predicted or baseline) and up to one time_scale for the evaluation itself.

By way of example, when the observed data sources represent a “total” with a period of “6 hours”, this may be declared as follows (with context):

observed:
  sources: some_file.csv
  time_scale:
    function: total
    period: 6
    unit: hours

Similarly, when the evaluation time scale is a “total” within a period of “24 hours”, this may be declared as follows (with context):

observed: some_file.csv
predicted: some_other_file.csv
time_scale:
  function: total
  period: 24
  unit: hours

How do I declare an instantaneous time scale?

Any time scale period that is 1 minute or less is interpreted by WRES as “instantaneous”, regardless of the time scale function. If the function is declared, it will be ignored. Thus, the following may be used to declare an instantaneous time scale for the observed dataset:

observed:
  sources: some_file.csv
  time_scale:
    period: 1
    unit: minute
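
The threshold can be expressed as a one-line check. The sketch below is illustrative only; `INSTANTANEOUS_LIMIT` and `is_instantaneous` are hypothetical names and do not come from the WRES.

```python
from datetime import timedelta

# Threshold stated on this page: a period of 1 minute or less is "instantaneous".
INSTANTANEOUS_LIMIT = timedelta(minutes=1)


def is_instantaneous(period: timedelta) -> bool:
    """True if a time scale period would be treated as instantaneous."""
    return period <= INSTANTANEOUS_LIMIT


print(is_instantaneous(timedelta(seconds=60)))  # True
print(is_instantaneous(timedelta(minutes=15)))  # False
```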

How do I declare an evaluation time scale that spans a fixed interval?

The following declares an evaluation time scale that spans the interval from 0Z on 1 April through the end of the day on 31 July (i.e., one instant before 0Z on 1 August):

observed: some_file.csv
predicted: some_other_file.csv
time_scale:
  function: mean
  minimum_month: 4
  minimum_day: 1
  maximum_month: 7
  maximum_day: 31

The following declares an evaluation time scale that spans a 90 day period that begins at 0Z on 1 April:

observed: some_file.csv
predicted: some_other_file.csv
time_scale:
  function: mean
  period: 90
  unit: days
  minimum_month: 4
  minimum_day: 1

The following declares an evaluation time scale that spans a 31 day period that ends just before 0Z on 1 February (in other words, the month of January):

observed: some_file.csv
predicted: some_other_file.csv
time_scale:
  function: mean
  period: 31
  unit: days
  maximum_month: 1
  maximum_day: 31

When do I need to declare a time scale for a dataset?

Not very often. More specifically, it is only required under the following circumstances:

  1. The time scale is not provided by some of the time series data formats that are declared for ingest. For further information, see: Which time-series data formats support time scale information?; and
  2. The declaration includes an explicit evaluation time_scale.

In these circumstances, the WRES cannot know whether the evaluation time scale of the pairs can be achieved unless it knows the existing time scale of all measurements.

When do I need to declare an evaluation time_scale?

When the pairs of time series values should be evaluated at a larger time scale than one or all of the time series data sources, then an evaluation time_scale may be needed.

When the time scale of the paired datasets (e.g., observed and predicted and/or observed and baseline) is fully qualified and one of the sides of data is at a smaller time scale, then pairing will be conducted, automatically, at the so-called “Least Common Scale” (LCS). The LCS is simply the Least Common Multiple of the periods associated with each side of data. For example, if the period of the observed data is 4 hours and the period of the predicted data is 6 hours, then the LCS is 12 hours because that is the smallest (positive) value that is exactly divisible by both 4 (12 / 4 = 3) and 6 (12 / 6 = 2).

Thus, an evaluation time_scale is only required when upscaling is needed and the LCS is inadequate. In most cases, the LCS is adequate.
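
The Least Common Scale arithmetic described above can be reproduced with a minimal Python sketch. The function name `least_common_scale` is hypothetical, and the sketch assumes that each period is a whole number of seconds; it only illustrates the calculation, not how the WRES implements it.

```python
from datetime import timedelta
from math import lcm


def least_common_scale(*periods: timedelta) -> timedelta:
    """Least common multiple of the periods, computed in whole seconds."""
    seconds = [int(period.total_seconds()) for period in periods]
    return timedelta(seconds=lcm(*seconds))


lcs = least_common_scale(timedelta(hours=4), timedelta(hours=6))
print(lcs)  # 12:00:00
```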

Can I control how values overlap when declaring an evaluation time_scale?

Alongside the evaluation time_scale, an additional piece of information may be supplied to control how adjacent measurements interleave or overlap. By default, measurements at the evaluation time_scale do not overlap. For example, consider the following precipitation time-series, which contains eight measurements with a time_scale whose period is 6 hours and whose function is a total or accumulation:

2021-01-01T00:00:00Z, 0.5
2021-01-01T06:00:00Z, 1.3
2021-01-01T12:00:00Z, 1.2
2021-01-01T18:00:00Z, 0.6
2021-01-02T00:00:00Z, 0.3
2021-01-02T06:00:00Z, 0.0
2021-01-02T12:00:00Z, 0.0
2021-01-02T18:00:00Z, 0.1

When declaring an evaluation time_scale that contains a period of 24 hours, these values will be upscaled as follows:

2021-01-01T18:00:00Z, 3.6
2021-01-02T18:00:00Z, 0.4

To provide further control over how the upscaling is performed, a pair_frequency may be declared, which describes the duration between each new upscaled measurement. For example:

observed: some_file.csv
predicted: some_other_file.csv
time_scale:
  function: total
  period: 24
  unit: hours
pair_frequency:
  period: 6
  unit: hours

In this example, a 24-hour accumulation will be computed every 6 hours as follows:

2021-01-01T18:00:00Z, 3.6
2021-01-02T00:00:00Z, 3.4
2021-01-02T06:00:00Z, 2.1
2021-01-02T12:00:00Z, 0.9
2021-01-02T18:00:00Z, 0.4
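
The two sets of upscaled values shown above can be reproduced with a small Python sketch. It is a simplified illustration, not the WRES algorithm: the function name `upscale_totals` is hypothetical, the series is assumed to be regular, the existing period is assumed to equal the time step, and the first window is assumed to end at the first time for which a full window of data is available.

```python
from datetime import datetime, timedelta, timezone


def upscale_totals(series, period, frequency=None):
    """Sum values over right-closed windows (end - period, end].

    `series` is a sorted, regularly spaced list of (valid_time, value) pairs
    whose existing period equals the time step. A window ends every
    `frequency`; by default this equals `period`, so windows do not overlap.
    """
    frequency = frequency or period
    step = series[1][0] - series[0][0]   # existing period, assumed equal to the time step
    per_window = period // step          # measurements in each upscaled value
    stride = frequency // step           # measurements between successive window ends
    upscaled = []
    for last in range(per_window - 1, len(series), stride):
        window = series[last - per_window + 1 : last + 1]
        total = round(sum(value for _, value in window), 6)  # tidy float noise
        upscaled.append((series[last][0], total))
    return upscaled


start = datetime(2021, 1, 1, tzinfo=timezone.utc)
values = [0.5, 1.3, 1.2, 0.6, 0.3, 0.0, 0.0, 0.1]
series = [(start + timedelta(hours=6 * i), v) for i, v in enumerate(values)]

daily = upscale_totals(series, timedelta(hours=24))
rolling = upscale_totals(series, timedelta(hours=24), timedelta(hours=6))

print([v for _, v in daily])    # [3.6, 0.4]
print([v for _, v in rolling])  # [3.6, 3.4, 2.1, 0.9, 0.4]
```

With a pair_frequency of 6 hours, adjacent 24-hour windows share three of their four measurements, which is why the totals change gradually from one value to the next.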

Can I use rescaling to convert from flow to volume?

Yes. Some problems that involve a change in the dimension of the measurement units can be solved by rescaling, specifically upscaling.

A dimension is a physical quantity that can be measured, such as volume or time. Each measurement unit is composed of one or more dimensions. For example, a volumetric flow in m³/s is composed of a volume in m³ over a time span of one second (s). The WRES understands all units within the Unified Code for Units of Measure (UCUM), among others. For more information on measurement units in WRES, including the UCUM standard, see: Units of Measurement.

In some cases, the WRES allows for a change in measurement unit dimensions. As of v6.8, it is possible to convert from:

  1. Volumetric flow ([L]³/[T]) to volume ([L]³);
  2. Mass flow ([M]/[T]) to mass ([M]); and
  3. Speed ([L]/[T]) to distance ([L]).

All of these conversions involve a time integration. For example, converting from a volumetric flow in [L]³/[T] to a volume in [L]³ involves summing or integrating over time. This problem can be solved in WRES by an appropriate combination of the measurement unit and the evaluation time_scale.

For example, if the existing measurements are instantaneous flows in m³/s and the goal is to conduct an evaluation of volumes over an extended period (e.g., 1 April through 31 July), then the following declaration is admissible:

observed: some_file.csv
predicted: some_other_file.csv
unit: m3
time_scale:
  function: total
  minimum_month: 4
  minimum_day: 1
  maximum_month: 7
  maximum_day: 31

In other words, evaluate the total volume in m³ from 1 April through 31 July. In practice, the existing measurement units could be any volumetric flow units, such as ft³/s, and the desired unit could be any volume unit, such as [acr_br].[ft_i] (acre-feet). In all valid cases, the unit conversion and time integration will be performed by WRES. The WRES accepts any UCUM units (as well as several common, but irregular, shorthands, such as CFS for ft³/s). Valid UCUM units can be found and tested here: https://ucum.nlm.nih.gov/ucum-lhc/demo.html.

The WRES only allows for upscaling with a change in measurement unit dimensions when the existing time scale is instantaneous or represents a mean over the existing scale period and when the desired function is a total or accumulation. For more details on limitations, see: What assumptions and limitations apply when rescaling?
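
To make the time integration concrete, the following Python sketch converts a sequence of mean flows into a total volume. It covers only the case in which each flow is a mean over a known period, where the integration is exact; this page does not state how the WRES integrates instantaneous values, so that case is deliberately omitted. The function name `volume_from_mean_flows` and the example flows are invented for illustration.

```python
from datetime import timedelta


def volume_from_mean_flows(mean_flows_m3_per_s, period: timedelta) -> float:
    """Total volume (m3) from mean flows (m3/s), each valid over `period`."""
    seconds = period.total_seconds()
    # A mean flow multiplied by its duration is the volume within that period.
    return sum(flow * seconds for flow in mean_flows_m3_per_s)


# Four adjacent 6-hour mean flows (illustrative values) give a 24-hour volume.
flows = [10.0, 12.0, 11.0, 9.0]
volume = volume_from_mean_flows(flows, timedelta(hours=6))
print(volume)  # 907200.0
```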

Which time-series data formats support time scale information?

Among the time-series data formats and data services that are supported by WRES 5.17, the following formats and services allow some or all of the time scale information to be supplied:

  • The WRES CSV format (optionally supports all information, i.e., both a time_scale_period and a time_scale_function. For further details, see: Format Requirements for Comma Separated Values (CSV) Files);
  • The PI-XML format (in principle, supports all information, but typically only distinguishes between “instantaneous” and “accumulated” measurements. The function is obtained from the type field in the format header and the period is obtained from the timeStep);
  • The USGS National Water Information System (NWIS) Instantaneous Values (IV) service (this will only supply time-series measurements that have an instantaneous time scale); and
  • The Water Resources Data Service (WRDS), which includes services for a variety of NWS time-series data sources, such as the National Water Model and the Advanced Hydrologic Prediction System (in principle, supports all information via SHEF-encoded measurements, as described in the SHEF manual. The period is obtained from the duration and the function is obtained from the physicalElement code. However, the WRES currently only supports a physicalElement code of “QR”, which is interpreted as a mean average).

No other data services or formats currently provide any time scale information and some formats (e.g., the WRES CSV format) only support it optionally.

Does the WRES make any simplifying assumptions about time scale?

Yes, several. The WRES makes some simplifying assumptions about the time scale of the time-series measurements that it ingests and operates upon, namely that:

  • All of the time-series associated with a given orientation or side of data, such as the observed data, contain measurements with a single time scale; and
  • All measurements that are classified as “instantaneous” are considered to have the same time scale. Any measurement with a time scale period of one minute or less is considered “instantaneous” and any supplied function is ignored.

The WRES also makes several assumptions when performing rescaling, as described below.

What assumptions and limitations apply when rescaling?

There are many assumptions and limitations. In particular:

  • Downscaling is not supported, only (limited forms of) upscaling, as clarified in subsequent bullets;
  • The time step between measurements is constant; in other words, each ingested time-series is “regular” (unless “lenient” rescaling is requested);
  • Each upscaled period contains at least two measurements (which must be equally spaced, as above, unless “lenient” rescaling is requested);
  • The period associated with the evaluation time scale is an integer multiple of the period associated with the time scale of each source of data, although the observed multiple could differ from the predicted multiple, for example;
  • Measurements that overlap or interleave cannot be upscaled. Two measurements overlap if some portion of the period associated with one measurement overlaps the period associated with another measurement;
  • A limited set of functions is supported. As of 5.17, these include: mean, minimum, maximum, total and unknown, which designates an unknown function;
  • The function associated with the evaluation time_scale must be explicitly defined; it cannot be unknown; and
  • The function can be changed only in limited circumstances. Specifically, when changing the dimension of the measurement units (e.g., from volumetric flow units to volume units), upscaling can be used to solve this problem. In these circumstances, it is possible to upscale from instantaneous flows (with any given function) to an accumulated volume (i.e., with a function of total). It is also possible to upscale from flows that represent a mean over the scale period associated with a dataset to volumes that represent a total over the evaluation scale period. If the time scale function associated with a dataset is unknown, it is treated leniently (i.e., it is assumed to be a total when attempting to accumulate).

You can expect an error message when one or more of these rules is broken. Example error messages are provided in What does this error message actually mean?!
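
A few of the rules listed above (integer multiple, at least two measurements, even spacing) can be sketched as a validation routine in Python. This is a hedged illustration rather than the WRES implementation: `RescalingError`, `check_upscalable` and the `lenient` flag are hypothetical names, the messages are paraphrased, and the sketch assumes a positive existing period.

```python
from datetime import datetime, timedelta, timezone


class RescalingError(Exception):
    """Hypothetical stand-in for the RescalingException named on this page."""


def check_upscalable(times, existing_period, desired_period, lenient=False):
    """Raise RescalingError if one upscaled window breaks the rules listed above.

    `times` holds the sorted valid times that fall inside one desired period.
    `existing_period` must be positive (instantaneous data is not handled here).
    """
    if desired_period % existing_period != timedelta(0):
        raise RescalingError("desired period is not an integer multiple of the existing period")
    if lenient:
        # Lenient rescaling: one or more values is enough and spacing is not checked.
        if not times:
            raise RescalingError("no events in the period")
        return
    if len(times) < 2:
        raise RescalingError("fewer than two events in the period")
    gaps = {later - earlier for earlier, later in zip(times, times[1:])}
    if len(gaps) > 1:
        raise RescalingError(f"values are not evenly spaced: {sorted(gaps)}")


t0 = datetime(2021, 1, 11, tzinfo=timezone.utc)
irregular = [t0, t0 + timedelta(minutes=15), t0 + timedelta(minutes=45)]

# Passes when lenient, fails when strict because the gaps are 15 and 30 minutes.
check_upscalable(irregular, timedelta(minutes=15), timedelta(hours=1), lenient=True)
try:
    check_upscalable(irregular, timedelta(minutes=15), timedelta(hours=1))
except RescalingError as error:
    print(error)
```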

Can I relax any of the assumptions imposed on rescaling (e.g., to allow for missing data)?

Yes. You may declare the rescale_lenience as follows:

observed: some_file.csv
predicted: some_other_file.csv
unit: m3
time_scale:
  function: total
  period: 24
  unit: hours
rescale_lenience: all

This will allow rescaling to be performed for all sides of data (observed, predicted and baseline) when one or more values are encountered within the rescaling period, regardless of whether the values are regularly spaced. This option should be used with caution. For example, an evaluation time scale that spans a period of 30 days may contain a single instantaneous value at the start of this period. In such circumstances, the rescaled value is unlikely to be representative of the entire period. Conversely, if a period of 30 days contains values every fifteen minutes and a single value is missing, then lenient rescaling may lead to a good approximation of the rescaled value.

The possible values of rescale_lenience correspond to the datasets for which lenience should be applied:

  • all;
  • none;
  • observed;
  • predicted;
  • baseline;
  • predicted and baseline;
  • observed and baseline; and
  • observed and predicted.

What does this error message actually mean?!

Hopefully, the WRES error messages are adequately self-explanatory. If they are confusing, please let us know, and we will try to improve them. Some further details on the causes of specific error messages are provided below. The type of error is always a RescalingException. The error messages below are illustrative and some of the fields will vary with the time series data that causes the error.

| Error message (with example fields) | Cause | Mitigation |
| --- | --- | --- |
| While attempting to upscale a collection of 12 events to a period ending at 2021-01-12T00:00:00Z, discovered fewer than two events in the collection, which is insufficient for upscaling. | A time-series could not be upscaled because a period contained insufficient data for upscaling. An upscaled period must contain at least two, evenly spaced measurements. | There are no straightforward mitigations as of 5.17. Remove this time-series or perform the upscaling outside of the WRES and provide WRES with the upscaled time-series. |
| While attempting to upscale a collection of 12 events to a period ending at 2021-01-12T00:00:00Z, discovered that the values were not evenly spaced within the period. Identified these intervals before stopping: [PT15M, PT30M]. | A time-series could not be upscaled because a period contained measurements that were not evenly spaced. In this case, a spacing of 15 minutes and a spacing of 30 minutes were discovered. | There are no straightforward mitigations as of 5.17. Remove this time-series or perform the upscaling outside of the WRES and provide WRES with the upscaled time-series. |
| While attempting to upscale to an evaluation time scale of ‘[PT6H, MEAN]’, encountered a time-series whose time-scale is undefined. This occurs when the data source fails to identify the existing time scale and the project declaration fails to clarify this information. Please include the time scale (time_scale) in the project declaration for each dataset that must be rescaled and does not clarify its own time scale, otherwise a change of scale is impossible. | As described in the message. | As described in the message. |