A simple R package to derive flags for aggregates
'flagr' is available on CRAN, so it can be installed with
> install.packages("flagr")
or use the development version from GitHub
> devtools::install_github("eurostat/flagr")
A flag is an attribute of a cell in a data set that provides additional qualitative information about the statistical value of that cell. A flag can indicate, for example, that a given value is estimated, confidential or represents a break in the time series.
Currently, different sets of flags are in use in the European Statistical System (ESS). Some domains use the SDMX code list for observation status and confidentiality status. Eurostat uses a simplified list of flags for dissemination, and other domains apply different sets of flags defined in regulations or in other agreements.
In most cases it is well defined how a flag is assigned to an individual value, but it is not straightforward which flag should be propagated to an aggregated value such as a sum, average or quintile. For this reason this package (flagr) was created to help users assign a flag to an aggregate based on the underlying flags and values.
The package contains a fictive test data set (test_data), a wrapper function (propagate_flag) that calls the different methods, and 3 methods (flag_hierarchy, flag_frequency and flag_weighted) to derive flags for aggregates.
- the flag_hierarchy method returns the flag that is listed first in a given set of ordered flags,
- the flag_frequency method returns the most frequent flag for the aggregate,
- the flag_weighted method returns the flag whose cumulative weight is the highest.
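The idea behind the three methods can be illustrated with a few lines of base R. This is only a minimal sketch, not the package's implementation: the `sketch_*` function names and the example flags and weights are hypothetical, and the sketch ignores details such as the threshold of the weighted method. How the package itself treats ties and missing flags may differ.

```r
# Hypothetical flags and weights for the components of one aggregate.
flags   <- c("p", "b", "p", "e", NA)
weights <- c(10, 50, 5, 20, 15)

# Hierarchy: return the first flag of the ordered set that occurs in the data.
sketch_hierarchy <- function(f, flag_order) {
  hit <- flag_order[flag_order %in% f]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Frequency: return the flag(s) that occur most often (all tied flags are returned).
sketch_frequency <- function(f) {
  counts <- table(f)  # table() ignores NA by default
  names(counts)[counts == max(counts)]
}

# Weighted: sum the weights per flag and return the flag with the largest sum.
sketch_weighted <- function(f, w) {
  keep <- !is.na(f)
  sums <- tapply(w[keep], f[keep], sum)
  names(sums)[which.max(sums)]
}

sketch_hierarchy(flags, c("b", "e", "p"))
sketch_frequency(flags)
sketch_weighted(flags, weights)
```

With these made-up inputs the frequency sketch picks "p" because it occurs twice, while the weighted sketch picks "b" because its single weight of 50 exceeds the combined weight of the two "p" values.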
Detailed documentation of the functions is included in the package; see the vignette for more information.
> library(flagr)
> library(tidyr)
> flags <- spread(test_data[, c(1:3)], key = time, value = flags)
>
> # hierarchy method: the flag order is given as a string or as a character vector
> propagate_flag(flags[, c(2:ncol(flags))], "hierarchy", "puebscd")
> propagate_flag(flags[, c(2:ncol(flags))], "hierarchy", c("b", "c", "d", "e", "p", "s", "u"))
>
> # frequency method
> propagate_flag(flags[, c(2:ncol(flags))], "frequency")
>
> # weighted method: drop the first column and use the values as weights
> flags <- flags[, c(2:ncol(flags))]
> weights <- spread(test_data[, c(1, 3:4)], key = time, value = values)
> weights <- weights[, c(2:ncol(weights))]
>
> propagate_flag(flags, "weighted", flag_weights = weights, threshold = 0.1)