Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The upper and lower bounds are not (always) properly computed for alpha = 0 #36

Open
stephaneghozzi opened this issue Aug 15, 2020 · 4 comments

Comments

@stephaneghozzi
Copy link

stephaneghozzi commented Aug 15, 2020

For alpha = 0, the upper and lower bounds should always be Inf and 0, respectively.
However:

  • For glm: trendbreaker uses ciTools::add_pi internally to find upper and lower bounds.
    • It sometimes finds finite bounds. In my opinion this is a bug of ciTools which trendbreaker should correct.
    • This leads to alarms being raised (outliers detected) where none should be.
    • Also, it is unclear why the bounds are not exactly the same from asmodee and ciTools::add_pi when glm is used in the former and both are applied in the same.
  • For lm: a lower bound of -Inf is sometimes found, which per se is fine for the assumed normal distribution, but doesn't make sense for counts. trendbreaker should either renounce using lm (see asmodee() shouldn't incorporate lm() #34) or set a minimum lower bound of 0.

The correct behavior is shown below with MASS::glm.nb, where trendbreaker sets bounds explicitly based on the negative binomial.

See https://github.com/reconhub/trendbreaker/R/model-interface.R.

# For `lm`: a lower bound of -Inf is sometimes found, which is per se fine for the assumed normal
#   distribution, but doesn't make sense for counts. ASMODEE should either renounce using `lm`
#   or set a minimum lower bound of 0.
#
# The correct behavior is shown with `MASS::glm.nb`, where ASMODEE set bounds explicitly based on
# the negative binomial.
#
# Also, `asmodee` is not deterministic.
#
# See trendbreaker/R/model-interface.R

library(tidyr)
library(dplyr)
library(trendbreaker)
library(ggplot2)

save_plots_png <- T

models <- list(
  poisson_constant = glm_model(count ~ 1, family='poisson'),
  regression = lm_model(count ~ date),
  negbin_time = glm_nb_model(count ~ date)
)

# seed = 1 => ASMODEE selects `lm` for Poisson noise
# seed = 10 => ASMODEE selects `glm` for Poisson noise and `glm.nb` for Poisson noise with peak
for (seed in c(1, 10)) {
  set.seed(seed)
  ts_list <- list(
    random = tibble(date=1:42, count=rpois(42, 100)),
    peak = tibble(date=1:42, count=rpois(42, 100)+c(rep(0,20), 200, rep(0,21)))
  )
  for (ts_type in names(ts_list)) {
    ts <- ts_list[[ts_type]]
    asmodee_res <- asmodee(
      ts,
      models = models,
      alpha = 0,
      max_k = 12,
      method = evaluate_aic
    )
    print(paste0(
      'ASMODEE with alpha = 0 on constant Poisson noise ',
      ifelse(ts_type=='peak','+ peak ',''),
      'with seed ', seed, ':'
    ))
    print(asmodee_res$model$model$call)
    print('upper:')
    print(asmodee_res$results$upper)
    print('lower:')
    print(asmodee_res$results$lower)
    asmodee_plot <- plot(asmodee_res,'date') +
      ggtitle(paste0('ASMODEE with alpha = 0 on constant Poisson noise ',
        ifelse(ts_type=='peak','+ peak ',''),
        'with seed ', seed))
    print(asmodee_plot)
    if (save_plots_png) {
      ggsave(plot=asmodee_plot, filename=paste0('asmodee_res_',seed,'_',ts_type,'.png'),
        width=20, height=10, units='cm', dpi=150)
    }

    cat('\n')

    if (as.character(asmodee_res$model$model$call)[1]=='glm') {
      ci_pi <- ciTools::add_pi(
        tb = ts,
        fit = glm(formula = count ~ 1, family = "poisson", data = ts),
        alpha = 0,
        names = c("lower", "upper")
      )
      print('ciTools::add_pi for Poisson GLM:')
      print('glm(formula = count ~ 1, family = \'poisson\', data = ts)')
      print('upper:')
      print(ci_pi$upper)
      print('lower:')
      print(ci_pi$lower)
      warn <- warnings()
      if (length(warn[[length(warn)]])>0) {
        if (warn[[length(warn)]][[1]]=='add_pi.glm') {
          print('in ciTools::add_pi.glm:')
          print(names(warn)[length(warn)])
        }
      }
      cat('\n')
    }
  }
}

# Applied to a time series from a "flare-up" scenario

ts <- tibble(
  date=1:42,
  count=c(2, 2, 2, 2, 1, 1, 2, 2, 2, 0, 1, 0, 0, 0, 1, 0, 1, 1, 2, 1, 1, 0, 1, 2, 1, 2, 2, 2, 0, 0,
    2, 3, 2, 1, 0, 1, 1, 0, 2, 3, 0, 7)
)

i_finite <- c()
i_outlier <- c()
model_finite <- c()
plot_outlier <- T
n_trials <- 100
for (j in 1:n_trials) {
  asmodee_res <- asmodee(
    ts,
    models = models,
    alpha = 0,
    max_k = 12,
    method = evaluate_aic
  )
  ubs <- asmodee_res$results$upper
  lbs <- asmodee_res$results$lower
  if (any(is.finite(ubs)) | any(lbs!=0)) {
    i_finite <- c(i_finite, j)
    model_finite <- c(model_finite, as.character(asmodee_res$model$model$call)[1])
  }
  if (any(asmodee_res$results$outlier)) {
    i_outlier <- c(i_outlier, j)
    if (plot_outlier) {
      asmodee_plot <- plot(asmodee_res,'date') +
        ggtitle(paste0('ASMODEE with alpha = 0 on "flare-up" time series, trial ',j))
      print(asmodee_plot)
      if (save_plots_png) {
        ggsave(plot=asmodee_plot, filename=paste0('asmodee_res_flareup_',j,'.png'),
          width=20, height=10, units='cm', dpi=150)
      }
      plot_outlier <- F
    }
  }
}
print('ASMODEE with alpha = 0 on a \'flare-up\' time series:')
print('Proportion of trials with finite bounds:')
print(paste0(round(100*length(i_finite)/n_trials),'%'))
print('Proportion of trials with outliers:')
print(paste0(round(100*length(i_outlier)/n_trials),'%'))
print('Models leading to finite bounds:')
print(unique(model_finite))

Output:

[1] "ASMODEE with alpha = 0 on constant Poisson noise with seed 1:"
lm(formula = count ~ date, data = data)
[1] "upper:"
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23 
Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf 
 24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42 
Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf 
[1] "lower:"
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19 
-Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf 
  20   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35   36   37   38 
-Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf 
  39   40   41   42 
-Inf -Inf -Inf -Inf 

[1] "ASMODEE with alpha = 0 on constant Poisson noise + peak with seed 1:"
MASS::glm.nb(formula = count ~ date, data = data, init.theta = 25.08322303, 
    link = log)
[1] "upper:"
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23 
Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf 
 24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42 
Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf 
[1] "lower:"
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
32 33 34 35 36 37 38 39 40 41 42 
 0  0  0  0  0  0  0  0  0  0  0 

[1] "ASMODEE with alpha = 0 on constant Poisson noise with seed 10:"
glm(formula = count ~ 1, family = "poisson", data = data)
[1] "upper:"
 [1] 133 135 130 142 134 134 136 136 136 135 134 139 135 138 135 135 131 134 134 147 131 132
[23] 131 136 139 132 143 135 136 136 142 135 130 134 135 132 139 136 135 137 138 137
[1] "lower:"
 [1] 66 70 61 69 69 68 68 63 65 68 70 65 63 68 62 68 68 67 65 64 63 68 66 65 71 66 69 59 69 67
[31] 69 58 66 67 65 66 66 67 65 65 72 67

[1] "ciTools::add_pi for Poisson GLM:"
[1] "glm(formula = count ~ 1, family = 'poisson', data = ts)"
[1] "upper:"
 [1] 129 129 139 133 134 134 140 134 137 138 131 133 134 131 137 135 136 129 135 141 137 135
[23] 134 131 136 138 142 136 139 134 137 140 135 136 138 139 130 133 138 131 136 131
[1] "lower:"
 [1] 66 64 67 65 68 65 71 56 65 66 68 71 65 67 68 67 71 65 62 64 64 67 66 58 69 60 69 66 65 67
[31] 64 61 66 66 62 68 64 68 70 67 65 67

[1] "ASMODEE with alpha = 0 on constant Poisson noise + peak with seed 10:"
MASS::glm.nb(formula = count ~ date, data = data, init.theta = 22.41823699, 
    link = log)
[1] "upper:"
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23 
Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf 
 24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42 
Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf 
[1] "lower:"
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
32 33 34 35 36 37 38 39 40 41 42 
 0  0  0  0  0  0  0  0  0  0  0 

[1] "ASMODEE with alpha = 0 on a 'flare-up' time series:"
[1] "Proportion of trials with finite bounds:"
[1] "100%"
[1] "Proportion of trials with outliers:"
[1] "26%"
[1] "Models leading to finite bounds:"
[1] "glm"

(The last percentages will vary from run to run, see #35.)

asmodee_res_1_random
asmodee_res_1_peak
asmodee_res_10_random
asmodee_res_10_peak
asmodee_res_flareup_1

@dirkschumacher
Copy link
Member

Why would one choose an alpha = 0? Maybe we could solve it by requiring alpha > 0?

@stephaneghozzi
Copy link
Author

Why would one choose an alpha = 0? Maybe we could solve it by requiring alpha > 0?

alpha=0 is an admissible and well defined value, with a clear expectation as to what this should lead to.
0 is not an interesting value for practical use. However to me that points to two problems:

  • a bug in ciTools::add_pi so that I don't know whether I can trust the values it computes for alpha>0
  • an inconsistency in the way alpha is handled between Poisson and NegBin.

@dirkschumacher
Copy link
Member

dirkschumacher commented Aug 19, 2020

Maybe we need to move away from ciTools and roll out our own for the different model types? We included ciTools to have a default method for many model types just to have something.

@TimTaylor
Copy link
Member

Currently looking at this in the noci branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants