Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Error in calculating index when using AR1 with extra_time #28

Open
1 task done
chantelwetzel-noaa opened this issue Mar 14, 2024 · 8 comments
Open
1 task done
Labels
status--Pretriage Triage needs to happen type--Bug Non-standard behavior is present in the code

Comments

@chantelwetzel-noaa
Copy link
Collaborator

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When exploring using AR1 sdmTMB printed the suggestion of adding the missing data year (extra_time = c(2020)) but when this added to the function call:

best <- data %>%
dplyr::mutate(
# Evaluate the call in family
family = purrr::map(family, .f = ~ eval(parse(text = .x))),
# Run the model on each row in data
results = purrr::pmap(
.l = list(
data = data_filtered,
formula = formula,
family = family,
anisotropy = anisotropy,
n_knots = knots,
spatiotemporal = purrr::map2(spatiotemporal1, spatiotemporal2, list),
extra_time = c(2020)
),
.f = indexwc::run_sdmtmb
)
)

Adding extra_time to the function does run a model using AR1. but the code errors out when calculating the index due to the year 2020 not being included in the grid (I think):

Error in dplyr::mutate():
? In argument: results = purrr::pmap(...).
Caused by error in purrr::pmap():
? In index: 1.
Caused by error in purrr::map() at indexwc/R/calc_index_areas.R:71:3:
? In index: 1.
Caused by error in get_generic():
! area should be of the same length as nrow(newdata) or of length 1.
Run rlang::last_trace() to see where the error occurred.

rlang::last_trace()
<error/dplyr:::mutate_error>
Error in dplyr::mutate():
? In argument: results = purrr::pmap(...).
Caused by error in purrr::pmap():
? In index: 1.
Caused by error in purrr::map() at indexwc/R/calc_index_areas.R:71:3:
? In index: 1.

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Node:
- npm:

Anything else?

No response

@chantelwetzel-noaa chantelwetzel-noaa added type--Bug Non-standard behavior is present in the code status--Pretriage Triage needs to happen labels Mar 14, 2024
@ericward-noaa
Copy link
Collaborator

Can you help me understand what the formula, etc are here?

Starting with the example in the manual, this works fine with no extra time,

pcod_spde <- make_mesh(pcod, c("X", "Y"), n_knots = 60, type = "kmeans")
m <- sdmTMB(
  data = pcod,
  formula = density ~ 0 + as.factor(year),
  time = "year", 
  mesh = pcod_spde, 
  family = tweedie(link = "log")
)
# make prediction grid:
nd <- replicate_df(qcs_grid, "year", unique(pcod$year))
# Note `return_tmb_object = TRUE` and the prediction grid:
predictions <- predict(m, newdata = nd, return_tmb_object = TRUE)
ind <- get_index(predictions)

If you add extra_time when year is a factor, it will error out. For example this won't work, because you can't estimate the year effect for 2020
update(m, extra_time = c(2020))

You can fix this by replacing fixed effects in year with random effects, like a time-varying random process. This works but obviously isn't a good model because of lots of other missing years (2020 is treated as the last year in the sequence after 2015, 2017, ...)

update(m, extra_time = c(2020),
       formula = density ~ 0,
       time_varying = ~1)

With spatiotemporal fields, you want them to probably be rw or ar1 if you have missing years -- 2020 here.

Like in your comment, you'll also need to add 2020 to the grid if you haven't already

@seananderson
Copy link
Collaborator

seananderson commented Mar 14, 2024

It should be OK if the grid is missing years. In fact, you could predict on one year at a time to save memory if needed. Is it possible this is the issue? Otherwise, can you post what the call to sdmTMB ends up being?

Also, if you're using extra time, make sure you're on the latest version (CRAN or GitHub), because there was an important bug in how that was handled for a while. Oh, and I just sped up index calculation a fair bit on the GitHub version, so I'd work from that version until the next CRAN release.

@chantelwetzel-noaa
Copy link
Collaborator Author

Mental note to myself - both Eric and Sean are lurking here and are great at chiming in on issues.

This issue arose after a discussion between @kellijohnson-NOAA and myself yesterday. We are running models using a configuration file for each species. There are a couple of species (yelloweye rockfish) where I have been unable to get positive definite hessian with a delta model due to limited data where I have turned off the spatiotemporal model component for each of the two models. We thought it would be useful to try including an AR1 process in the model for these species. I originally encountered this issue in a model I ran overnight since including the AR1 process slows down the model run. In my attempts to work through the issue. I have lost the output from the model fitting (bad move on my part). I am rerunning a model with AR1 for the presence/absence model component without passing the extra_time component to the model to see if this resolves the issue. Once that finishes running I can update this issue.

Here is the larger script that Kelli and I have been working from when running models:

species_data <- nwfscSurvey::pull_catch(common_name = "yelloweye rockfish", survey = "NWFSC.Combo")
configuration <- tibble::as_tibble(read.csv(
  file.path("data-raw", "configuration.csv")
))

configuration <- configuration[which(configuration$species == "yelloweye rockfish"),  ]

data <-configuration %>%
  # Row by row ... do stuff then ungroup
  dplyr::rowwise() %>%
  # Pull the data based on the function found in fxn column
  dplyr::mutate(
    data_raw = list(format_data(data = species_data)),
    data_filtered = list(data_raw %>%
                           dplyr::filter(
                             depth <= min_depth, depth >= max_depth,
                             latitude >= min_latitude, latitude <= max_latitude,
                             year >= min_year, year <= max_year
                           ))
  ) %>%
  dplyr::ungroup()

best <- data %>%
  dplyr::mutate(
    # Evaluate the call in family
    family = purrr::map(family, .f = ~ eval(parse(text = .x))),
    # Run the model on each row in data
    results = purrr::pmap(
      .l = list(
        data = data_filtered,
        formula = formula,
        family = family,
        anisotropy = anisotropy,
        n_knots = knots,
        spatiotemporal = purrr::map2(spatiotemporal1, spatiotemporal2, list),
        extra_time = c(2020)
      ),
      .f = indexwc::run_sdmtmb
    )
  )

@ericward-noaa
Copy link
Collaborator

I'm confused still about what the AR1 process is being included in -- are you putting it in spatiotemporal effects?

I grabbed the code, and the configuration for this example is using a formula that is

catch_weight ~ 0 + fyear + pass_scaled

You also have 'extra_time' = 2020 -- so I think this is the same as the model in my comment that doesn't work.

The spatial only model with AR1 year effects works fine and passes checks,

mesh <- make_mesh(data$data_filtered[[1]], xy_cols = c("x","y"), n_knots = 200)
fit_s <- 
  sdmTMB(data = data$data_filtered[[1]],
         formula = catch_weight ~ 0 + pass_scaled,
         time_varying = ~ 1,
         mesh = mesh,
         family = delta_gamma(),
         anisotropy = TRUE,
         time = "year",
         spatiotemporal = "off",
         extra_time = c(2020))

@seananderson
Copy link
Collaborator

Just for clarity, that model above has a random walk on the intercept (here year effects). It needs time_varying_type = 'ar1' to switch from the default to AR1.

I assume here you're taking about spatiotemporal AR1 fields? Indeed, that results in a less sparse matrix of random effects (the random effects in a given year become dependent on the previous and next year) and therefore is slower to fit.

@ericward-noaa
Copy link
Collaborator

Ah yeah, thanks for pointing that out @seananderson

@chantelwetzel-noaa I was able to get the spatiotemporal version of the above model to converge with an extra newton step,

fit_st <- update(fit_s, spatiotemporal="ar1")
fit_st <- run_extra_optimization(fit_st, newton_loops = 1)

So as long as you turn off fixed year effects, I think either the spatial only or spatiotemporal models would work

@kellijohnson-NOAA
Copy link
Collaborator

Thanks @ericward-noaa and @seananderson. I am wondering if we turn off fixed effects for years because what I had told Chantel to do was use ar1 for the spatiotemporal component, then can you still use that output for an index of abundance. I remember somewhere there was guidance that fixed effects are import for years if using the model results for an index that will be used for assessment purposes. Feel free to just point me to something to read if you don't want to explain it again here. And, sorry @chantelwetzel-noaa that I got you into this mess.

@ericward-noaa
Copy link
Collaborator

ericward-noaa commented Mar 14, 2024

In general, yes -- with no missing data, fixed year effects will give the best unbiased estimates of a trend. If you don't care about estimating the index in 2020, then fixed effects is fine (and turn extra_time off). But if you do want to estimate the mean density in 2020, you need to have some process to fill in the gap. This could be using a time_varying formula like my examples, a smooth over year, e.g. s(year), or just estimating a global intercept with no year effects. This is all new territory for WCBTS -- so no worries

Related / unrelated, @seananderson has been tinkering with using time varying year and spatiotemporal effects for the DFO Synoptic surveys. They had a survey in 2020, but the reason there is the spatial patterning of the survey (it's a checkerboard with the same 2 of 4 cells sampled every other year)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status--Pretriage Triage needs to happen type--Bug Non-standard behavior is present in the code
Projects
None yet
Development

No branches or pull requests

4 participants