Clarify Oregon PacFIN BDS samples with SAMPLE_TYPE == "S" #112

Open
iantaylor-NOAA opened this issue May 11, 2023 · 7 comments
@iantaylor-NOAA
Contributor

@aliwhitman, here's another question for you. Sorry if this information is already spelled out somewhere and I missed it.

Could you clarify why there are lots of Oregon PacFIN BDS samples for Petrale with SAMPLE_TYPE == "S"?

@gertsevv and I noticed that there are years with no length data after processing through the PacFIN.Utilities::cleanPacFIN() function. I now see that this is due to the default filter, which retains only samples of type market (M) and excludes all samples of type research (R), special request (S), and commercial on-board (C), as documented here:

#' @param keep_sample_type A vector of character values specifying the types of
#' samples you want to keep. The default is to keep `c("M")`. Available
#' types include market (M), research (R), special request (S), and
#' commercial on-board (C). There are additional samples without a `SAMPLE_TYPE`,
#' but they are only kept if you include `NA` in your call.
#' All sample types from California are assigned to `M`.
#' Including commercial on-board samples is not recommended because
#' they might also be in WCGOP data and would lead to double counting.

I get the idea that special request samples might be non-random or not representative of the population. However, all of these samples are associated with SAMPLE_METHOD == "R" (random) and they represent 44% of the petrale samples from Oregon, including 100% of the 37,348 samples from 1966-1986, another 4,468 samples from 1998-2007 (~30% of the total for that period), and another 43 samples scattered from other time periods. Two decades of sampling doesn't sound like a "special request" to me and it would be great to include these samples in the model, especially the ones from the early period, unless there's truly a good reason to exclude them.
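For anyone wanting to reproduce this kind of breakdown, here is a minimal sketch. It assumes a data frame with the `AGENCY_CODE`, `SAMPLE_TYPE`, and `SAMPLE_METHOD` columns mentioned in this thread; the `toy` data frame and every value in it are invented for illustration and are not the petrale numbers.

```r
# Toy stand-in for the raw BDS extraction; all values are invented.
toy <- data.frame(
  AGENCY_CODE   = c("O", "O", "O", "O", "W", "W", "C"),
  SAMPLE_YEAR   = c(1970, 1970, 2001, 2010, 2001, 2010, 2010),
  SAMPLE_TYPE   = c("S", "S", "S", "M", "M", "S", "M"),
  SAMPLE_METHOD = c("R", "R", "R", "R", "R", "R", "R")
)

# Proportion of rows in each SAMPLE_TYPE within each agency (rows sum to 1)
type_share <- prop.table(table(toy$AGENCY_CODE, toy$SAMPLE_TYPE), margin = 1)
print(round(type_share, 2))

# Cross-tabulate SAMPLE_TYPE against SAMPLE_METHOD for Oregon only
oregon <- toy[toy$AGENCY_CODE == "O", ]
type_by_method <- table(oregon$SAMPLE_TYPE, oregon$SAMPLE_METHOD)
print(type_by_method)
```

Run on the real extraction, the first table would give the share of type "S" rows by agency and the second would confirm whether those rows all carry the random sampling method.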

Less than 4% of the Washington petrale samples and none of the California samples have SAMPLE_TYPE == "S".

@iantaylor-NOAA iantaylor-NOAA added status: question Questions about the issue need answered topic: database Related to information in or access to the PacFIN database, stuff outside of our control labels May 11, 2023
@iantaylor-NOAA iantaylor-NOAA added this to the year_2023 milestone May 11, 2023
@chantelwetzel-noaa
Contributor

I am interested to know the current status of these samples from @aliwhitman. These samples were identified in the 2019 update assessment. My memory is always hazy, but I believe they were excluded because the samples did not have an associated sample weight, which prevented expansion of these data via our typical methods.

@kellijohnson-NOAA kellijohnson-NOAA added type: bug priority: high The highest priority level in terms of what needs to be done. labels May 11, 2023
@iantaylor-NOAA
Contributor Author

Thanks for chiming in @chantelwetzel-noaa.

More information is below on the presence/absence of sample weights and fish weights. Yes, they are missing in many years for the samples with SAMPLE_TYPE == "S", but definitely not all. I may not be selecting the right variables, however. Calculations below are from the raw PacFIN extraction before cleaning (available to the NWFSC folks in \nwcfile\FRAM\Assessments\Assessment Data\2023 Assessment Cycle\petrale sole\PacFIN.PTRL.bds.08.May.2023.RData).

Even if all the sample weights were missing, I think there would be value in considering unexpanded length comps for those years.
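As a rough illustration of what unexpanded length comps could look like, the sketch below tabulates raw fish counts by year and length bin and converts them to within-year proportions. The `length_cm` column name, the 2-cm bin width, and all of the values are assumptions made up for the example, not the actual BDS field names or the bins used in the assessment.

```r
# Toy data; the `length_cm` column name and all values are invented.
toy <- data.frame(
  SAMPLE_YEAR = c(1970, 1970, 1970, 1970, 1971, 1971),
  length_cm   = c(31.2, 33.9, 36.5, 36.1, 30.4, 41.0)
)

# Assign each fish to a 2-cm bin labelled by its lower edge
bin_width <- 2
toy$bin <- floor(toy$length_cm / bin_width) * bin_width

# Raw counts by year and bin, with no sample-weight expansion
counts <- table(toy$SAMPLE_YEAR, toy$bin)

# Convert counts to proportions within each year (rows sum to 1)
comps <- prop.table(counts, margin = 1)
print(round(comps, 2))
```

Because every fish gets equal weight here, years with missing `EXPANDED_SAMPLE_WEIGHT` can still contribute, at the cost of ignoring differences in how much catch each sample represents.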

r$> samples <- bds.pacfin %>%
  dplyr::filter(AGENCY_CODE == "O" & SAMPLE_TYPE == "S") %>%
  dplyr::select(SAMPLE_YEAR, EXPANDED_SAMPLE_WEIGHT, FISH_WEIGHT)

r$> table(is.na(samples$EXPANDED_SAMPLE_WEIGHT), samples$SAMPLE_YEAR)

        1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1977 1978 1979 1980
  FALSE    0    0    0    0    0    0    0    0    0    0 1882 1987 2438 2659
  TRUE  1744 2405 2635 2859 2977 1653 1522 1347 1120 1000  100    0    0    0

        1981 1982 1983 1984 1985 1986 1997 1998 1999 2000 2001 2002 2003 2004
  FALSE 4200 2208  413  201  600 1398   28  505  491  313  319  279  393  310
  TRUE     0    0    0    0    0    0    0    0    0  102    0    0    0    0

        2005 2006 2007 2015 2016 2021
  FALSE  808  723  225    6    3    6
  TRUE     0    0    0    0    0    0

r$> table(is.na(samples$FISH_WEIGHT), samples$SAMPLE_YEAR)
       
        1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1977 1978 1979 1980
  FALSE    0    0    0    0    0 1501 1422 1347 1120 1000    0  537    0    0
  TRUE  1744 2405 2635 2859 2977  152  100    0    0    0 1982 1450 2438 2659
       
        1981 1982 1983 1984 1985 1986 1997 1998 1999 2000 2001 2002 2003 2004
  FALSE    0    0    0    0    0  600    0    0    0    0    0    0    0    0
  TRUE  4200 2208  413  201  600  798   28  505  491  415  319  279  393  310

        2005 2006 2007 2015 2016 2021
  FALSE    0    0    0    6    3    6
  TRUE   808  723  225    0    0    0

@aliwhitman
Collaborator

The vast majority of these samples are pre-1987, which have ALL been (after the fact) designated as SP samples (across the board, all species) because of a lack of documentation on how these samples were taken and processed. And yes, some are lacking a sample weight (good memory Chantel! I had to go back to old emails to confirm that).

My recommendation would be for you to consider using the SP samples, particularly those prior to 1987, as the designation was just a blanket approach taken a number of years ago by our data shop. Using the sample method (Random), you can separate the ones that were part of our standard protocol (even if it wasn't well documented) from the ones that were truly "special request". I think you can also consider including an unexpanded length comp version, as Ian suggested, but again, I would still probably recommend removing those without an R sampling method.

@iantaylor-NOAA
Contributor Author

Thanks @aliwhitman, this is very helpful.
We will explore adding back the random samples from 1966-1986 and see what impact that has.
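A minimal sketch of that kind of selection is below, assuming the rule is "keep market samples, plus special request samples that were taken with the random method in 1986 or earlier". The `toy` data frame, the non-random method code "X", and the `last_blanket_year` variable are all hypothetical and only serve the illustration; this is not what cleanPacFIN() does internally.

```r
# Toy data; all values, including the non-random method code "X", are invented.
toy <- data.frame(
  SAMPLE_NO     = paste0("OR", 1:6),
  SAMPLE_YEAR   = c(1970, 1975, 1980, 2001, 2001, 2010),
  SAMPLE_TYPE   = c("S", "S", "S", "S", "M", "M"),
  SAMPLE_METHOD = c("R", "R", "X", "R", "R", "R")
)

# Hypothetical cutoff: last year covered by the blanket "S" designation
last_blanket_year <- 1986

# Keep market samples, plus early special request samples taken at random
keep <- toy$SAMPLE_TYPE == "M" |
  (toy$SAMPLE_TYPE == "S" &
     toy$SAMPLE_METHOD == "R" &
     toy$SAMPLE_YEAR <= last_blanket_year)

# which() drops any NA results, so rows with a missing type or method are excluded
kept <- toy[which(keep), ]
print(kept)
```

Dropping the year condition would also bring back the later random "S" samples, so the cutoff is a choice to revisit rather than a fixed rule.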

@kellijohnson-NOAA
Contributor

Thanks @chantelwetzel-noaa for your memory, @aliwhitman for the digging, and @iantaylor-NOAA for the summaries. I also want to note that some of these samples do not have entries in the FTID column for fish ticket ID. See the note in the code here:

#### Bad samples
# Remove bad OR samples
Pdata$SAMPLE_TYPE[Pdata$SAMPLE_NO %in% paste0("OR", badORnums)] <- "S"
# Via Chantel, from Ali at ODFW, do not keep b/c they don't have exp_wt or FTID

though I do not see where not having a FTID entry matters in the code downstream.
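One way to see how many samples are affected is to tabulate missing FTID values by sample type. This is only a sketch: the `toy` data frame and its values are invented, and it assumes missing fish ticket IDs are stored as `NA` rather than as empty strings.

```r
# Toy data; all values are invented.
toy <- data.frame(
  SAMPLE_NO   = c("OR1", "OR2", "OR3", "OR4"),
  SAMPLE_TYPE = c("S", "S", "M", "M"),
  FTID        = c(NA, "A123", "B456", NA),
  stringsAsFactors = FALSE
)

# Rows: is FTID missing? Columns: sample type.
missing_ftid <- table(
  ftid_missing = is.na(toy$FTID),
  sample_type  = toy$SAMPLE_TYPE
)
print(missing_ftid)
```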

@brianlangseth-NOAA
Contributor

brianlangseth-NOAA commented May 11, 2023

We have included special project samples prior to 1987 for canary - see this issue. I didn't check whether sample weight is there or not for the expansion even though we put them all through the expansion processing scripts.

@kellijohnson-NOAA
Contributor

@brianlangseth-NOAA did you really mean to close this issue? I think that maybe @iantaylor-NOAA should be the one to close it given that he opened it.
