Clarify Oregon PacFIN BDS samples with SAMPLE_TYPE == "S" #112

iantaylor-NOAA · 2023-05-11T00:14:41Z

@aliwhitman, here's another question for you. Sorry if this information is already spelled out somewhere and I missed it.

Could you clarify why there are lots of Oregon PacFIN BDS samples for Petrale with SAMPLE_TYPE == "S"?

@gertsevv and I noticed that some years have no length data after processing through the PacFIN.Utilities::cleanPacFIN() function. I now see this is due to the default filter, which retains only samples of type market (M) and excludes all samples of type research (R), special request (S), and commercial on-board (C), as documented here:

PacFIN.Utilities/R/cleanPacFIN.R
Lines 33 to 40 in 4683a3f

    #' @param keep_sample_type A vector of character values specifying the types of
    #' samples you want to keep. The default is to keep `c("M")`. Available
    #' types include market (M), research (R), special request (S), and
    #' commercial on-board (C). There are additional samples without a `SAMPLE_TYPE`,
    #' but they are only kept if you include `NA` in your call.
    #' All sample types from California are assigned to `M`.
    #' Including commercial on-board samples is not recommended because
    #' they might also be in WCGOP data and would lead to double counting.

I get the idea that special request samples might be non-random or not representative of the population. However, all of these samples are associated with SAMPLE_METHOD == "R" (random) and they represent 44% of the petrale samples from Oregon, including 100% of the 37,348 samples from 1966-1986, another 4,468 samples from 1998-2007 (~30% of the total for that period), and another 43 samples scattered from other time periods. Two decades of sampling doesn't sound like a "special request" to me and it would be great to include these samples in the model, especially the ones from the early period, unless there's truly a good reason to exclude them.

Less than 4% of the Washington petrale samples and none of the California samples have SAMPLE_TYPE == "S".
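For anyone wanting to reproduce this kind of breakdown, here is a minimal sketch of how the share of each SAMPLE_TYPE per agency could be tabulated. It uses a small made-up data frame (`toy_bds`, hypothetical) rather than the real extraction, and it assumes the agency codes "W" and "C" for Washington and California alongside the "O" used for Oregon later in this thread.

```r
# Toy stand-in for the raw PacFIN BDS extraction (the real object in this
# thread is `bds.pacfin`). All values below are made up for illustration.
toy_bds <- data.frame(
  AGENCY_CODE   = c("O", "O", "O", "O", "W", "W", "C", "C"),
  SAMPLE_TYPE   = c("S", "S", "M", "M", "M", "S", "M", "M"),
  SAMPLE_METHOD = c("R", "R", "R", "R", "R", "R", "R", "R"),
  SAMPLE_YEAR   = c(1970, 1980, 1990, 2010, 2005, 2006, 2001, 2002),
  stringsAsFactors = FALSE
)

# Proportion of records of each SAMPLE_TYPE within each agency
# (margin = 1 makes every row sum to 1)
type_share <- prop.table(
  table(toy_bds$AGENCY_CODE, toy_bds$SAMPLE_TYPE, useNA = "ifany"),
  margin = 1
)
print(round(type_share, 2))

# Cross-tabulate sample type against sample method for Oregon only
oregon <- toy_bds[toy_bds$AGENCY_CODE == "O", ]
print(table(oregon$SAMPLE_TYPE, oregon$SAMPLE_METHOD, useNA = "ifany"))
```

Swapping `toy_bds` for the real extraction should give the per-agency percentages directly, assuming the column names match.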


chantelwetzel-noaa · 2023-05-11T00:18:55Z

I am interested to know the current status of these samples from @aliwhitman. These samples were identified in the 2019 update assessment. My memory is always hazy, but I believe they were excluded because the samples did not have an associated sample weight, which prevented expansion of these data via our typical methods.

iantaylor-NOAA · 2023-05-11T15:27:55Z

Thanks for chiming in @chantelwetzel-noaa.

More information is below on the presence/absence of sample weights and fish weights. Yes, they are missing in many years for the samples with SAMPLE_TYPE == "S", but definitely not all. I may not be selecting the right variables, however. Calculations below are from the raw PacFIN extraction before cleaning (available to the NWFSC folks in \nwcfile\FRAM\Assessments\Assessment Data\2023 Assessment Cycle\petrale sole\PacFIN.PTRL.bds.08.May.2023.RData).

Even if all the sample weights were missing, I think there would be value in considering unexpanded length comps for those years.

r$> samples <- bds.pacfin %>% 
  dplyr::filter(AGENCY_CODE == "O" & SAMPLE_TYPE == "S") %>% 
  dplyr::select(SAMPLE_YEAR, EXPANDED_SAMPLE_WEIGHT, FISH_WEIGHT) 

r$> table(is.na(samples$EXPANDED_SAMPLE_WEIGHT), samples$SAMPLE_YEAR)

        1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1977 1978 1979 1980
  FALSE    0    0    0    0    0    0    0    0    0    0 1882 1987 2438 2659
  TRUE  1744 2405 2635 2859 2977 1653 1522 1347 1120 1000  100    0    0    0

        1981 1982 1983 1984 1985 1986 1997 1998 1999 2000 2001 2002 2003 2004
  FALSE 4200 2208  413  201  600 1398   28  505  491  313  319  279  393  310
  TRUE     0    0    0    0    0    0    0    0    0  102    0    0    0    0

        2005 2006 2007 2015 2016 2021
  FALSE  808  723  225    6    3    6
  TRUE     0    0    0    0    0    0

r$> table(is.na(samples$FISH_WEIGHT), samples$SAMPLE_YEAR)
       
        1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1977 1978 1979 1980
  FALSE    0    0    0    0    0 1501 1422 1347 1120 1000    0  537    0    0
  TRUE  1744 2405 2635 2859 2977  152  100    0    0    0 1982 1450 2438 2659
       
        1981 1982 1983 1984 1985 1986 1997 1998 1999 2000 2001 2002 2003 2004
  FALSE    0    0    0    0    0  600    0    0    0    0    0    0    0    0
  TRUE  4200 2208  413  201  600  798   28  505  491  415  319  279  393  310

        2005 2006 2007 2015 2016 2021
  FALSE    0    0    0    6    3    6
  TRUE   808  723  225    0    0    0

aliwhitman · 2023-05-11T16:47:25Z

The vast majority of these samples are pre-1987, which have ALL been (after the fact) designated as SP samples (across the board, all species) because of a lack of documentation on how these samples were taken and processed. And yes, some are lacking a sample weight (good memory Chantel! I had to go back to old emails to confirm that).

My recommendation would be for you to consider the use of the SP samples, particularly those prior to 1987, as this was just a blanket approach taken a number of years ago by our data shop. Using the sample method (Random), you can separate the ones that were part of our standard protocol (even if it wasn't well documented) from the ones that were truly "special request". I think you can also consider including an unexpanded length comp version, as Ian suggested, but again, I would still probably recommend removing those without an R sampling method.
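A rough sketch of what that filter could look like, written in base R against a made-up data frame (`toy_bds`, hypothetical). It keeps market samples plus Oregon samples with SAMPLE_TYPE == "S" that are both pre-1987 and have SAMPLE_METHOD == "R". The method code "X" below is only a placeholder for any non-random method and is not a real PacFIN code as far as this thread shows.

```r
# Toy data for illustration only; column names follow the ones used in
# this thread, values are invented.
toy_bds <- data.frame(
  SAMPLE_NO     = c("OR1", "OR2", "OR3", "OR4", "OR5"),
  AGENCY_CODE   = c("O", "O", "O", "O", "O"),
  SAMPLE_TYPE   = c("M", "S", "S", "S", NA),
  SAMPLE_METHOD = c("R", "R", "X", "R", "R"),
  SAMPLE_YEAR   = c(1995, 1970, 1975, 2003, 1999),
  stringsAsFactors = FALSE
)

# Standard market samples (what the default filter already keeps);
# %in% returns FALSE for NA, so rows without a SAMPLE_TYPE are dropped
keep_market <- toy_bds$SAMPLE_TYPE %in% "M"

# Oregon "S" samples that were randomly sampled and collected before 1987
keep_early_random <- toy_bds$AGENCY_CODE %in% "O" &
  toy_bds$SAMPLE_TYPE %in% "S" &
  toy_bds$SAMPLE_METHOD %in% "R" &
  toy_bds$SAMPLE_YEAR < 1987

kept <- toy_bds[keep_market | keep_early_random, ]
print(kept$SAMPLE_NO)
```

In practice one might instead pass `keep_sample_type = c("M", "S")` to `cleanPacFIN()` and apply the sample-method and year conditions separately; that combination is not tested here.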

iantaylor-NOAA · 2023-05-11T16:55:47Z

Thanks @aliwhitman, this is very helpful.
We will explore adding back the random samples from 1966-1986 and see what impact that has.

kellijohnson-NOAA · 2023-05-11T17:04:22Z

Thanks @chantelwetzel-noaa for your memory, @aliwhitman for the digging, and @iantaylor-NOAA for the summaries. I also want to note that some of these samples do not have entries in the FTID column for fish ticket ID. See note in the code here

PacFIN.Utilities/R/cleanPacFIN.R
Lines 232 to 236 in ad3c0c0

    #### Bad samples
    # Remove bad OR samples
    Pdata$SAMPLE_TYPE[Pdata$SAMPLE_NO %in% paste0("OR", badORnums)] <- "S"
    # Via Chantel, from Ali at ODFW, do not keep b/c they don't have exp_wt or FTID

though I do not see where not having a FTID entry matters in the code downstream.

brianlangseth-NOAA · 2023-05-11T19:02:01Z

We have included special project samples prior to 1987 for canary - see this issue. I didn't check whether sample weight is there or not for the expansion even though we put them all through the expansion processing scripts.

kellijohnson-NOAA · 2023-05-14T00:38:06Z

@brianlangseth-NOAA did you really mean to close this issue? I think that maybe @iantaylor-NOAA should be the one to close it given that he opened it.

iantaylor-NOAA added the "status: question" and "topic: database" labels May 11, 2023

iantaylor-NOAA added this to the year_2023 milestone May 11, 2023

iantaylor-NOAA assigned aliwhitman May 11, 2023

kellijohnson-NOAA added the "type: bug" and "priority: high" labels May 11, 2023

brianlangseth-NOAA closed this as completed May 11, 2023

brianlangseth-NOAA reopened this May 14, 2023

kellijohnson-NOAA modified the milestones: year_2023, year_2025 May 10, 2024

chantelwetzel-noaa mentioned this issue Nov 13, 2024: Explore and process PacFIN bds pfmc-assessments/sablefish_2025#20 (Open, 7 tasks)
