Clarify Oregon PacFIN BDS samples with SAMPLE_TYPE == "S" #112
Comments
I am interested to know the current status of these samples from @aliwhitman. These samples were identified in the 2019 update assessment. My memory is always hazy, but I believe they were excluded because the samples did not have an associated sample weight, which prevents expansion of these data via our typical methods.
Thanks for chiming in @chantelwetzel-noaa. More information is below on the presence/absence of sample weights and fish weights. Yes, they are missing in many years for the samples with SAMPLE_TYPE == "S", but definitely not all. I may not be selecting the right variables, however. Calculations below are from the raw PacFIN extraction before cleaning (available to the NWFSC folks in \nwcfile\FRAM\Assessments\Assessment Data\2023 Assessment Cycle\petrale sole\PacFIN.PTRL.bds.08.May.2023.RData). Even if all the sample weights were missing, I think there would be value in considering unexpanded length comps for those years.
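For readers without access to the extraction, a check like the one described above could be sketched in base R as follows. The data frame is a made-up toy, and `SAMPLE_YEAR` and `SAMPLE_WGT` are assumed column names that may not match the real BDS extraction; only `SAMPLE_TYPE` comes from this thread.

```r
# Toy stand-in for the raw BDS extraction; all values are invented.
# SAMPLE_YEAR and SAMPLE_WGT are assumed (hypothetical) column names.
bds <- data.frame(
  SAMPLE_YEAR = c(1970, 1970, 1985, 1985, 2000, 2000),
  SAMPLE_TYPE = c("S", "S", "S", "S", "S", "M"),
  SAMPLE_WGT  = c(NA, 12.5, NA, NA, 30.1, 28.4)
)

# Keep only the special-request rows
special <- bds[bds$SAMPLE_TYPE == "S", ]

# Rows = year, columns = whether a sample weight is present
weight_status <- table(
  year = special$SAMPLE_YEAR,
  has_weight = !is.na(special$SAMPLE_WGT)
)
print(weight_status)
```

Years in which every count falls in the FALSE column would be candidates for unexpanded length comps only.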
The vast majority of these samples are pre-1987, which have ALL been (after the fact) designated as SP samples (across the board, all species) because of a lack of documentation on how these samples were taken and processed. And yes, some are lacking a sample weight (good memory Chantel! I had to go back to old emails to confirm that). My recommendation would be for you to consider the use of the SP samples, particularly those prior to 1987, as this was just a blanket approach taken a number of years ago by our data shop. Using the sample method (Random), you can separate the ones that were part of our standard protocol (even if it wasn't well documented) from the ones that were truly "special request". I think you can also consider including an unexpanded length comp version, as Ian suggested, but again, I would still probably recommend removing those without an R sampling method.
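One possible reading of that recommendation, sketched in base R on made-up data: retain market samples plus pre-1987 special-request samples, and in both cases only those with a random sampling method. `SAMPLE_YEAR` is an assumed column name, the non-random method code "N" is invented for the example, and the 1987 cutoff is just one way to interpret "particularly those prior to 1987".

```r
# Toy data; values and the non-random method code "N" are invented.
# SAMPLE_YEAR is an assumed (hypothetical) column name.
bds <- data.frame(
  SAMPLE_YEAR   = c(1975, 1980, 1986, 1999, 2005),
  SAMPLE_TYPE   = c("S", "S", "S", "S", "M"),
  SAMPLE_METHOD = c("R", "R", "N", "R", "R")
)

# Market samples, or special-request samples taken before 1987
type_ok <- bds$SAMPLE_TYPE == "M" |
  (bds$SAMPLE_TYPE == "S" & bds$SAMPLE_YEAR < 1987)

# In every case, require the random sampling method
keep <- type_ok & bds$SAMPLE_METHOD == "R"

retained <- bds[keep, ]
print(retained)
```

Dropping the year condition would keep the later random "S" samples as well, which may or may not be appropriate.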
Thanks @aliwhitman, this is very helpful.
Thanks @chantelwetzel-noaa for your memory, @aliwhitman for the digging, and @iantaylor-NOAA for the summaries. I also want to note that some of these samples do not have entries in the FTID column for fish ticket ID. See the note in the code at PacFIN.Utilities/R/cleanPacFIN.R, lines 232 to 236 in ad3c0c0, though I do not see where the lack of an FTID entry matters in the code downstream.
We have included special project samples prior to 1987 for canary - see this issue. I didn't check whether the sample weight needed for the expansion is there, even though we put them all through the expansion processing scripts.
@brianlangseth-NOAA did you really mean to close this issue? I think that maybe @iantaylor-NOAA should be the one to close it given that he opened it.
@aliwhitman, here's another question for you. Sorry if this information is already spelled out somewhere and I missed it.
Could you clarify why there are lots of Oregon PacFIN BDS samples for Petrale with SAMPLE_TYPE == "S"?
@gertsevv and I noticed that there are years with no length data after processing through the PacFIN.Utilities::cleanPacFIN() function. I now see this is due to the default filter, which retains only samples of type market (M) and excludes all samples of type research (R), special request (S), and commercial on-board (C), as documented in PacFIN.Utilities/R/cleanPacFIN.R, lines 33 to 40 in 4683a3f.
I get the idea that special request samples might be non-random or not representative of the population. However, all of these samples are associated with SAMPLE_METHOD == "R" (random) and they represent 44% of the petrale samples from Oregon, including 100% of the 37,348 samples from 1966-1986, another 4,468 samples from 1998-2007 (~30% of the total for that period), and another 43 samples scattered from other time periods. Two decades of sampling doesn't sound like a "special request" to me and it would be great to include these samples in the model, especially the ones from the early period, unless there's truly a good reason to exclude them.
Less than 4% of the Washington petrale samples and none of the California samples have SAMPLE_TYPE == "S".
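Percentages like those above can be tabulated along these lines. This is a minimal base R sketch on invented data, not the calculation actually used; `AGENCY_CODE` and its one-letter state codes are assumptions, and the toy proportions do not reproduce the real ones.

```r
# Toy data; AGENCY_CODE and its values (O, W, C) are assumed names,
# and the counts are invented for illustration.
bds <- data.frame(
  AGENCY_CODE   = c("O", "O", "O", "O", "W", "W", "W", "C", "C"),
  SAMPLE_TYPE   = c("S", "S", "M", "M", "S", "M", "M", "M", "M"),
  SAMPLE_METHOD = c("R", "R", "R", "R", "R", "R", "R", "R", "R")
)

# Counts by state and sample type, then row-wise proportions
counts <- table(state = bds$AGENCY_CODE, type = bds$SAMPLE_TYPE)
share <- prop.table(counts, margin = 1)

# Percent of each state's rows with SAMPLE_TYPE == "S"
print(round(100 * share[, "S"], 1))

# Cross-check that the "S" rows all use the random method
print(table(type = bds$SAMPLE_TYPE, method = bds$SAMPLE_METHOD))
```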