Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

method_context_id not matching after binding trait values and separating once again #98

Open
fontikar opened this issue Sep 5, 2024 · 5 comments

Comments

@fontikar
Copy link
Collaborator

fontikar commented Sep 5, 2024

Describe the bug
Found in test-trait_bind_sep.R on switch-deprecate branch

To Reproduce
Steps to reproduce the behavior:

#Extract a dataset
dataset_id <- c("Falster_2005_2")
subset <- extract_dataset(austraits_5.0.0_lite, dataset_id = dataset_id)
bounded <- bind_trait_values(subset$traits)
seperated <-separate_trait_values(data = bounded, austraits_5.0.0_lite$definitions)
subset$traits %>% 
  select(-value_type) %>% 
  arrange(dataset_id, observation_id, trait_name, value) |> 
  select(method_context_id) 
  
  seperated %>% 
  select(-value_type) %>% 
  arrange(dataset_id, observation_id, trait_name, value) |>
  select(method_context_id)

Expected behavior
All rows expect for value_type should be the same

@ehwenk
Copy link
Collaborator

ehwenk commented Sep 5, 2024

@fontikar The problem is that each of the 4 rows of data that are being bound together by the function bind_trait_values has a different method_context_id (as in, each was collected using a somewhat different method). The function is simply taking the first method_context_id. There are three other grouping variables that should be included in the function, method_id, method_context_id, and repeat_measurements_id.

I'm also puzzled why dataset_id isn't a grouping variable. Because if the function is used with a subset of data that includes multiple datasets there will be duplicate observation_id values that belong to the two datasets.

I suspect this function was written for AusTraits 3.0 and has never been revamped! (I've never used it.)

@fontikar
Copy link
Collaborator Author

fontikar commented Sep 5, 2024

Oh my goodness that is such a helpful comment @ehwenk I have been staring this for so long and can't figure out where the problem is - definitely lacking on the expertise on what columns to use for splitting.

I don't understand why the tests used to past and now they don't....

Are you brave enough to tackle this since you have more domain knowledge on the dataset structure? I was about to bug @dfalster!

@ehwenk
Copy link
Collaborator

ehwenk commented Sep 5, 2024

I think I can fix it very quickly... so yes, let me try.

@fontikar
Copy link
Collaborator Author

fontikar commented Sep 5, 2024

Branch from switch-deprecate and then I can merge your solution back in

@fontikar
Copy link
Collaborator Author

fontikar commented Sep 5, 2024

Just pushed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants