Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[24.0] Fix datasets API custom keys encoding #17793

Conversation

davelopez
Copy link
Contributor

@davelopez davelopez commented Mar 19, 2024

Requires #17779
Fixes #17729 since the custom keys will be serialized correctly after this.

This is similar to #17779 but this time addressing history contents.

TODO

  • Add potential missing fields to models
  • Increase test coverage for the use of view and keys query parameters for HDAs and HDCAs.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

Copy link
Member

@mvdbeek mvdbeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the right thing to do!

@davelopez davelopez force-pushed the 24.0_fix_datasets_api_custom_keys_encoding branch from 8f8865f to 68a030c Compare March 20, 2024 09:30
@davelopez
Copy link
Contributor Author

I hit a bit of a roadblock with those "metadata_*" dynamic fields... I've been trying to find a clever solution but to no avail, so I will have to resort back to "extra=allow" for custom HDAs... 😞
The good news is that those fields will likely not use any encoded/decoded value so we are safe from this issue.

This time for those endpoints returning HistoryContentsResult
Since the visualizations are not strictly part of the detailed view definition.
As those are dynamic and associated with particular datatypes and not common to all HDAs.
Unfortunately there is no easy way to deal with the dynamic `metadata_*` fields so we need to resort back to this configuration until we can find a better solution for those.
@mvdbeek
Copy link
Member

mvdbeek commented Mar 21, 2024

resort back to "extra=allow" for custom HDAs.

using partial_model is still a big improvement ... in hindsight it'd have been clever to at least place the arbitrary metadata attributes under a common metadata object. I suppose it would result in a ridiculously large schema if we used create_model to create one model per datatype (which defines the custom metadata_* keys) ? 😆

@davelopez
Copy link
Contributor Author

in hindsight it'd have been clever to at least place the arbitrary metadata attributes under a common metadata object.

That is exactly what I thought... is it too late to change it? I know it will be a breaking change... but having a simple:

metadata: Optional[Dict[str, Any]]

would certainly make everything cleaner and "easier to discover and handle" for clients.

@mvdbeek
Copy link
Member

mvdbeek commented Mar 21, 2024

We could think about it for dev, but I worry that there might be scripts out there that expect stuff like metadata_bam_index in the response, meaning we'd have to add another view if we want to do this.

Copy link
Member

@mvdbeek mvdbeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to add a test maybe making sure that the dataset id is not an integer ?

@davelopez
Copy link
Contributor Author

Yes, I'm still working on the tests, I wanted to figure out the HDCA's views and modify the models and tests accordingly.

@davelopez
Copy link
Contributor Author

davelopez commented Mar 21, 2024

OK, I found out that collection serialization does not follow the same patterns as datasets, for example, the "serialization_params" are ignored, and only the view is considered. It seems to be handled here:

def dictify_dataset_collection_instance(

So either, there is no point in supporting partial models for those, or this is an ancient bug.

Update

The serialization behavior also depends on the endpoint we are hitting... there are differences between the "show" and the "index" endpoints when listing the contents of a history it will handle the serialization differently depending on the dev parameter...

Since getting specific keys from an HDCA seems not implemented yet.
@davelopez davelopez marked this pull request as ready for review March 21, 2024 13:45
@davelopez
Copy link
Contributor Author

OK, if all tests pass now, it should be ready for review. There were some missing fields in some of the models. I tried to be very careful not to miss anything, but it is hard to tell. It would be great to devise a plan to make all the serialization process more homogeneous.

@@ -116,6 +116,15 @@ class DatasetCollectionPopulatedState(str, Enum):
FAILED = "failed" # some problem populating state, won't be populated


class HashFunctionNames(str, Enum):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to be duplicated for now since I couldn't find any common place to put the definition. The other enum is in lib/galaxy/util/hash_util.py

@mvdbeek mvdbeek merged commit 3675578 into galaxyproject:release_24.0 Mar 21, 2024
55 checks passed
@mvdbeek
Copy link
Member

mvdbeek commented Mar 21, 2024

Thanks for cleaning this up @davelopez!

@davelopez davelopez deleted the 24.0_fix_datasets_api_custom_keys_encoding branch March 22, 2024 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants