[24.0] Fix datasets API custom keys encoding #17793

davelopez · 2024-03-19T15:35:08Z

Requires #17779
Fixes #17729 since the custom keys will be serialized correctly after this.

This is similar to #17779 but this time addressing history contents.

TODO

- Add potential missing fields to models
- Increase test coverage for the use of view and keys query parameters for HDAs and HDCAs.

How to test the changes?

(Select all options that apply)

- I've included appropriate automated tests.
- This is a refactoring of components with existing test coverage.
- Instructions for manual testing are as follows:
  1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

mvdbeek

I think that's the right thing to do!

davelopez · 2024-03-20T16:05:05Z

I hit a bit of a roadblock with those "metadata_*" dynamic fields... I've been trying to find a clever solution but to no avail, so I will have to resort back to "extra=allow" for custom HDAs... 😞
The good news is that those fields will likely not use any encoded/decoded value so we are safe from this issue.
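To illustrate the fallback described above, here is a minimal sketch assuming pydantic v2. The model names `StrictItem` and `CustomItem` and the sample payload are hypothetical stand-ins, not the actual Galaxy schema classes. It only shows how `extra="allow"` lets undeclared `metadata_*` keys pass through while a strict model rejects them.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class StrictItem(BaseModel):
    # Hypothetical model where every field must be declared explicitly.
    model_config = ConfigDict(extra="forbid")

    id: str
    name: Optional[str] = None


class CustomItem(BaseModel):
    # Hypothetical stand-in for a custom HDA model: undeclared keys are kept.
    model_config = ConfigDict(extra="allow")

    id: str
    name: Optional[str] = None


# Illustrative payload with one dynamic, datatype-specific key.
payload = {"id": "f2db41e1fa331b3e", "metadata_bam_index": "index.bai"}

# The permissive model keeps the dynamic key in its serialized output.
dumped = CustomItem(**payload).model_dump()

# The strict model refuses the same payload.
try:
    StrictItem(**payload)
    strict_rejected = False
except ValidationError:
    strict_rejected = True
```

The trade-off is that the dynamic keys are not validated or documented in the schema, which is why this is described as a temporary configuration.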

mvdbeek · 2024-03-21T09:45:16Z

> resort back to "extra=allow" for custom HDAs.

Using partial_model is still a big improvement ... In hindsight it'd have been clever to at least place the arbitrary metadata attributes under a common metadata object. I suppose it would result in a ridiculously large schema if we used create_model to create one model per datatype (which defines the custom metadata_* keys)? 😆
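The "one model per datatype" idea could be sketched roughly as follows, assuming pydantic v2's `create_model`. The base class `HDABase`, the `DATATYPE_METADATA_KEYS` mapping, and all keys other than `metadata_bam_index` are hypothetical and only serve the illustration; this is not how Galaxy builds its schema.

```python
from typing import Any, Dict, List, Optional, Type

from pydantic import BaseModel, create_model


class HDABase(BaseModel):
    # Hypothetical common base shared by all generated models.
    id: str


# Hypothetical mapping from datatype extension to its dynamic metadata keys.
DATATYPE_METADATA_KEYS: Dict[str, List[str]] = {
    "bam": ["metadata_bam_index", "metadata_bam_version"],
    "tabular": ["metadata_columns", "metadata_data_lines"],
}


def build_models() -> Dict[str, Type[BaseModel]]:
    """Create one model per datatype, declaring its metadata_* keys as optional fields."""
    models: Dict[str, Type[BaseModel]] = {}
    for ext, keys in DATATYPE_METADATA_KEYS.items():
        # Each field definition is a (type, default) tuple.
        fields: Dict[str, Any] = {key: (Optional[Any], None) for key in keys}
        models[ext] = create_model(f"HDA_{ext}", __base__=HDABase, **fields)
    return models


models = build_models()
bam_item = models["bam"](id="f2db41e1fa331b3e", metadata_bam_index="index.bai")
```

With many datatypes, each generated class would add its own entry to the API schema, which is the size concern raised in the comment.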

davelopez · 2024-03-21T09:56:07Z

> in hindsight it'd have been clever to at least place the arbitrary metadata attributes under a common metadata object.

That is exactly what I thought... is it too late to change it? I know it will be a breaking change... but having a simple:

metadata: Optional[Dict[str, Any]]

would certainly make everything cleaner and "easier to discover and handle" for clients.
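As a rough sketch of what that shape change would mean for a response, the helper below moves flat `metadata_*` keys under a single `metadata` dictionary. The function name `nest_metadata` and the sample item are hypothetical, and stripping the prefix from the nested keys is just one possible design choice.

```python
from typing import Any, Dict


def nest_metadata(item: Dict[str, Any], prefix: str = "metadata_") -> Dict[str, Any]:
    """Return a copy of item with all prefixed keys grouped under "metadata"."""
    nested: Dict[str, Any] = {}
    metadata: Dict[str, Any] = {}
    for key, value in item.items():
        if key.startswith(prefix):
            # Drop the prefix so "metadata_bam_index" becomes "bam_index".
            metadata[key[len(prefix):]] = value
        else:
            nested[key] = value
    if metadata:
        nested["metadata"] = metadata
    return nested


# Illustrative flat item, as a client might receive it today.
flat = {"id": "f2db41e1fa331b3e", "metadata_bam_index": "index.bai"}
nested = nest_metadata(flat)
```

A client that reads `metadata_bam_index` from the top level would no longer find it in the nested form, which is the breaking change being discussed.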

mvdbeek · 2024-03-21T09:59:51Z

We could think about it for dev, but I worry that there might be scripts out there that expect stuff like metadata_bam_index in the response, meaning we'd have to add another view if we want to do this.

mvdbeek

Do you want to add a test, maybe one making sure that the dataset ID is not an integer?
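A check along those lines might look like the sketch below. The helper `assert_id_is_encoded` and the sample dictionaries are hypothetical; the actual API tests in this PR use Galaxy's own test fixtures, which are not reproduced here.

```python
from typing import Any, Dict


def assert_id_is_encoded(item: Dict[str, Any], key: str = "id") -> None:
    """Fail if the given key holds a raw (unencoded) integer instead of a string."""
    value = item[key]
    assert not isinstance(value, int), f"{key} is a raw integer: {value!r}"
    assert isinstance(value, str), f"{key} should be an encoded string, got {type(value).__name__}"


# An encoded ID passes silently.
assert_id_is_encoded({"id": "f2db41e1fa331b3e"})

# A raw database ID is reported as a failure.
try:
    assert_id_is_encoded({"dataset_id": 42}, key="dataset_id")
    raw_id_detected = False
except AssertionError:
    raw_id_detected = True
```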

davelopez · 2024-03-21T10:37:26Z

Yes, I'm still working on the tests, I wanted to figure out the HDCA's views and modify the models and tests accordingly.

davelopez · 2024-03-21T11:56:13Z

OK, I found out that collection serialization does not follow the same patterns as datasets. For example, the "serialization_params" are ignored and only the view is considered. It seems to be handled here:

galaxy/lib/galaxy/managers/collections_util.py

Line 113 in ea1e84d

def dictify_dataset_collection_instance(

~~So either, there is no point in supporting partial models for those, or this is an ancient bug.~~

Update

The serialization behavior also depends on the endpoint we are hitting... There are differences between the "show" and the "index" endpoints: when listing the contents of a history, the serialization is handled differently depending on the dev parameter...

galaxy/lib/galaxy/webapps/galaxy/services/history_contents.py

Line 311 in ea1e84d

if params.v == "dev":

davelopez · 2024-03-21T13:49:30Z

OK, if all tests pass now, it should be ready for review. There were some missing fields in some of the models. I tried to be very careful not to miss anything, but it is hard to tell. It would be great to devise a plan to make the whole serialization process more homogeneous.

davelopez · 2024-03-21T14:07:45Z

lib/galaxy/schema/schema.py

@@ -116,6 +116,15 @@ class DatasetCollectionPopulatedState(str, Enum):
    FAILED = "failed"  # some problem populating state, won't be populated


+class HashFunctionNames(str, Enum):


This has to be duplicated for now since I couldn't find any common place to put the definition. The other enum is in lib/galaxy/util/hash_util.py

mvdbeek · 2024-03-21T17:26:42Z

Thanks for cleaning this up @davelopez!

davelopez added kind/bug kind/enhancement area/API labels Mar 19, 2024

davelopez added this to the 24.0 milestone Mar 19, 2024

mvdbeek approved these changes Mar 19, 2024

davelopez added 7 commits March 20, 2024 09:58

Add partial models for HDA and HDCA responses

476f85c

Add workaround for pydantic error with UUIDs

4acd7ba

Remove allow extra model_config in HistoryItemCommon

3b12aa1

All fields need to be explicitly declared now for HDA and HDCA responses.

Update client API schema

6817655

Set response_model_exclude_unset for datasets API

9b86214

Whenever we return AnyHistoryContentItem.

Reorder model union from less to more specific

25563b7

Add missing hashes field to HDADetailed

68a030c

davelopez force-pushed the 24.0_fix_datasets_api_custom_keys_encoding branch from 8f8865f to 68a030c March 20, 2024 09:30

Add missing DRS ID field to HDADetailed schema

369ecb8

davelopez added 9 commits March 20, 2024 18:11

Move genome_build field from HDADetailed to HDASummary

4e76fe2

Follow up to galaxyproject#15505 that was missing this change.

Add response_model_exclude_unset to API endpoints

a2afe72

This time for those endpoints returning HistoryContentsResult

Relax URL validation for display apps as it can be relative

27c7861

Add missing sources field to HDADetailed schema

084754a

Move visualizations field from HDADetailed to custom model

ded9c99

Since the visualizations are not strictly part of the detailed view definition.

Refactor custom model names and union ordering

d1eb25f

Remove explicit metadata_* fields from models

e7fa65b

As those are dynamic and associated with particular datatypes and not common to all HDAs.

Use extra="allow" for custom HDA model

edeba35

Unfortunately there is no easy way to deal with the dynamic `metadata_*` fields so we need to resort back to this configuration until we can find a better solution for those.

Update client API schema

f3cb44e

Add API test for HDA serialization

100f7c0

mvdbeek approved these changes Mar 21, 2024

Remove duplicated key from view

cb67ff8

Make sure the ID is encoded in custom models

90264d6

davelopez added 2 commits March 21, 2024 14:42

Remove HDCACustom for now

e67585f

Since getting specific keys from an HDCA seems not implemented yet.

Add API tests for collection views

a14fc3e

davelopez marked this pull request as ready for review March 21, 2024 13:45

mvdbeek merged commit 3675578 into galaxyproject:release_24.0 Mar 21, 2024
55 checks passed

jdavcs mentioned this pull request Mar 21, 2024

history storage client works with unencoded IDs #17729

Closed

davelopez deleted the 24.0_fix_datasets_api_custom_keys_encoding branch March 22, 2024 08:30
