-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow selecting subset of collection by group tag #5457
Conversation
Thanks for throwing this out there - I appreciate the conversation and the prototype to ground it! I've been thinking about this since you opened it and I'm going to continue to - there are some things really awesome here and some things that make me nervous (but I'm always nervous).
|
I think the grouping information has to be supplied during upload / when building the collection, I don't think this can be inferred realistically (Of course I could be wrong!). This morning I wrote a tool that adds tags based on a text file that maps element identifiers to tags, this is very similar to the
Yes, I think that's frequently the case (the example here is from the deseq2 iuc test data, where a file belongs to one of the "paired" and "treated" factor levels).
That depends a bit on what you would call a large number. I guess more than 4 would be uncommon, but each factor may again have multiple levels (think time course experiments).
Yes, I think that is also common since that depends on the experimental design. Currently ( |
A minor inconvenience I see is that tags don't work for anonymous users (so this is a bit harder to test), but I guess we'd make a special variant of Tag anyway if we go with this approach. |
Also tags like
|
Actually that works via the history UI, but not if I set this with #5462 ... maybe each tag needs to be set individually. Is this supposed to work ? I guess it should, given that you can add many EDIT: yep, need to flush after each new tag, |
Based on galaxyproject#5457, certain paths to creating tag values with : must already work - just not when parsing from a raw string I guess.
lib/galaxy/tools/parameters/basic.py
Outdated
@@ -2103,6 +2201,7 @@ def to_dict(self, trans, other_values=None): | |||
genomebuild=GenomeBuildParameter, | |||
select=SelectToolParameter, | |||
color=ColorToolParameter, | |||
select_tag=SelectTagParameter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't call data
parameters select_data
and this doesn't apply to all tags - so I'm going to switch this to group_tag
- I hope that is okay? I'll add an example tool, test case, and update the XSD. Let me know if there is any issue with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes more sense!
- Switch select_tag to group_tag as a parameter name - since it is restricted to group_tags and we don't call data parameters select_data. - Add minimal XSD documentation for this new parameter type. - Add example tools - for single and multiple tags. - Add API tests for these tools and the tag handling.
Merged with latest dev and pushed a commit that:
Unfortunately the multiple parameter test doesn't pass locally for me and I think it is because the tags are being reordered vs. what is sent in through the API. If that isn't stable I wonder if we should just drop "multiple=True" as an option and require a repeat? This would allow multiply selecting the same tag for applications that might require it... it would also allow multiply selecting the same tag for applications that might want to prohibit it. Not sure. I might try to take a stab at making it stable though. |
It turns out the order was made unstable by throwing things into a set to ensure supplied tags were distinct - I switched the logic here to use a list instead of set to ensure the tags are distinct and the test case passed. That said - I have no clue why we would enforce that - it feels like if the user is supplying duplicate tags in a specific order they must have a reason (not that the UI allows it - but it should I assume - same with multi-data parameters) and we should just respect that. @mvdbeek Mind if I make the switch of not enforcing the tags are distinct in from_json - or if you want to keep it mind if I hide it behind a boolean tool param attribute called |
If a group is present it should correspond to 1 or more datasets, so if a group has multiple corresponding datasets, the group would appear multiple times. I can see how it would be helpful though in other scenarios, so a |
I was I believe talking about making things distinct when the user has already selected the input and is running the tool and has somehow selected a tag more than once. I feel like you responded to making things distinct during the dropdown building. I feel like for the dropdown things are distinct and that makes perfect sense - we don't an option to change that. I also don't understand your with_replacement attribute - is that different name for it or does that do something different? |
That was still me thinking about whether a user can select the same thing twice. Which isn't specific to group tags, that probably applies to all multiple="True" select parameters. |
# Check if a dataset is selected | ||
if not history_items: | ||
return [] | ||
tags = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in practice we just make this a list, right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that one is fine? I think we just drop the and tag not in tag_list
at https://github.com/galaxyproject/galaxy/pull/5457/files#diff-9baf995401cfeb779edf8731ebaf0d2dR1064.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, I think it was a bit too late for me yesterday ...
Alright, then I guess this is ready for a final round of review! |
Amazing, thanks so much for pushing the platform in this direction - and explaining it to me slowly and repeatedly. This is very much needed work! |
This PR was merged without a 'kind/' tag, please correct. |
This would allow selecting tagged subsets of (nested) collections. To use this you need to tag the individual collection elements with
group:<some group name>
. All groups defined this way will be selectable in the tool form.Together with this patch against the deseq2 wrapper one can setup a deseq2 experiment from an abitrarily nested collection like this:
That's a fairly simple way to do this, I'd be curious to know if people think that something like this could work out ?
xref #740