Allow selecting subset of collection by group tag #5457

mvdbeek · 2018-02-04T18:15:58Z

This would allow selecting tagged subsets of (nested) collections. To use this you need to tag the individual collection elements with group:<some group name>. All groups defined this way will be selectable in the tool form.
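As a rough illustration of the idea (not the code in this PR), selecting by group tag over a nested collection could look like the sketch below. The Element and Collection classes and all function names are hypothetical stand-ins, not Galaxy's model classes; only the group:<name> tag convention comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Union


@dataclass
class Element:
    """Hypothetical stand-in for a dataset element in a collection."""
    identifier: str
    tags: Set[str] = field(default_factory=set)


@dataclass
class Collection:
    """Hypothetical stand-in for a (possibly nested) collection."""
    identifier: str
    elements: List[Union["Collection", Element]] = field(default_factory=list)


def iter_leaves(collection: Collection) -> Iterator[Element]:
    """Yield every dataset element, descending through nested collections."""
    for child in collection.elements:
        if isinstance(child, Collection):
            yield from iter_leaves(child)
        else:
            yield child


def available_groups(collection: Collection) -> List[str]:
    """Return the sorted, distinct group names found on any leaf element."""
    groups = set()
    for leaf in iter_leaves(collection):
        for tag in leaf.tags:
            if tag.startswith("group:"):
                groups.add(tag[len("group:"):])
    return sorted(groups)


def select_by_group(collection: Collection, group: str) -> List[Element]:
    """Return leaf elements tagged with group:<group>, in collection order."""
    wanted = "group:" + group
    return [leaf for leaf in iter_leaves(collection) if wanted in leaf.tags]


# Made-up example data: two nested batches, three tagged elements.
nested = Collection("samples", [
    Collection("batch1", [
        Element("s1", {"group:treated", "group:paired"}),
        Element("s2", {"group:untreated", "group:paired"}),
    ]),
    Collection("batch2", [
        Element("s3", {"group:treated", "group:single"}),
    ]),
])
```

Here available_groups(nested) would supply the choices offered in the tool form, and select_by_group(nested, "treated") would pick s1 and s3 regardless of which sub-collection they sit in.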

Together with this patch against the deseq2 wrapper, one can set up a deseq2 experiment from an arbitrarily nested collection like this:

That's a fairly simple way to do this; I'd be curious to know if people think that something like this could work out?
xref #740

jmchilton · 2018-02-05T11:03:34Z

Thanks for throwing this out there - I appreciate the conversation and the prototype to ground it! I've been thinking about this since you opened it and I'm going to continue to - there are some really awesome things here and some things that make me nervous (but I'm always nervous).

In this example, would the tags come during the initial upload/import and we would need to develop a system for deciding what non-name tags get propagated or do you expect the previous tool in the chain to annotate its outputs with tags (we could certainly add a syntax for that)?
Do the files in the different levels potentially overlap (i.e. might a file belong to multiple factors)?
Would you ever use deseq2 with a large number of factors or in workflows where the number of factors varies based on the inputs?

mvdbeek · 2018-02-05T11:33:51Z

> In this example, would the tags come during the initial upload/import and we would need to develop a system for deciding what non-name tags get propagated or do you expect the previous tool in the chain to annotate its outputs with tags (we could certainly add a syntax for that)?

I think the grouping information has to be supplied during upload or when building the collection; I don't think it can realistically be inferred (of course I could be wrong!). This morning I wrote a tool that adds tags based on a text file that maps element identifiers to tags, very similar to the "relabel identifiers from file" tool. And yes, those tags should be propagated by default, similar to the name tags.
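To make the tag-from-file idea concrete, here is a minimal sketch. The tab-separated layout (identifier first, then one or more tags) and both function names are assumptions for illustration; the actual tool in #5462 may parse its input differently.

```python
from typing import Dict, Iterable, List


def parse_tag_file(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse tab-separated lines: element identifier, then one or more tags.

    The file layout is an assumption made for this sketch.
    """
    mapping: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        identifier, *tags = line.split("\t")
        mapping.setdefault(identifier, []).extend(t for t in tags if t)
    return mapping


def apply_tags(
    element_tags: Dict[str, List[str]], mapping: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Return a copy of element_tags with the mapped tags appended.

    Identifiers absent from the mapping keep their existing tags unchanged,
    and a tag already present on an element is not added a second time.
    """
    result: Dict[str, List[str]] = {}
    for identifier, existing in element_tags.items():
        new_tags = list(existing)
        for tag in mapping.get(identifier, []):
            if tag not in new_tags:
                new_tags.append(tag)
        result[identifier] = new_tags
    return result
```

Keeping the parsing separate from the tagging step means the same mapping could be applied to any collection whose element identifiers match.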

> Do the files in the different levels potentially overlap (i.e. might a file belong to multiple factors)?

Yes, I think that's frequently the case (the example here is from the deseq2 iuc test data, where a file belongs to one of the "paired" and "treated" factor levels).

> Would you ever use deseq2 with a large number of factors

That depends a bit on what you would call a large number. I guess more than 4 would be uncommon, but each factor may again have multiple levels (think time course experiments).

> or in workflows where the number of factors varies based on the inputs?

Yes, I think that is also common, since it depends on the experimental design. Currently (~~and with this proposition as well~~) we have to modify each workflow if the experimental design changes (i.e. we would need to add or remove factors and factor levels in the deseq2 tool). Still, with this approach we at least don't need to re-wire the connections in the workflow, as we have to do with multiple flat collections. In fact, since the inputs are now unchanged, these can simply be runtime parameters.
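A small sketch of why the wiring stays fixed: the tagged collection is the single input, and the design (which factors, which levels) is plain data that could be supplied at runtime. The sample identifiers, the ELEMENT_GROUPS mapping and resolve_design are all made up for illustration and are not part of the deseq2 wrapper.

```python
from typing import Dict, List

# Hypothetical tagged collection: element identifier -> group names on it.
# Each file carries one level from each of two factors.
ELEMENT_GROUPS: Dict[str, List[str]] = {
    "s1": ["treated", "paired"],
    "s2": ["untreated", "paired"],
    "s3": ["treated", "single"],
    "s4": ["untreated", "single"],
}


def resolve_design(
    element_groups: Dict[str, List[str]], design: Dict[str, List[str]]
) -> Dict[str, Dict[str, List[str]]]:
    """Map each factor level to the identifiers carrying that group tag.

    design is {factor name: [level group names]}; it is the only thing
    that changes when the experimental design changes.
    """
    resolved: Dict[str, Dict[str, List[str]]] = {}
    for factor, levels in design.items():
        resolved[factor] = {
            level: [
                ident
                for ident, groups in element_groups.items()
                if level in groups
            ]
            for level in levels
        }
    return resolved
```

Going from a one-factor to a two-factor design only changes the design argument; the collection passed in is the same object in both cases.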

mvdbeek · 2018-02-05T14:10:51Z

A minor inconvenience I see is that tags don't work for anonymous users (so this is a bit harder to test), but I guess we'd make a special variant of Tag anyway if we go with this approach.

mvdbeek · 2018-02-06T15:24:36Z

Also tags like group:treatment and group:day1 are not supposed to co-exist on the same dataset, I guess. It seems to work on sqlite, but not with postgresql:

IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "tag_name_key"
DETAIL:  Key (name)=(group) already exists.
 [SQL: 'INSERT INTO tag (type, parent_id, name) VALUES (%(type)s, %(parent_id)s, %(name)s) RETURNING tag.id'] [parameters: {'parent_id': None, 'type': 0, 'name': u'group'}]
galaxy.tools.execute WARNING 2018-02-06 15:22:38,808 [p:2122,w:1,m:0] [uWSGIWorker1Core1] There was a failure executing a job for tool [__TAG_FROM_FILE__] - Error executing tool: (psycopg2.IntegrityError) duplicate key value violates unique constraint "tag_name_key"

mvdbeek · 2018-02-06T16:04:41Z

Actually that works via the history UI, but not if I set this with #5462 ... maybe each tag needs to be set individually. Is this supposed to work? I guess it should, given that you can add many name tags?!

EDIT: yep, need to flush after each new tag.
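The failure mode can be pictured with a simplified model. This sketch uses the standard-library sqlite3 module and a cut-down tag table with a unique name column; it is not Galaxy's SQLAlchemy tag handling, and the function names are hypothetical. The point is only that a second tag sharing the "group" prefix must be able to see the row created for the first one, which is what flushing after each new tag makes possible.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table whose name column is unique (simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    return conn


def get_or_create_tag(conn: sqlite3.Connection, name: str) -> int:
    """Return the id of the tag row, inserting it only if it is not there yet."""
    row = conn.execute("SELECT id FROM tag WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute("INSERT INTO tag (name) VALUES (?)", (name,))
    return cursor.lastrowid


def duplicate_insert_fails(conn: sqlite3.Connection, name: str) -> bool:
    """Insert the same name twice without a lookup; report whether the second fails."""
    conn.execute("INSERT INTO tag (name) VALUES (?)", (name,))
    try:
        conn.execute("INSERT INTO tag (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError:
        # The unique constraint on name rejects the second row.
        return True
    return False
```

With get_or_create_tag, setting group:treatment and then group:day1 would reuse the single "group" row rather than attempting a second insert.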


jmchilton · 2018-07-24T17:22:19Z

lib/galaxy/tools/parameters/basic.py

@@ -2103,6 +2201,7 @@ def to_dict(self, trans, other_values=None):
    genomebuild=GenomeBuildParameter,
    select=SelectToolParameter,
    color=ColorToolParameter,
+    select_tag=SelectTagParameter,


We don't call data parameters select_data and this doesn't apply to all tags - so I'm going to switch this to group_tag - I hope that is okay? I'll add an example tool, test case, and update the XSD. Let me know if there is any issue with this.

Yes, that makes more sense!

- Switch select_tag to group_tag as a parameter name - since it is restricted to group_tags and we don't call data parameters select_data.
- Add minimal XSD documentation for this new parameter type.
- Add example tools - for single and multiple tags.
- Add API tests for these tools and the tag handling.

jmchilton · 2018-07-24T18:41:52Z

Merged with latest dev and pushed a commit that:

Switches select_tag to group_tag as a parameter name - discussed above.
Adds minimal XSD documentation for this new parameter type.
Adds example tools - for single and multiple tags.
Adds API tests for these tools and the tag handling.

Unfortunately the multiple parameter test doesn't pass locally for me and I think it is because the tags are being reordered vs. what is sent in through the API. If that isn't stable I wonder if we should just drop "multiple=True" as an option and require a repeat? This would allow multiply selecting the same tag for applications that might require it... it would also allow multiply selecting the same tag for applications that might want to prohibit it. Not sure.

I might try to take a stab at making it stable though.

jmchilton · 2018-07-25T16:04:10Z

It turns out the order was made unstable by throwing things into a set to ensure supplied tags were distinct - I switched the logic here to use a list instead of a set to ensure the tags are distinct, and the test case passed. That said - I have no clue why we would enforce that - it feels like if the user is supplying duplicate tags in a specific order they must have a reason (not that the UI allows it - but it should I assume - same with multi-data parameters) and we should just respect that.

@mvdbeek Mind if I make the switch to not enforcing that the tags are distinct in from_json - or if you want to keep it, mind if I hide it behind a boolean tool param attribute called distinct? After that I'm +1 on this PR now.
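For reference, the two behaviours being weighed can be sketched as follows. The function name and the default of distinct=False are assumptions for illustration; this is not the from_json code in the PR.

```python
from typing import Iterable, List


def normalize_selected_tags(selected: Iterable[str], distinct: bool = False) -> List[str]:
    """Return the user's selected tags in the order supplied.

    With distinct=True, later repeats are dropped while the first
    occurrence keeps its position. A plain set would also remove
    repeats but would not guarantee any particular order.
    """
    if not distinct:
        return list(selected)
    seen = set()
    ordered = []
    for tag in selected:
        if tag not in seen:
            seen.add(tag)
            ordered.append(tag)
    return ordered
```

Tracking seen tags in a set while appending to a list keeps the membership check cheap without giving up the order the API caller sent.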

mvdbeek · 2018-08-06T19:51:10Z

> it feels like if the user is supplying duplicate tags in a specific order they must have a reason

If a group is present it should correspond to 1 or more datasets, so if a group has multiple corresponding datasets, the group would appear multiple times.
At least in the differential expression example that wouldn't be helpful.

I can see how it would be helpful though in other scenarios, so a distinct attribute sounds good to me. I wonder if a with_replacement might be helpful as well, so you could pick as many times as you wanted if with_replacement was true ? Or is that the default and I want without_replacement 😕 ? Or should we go stepwise and see if we actually need it ?

jmchilton · 2018-08-06T20:17:47Z

> If a group is present it should correspond to 1 or more datasets, so if a group has multiple corresponding datasets, the group would appear multiple times.

I believe I was talking about making things distinct when the user has already selected the input and is running the tool and has somehow selected a tag more than once. I feel like you responded to making things distinct during the dropdown building. I feel like for the dropdown things are distinct and that makes perfect sense - we don't need an option to change that.

I also don't understand your with_replacement attribute - is that a different name for it or does that do something different?

mvdbeek · 2018-08-06T20:20:50Z

> I believe I was talking about making things distinct when the user has already selected the input and is running the tool and has somehow selected a tag more than once.

~~Oh, yeah, that we shouldn't do.~~
I mean, we shouldn't make it distinct if a user explicitly chose the same tag multiple times.

> I also don't understand your with_replacement attribute - is that a different name for it or does that do something different?

That was still me thinking about whether a user can select the same thing twice. Which isn't specific to group tags, that probably applies to all multiple="True" select parameters.

mvdbeek · 2018-08-06T20:23:01Z

lib/galaxy/tools/parameters/basic.py

+        # Check if a dataset is selected
+        if not history_items:
+            return []
+        tags = set()


So in practice we just make this a list, right ?

I think that one is fine? I think we just drop the "and tag not in tag_list" check at https://github.com/galaxyproject/galaxy/pull/5457/files#diff-9baf995401cfeb779edf8731ebaf0d2dR1064.

Oh right, I think it was a bit too late for me yesterday ...

mvdbeek · 2018-08-07T15:12:05Z

Alright, then I guess this is ready for a final round of review!

jmchilton · 2018-08-07T17:11:19Z

Amazing, thanks so much for pushing the platform in this direction - and explaining it to me slowly and repeatedly. This is very much needed work!

galaxybot · 2018-08-07T18:02:51Z

This PR was merged without a 'kind/' tag, please correct.

WIP: add a SelectTagParameter

fa3a237

mvdbeek added area/UI-UX area/tools status/WIP status/planning labels Feb 4, 2018

Fix unit tests

4afbdba

mvdbeek mentioned this pull request Feb 6, 2018

Tool that adds/sets tags for collection elements from a file #5462

Merged

Make get_datasets_for_group case insensitive to tag

04b0cfc

jmchilton added a commit to jmchilton/galaxy that referenced this pull request Jul 12, 2018

Allow : in tag values for PJA setting.

b069320

Based on galaxyproject#5457, certain paths to creating tag values with : must already work - just not when parsing from a raw string I guess.

This was referenced Jul 12, 2018

Implement group: tags (toward multi-factor analysis with group tagging) #6491

Merged

Allow setting tags on targets & contents in data fetch API. #6499

Merged

Merge remote-tracking branch 'jmchilton/dev' into select_tag_parameter

45b2b7f

jmchilton reviewed Jul 24, 2018

jmchilton added 2 commits July 25, 2018 11:53

Make tag order stable when using group_tag parameters.

e5c5322

Fix failing import/export test for group_tag changes.

6677e7b

jmchilton mentioned this pull request Jul 26, 2018

Allow Consuming Tags in the Apply Rules Tool #6545

Merged

mvdbeek commented Aug 6, 2018

mvdbeek added 2 commits August 7, 2018 17:07

Keep all user-selected tags, including duplicates

5a914b9

Correct copy-pasted docstring

95e8a34

mvdbeek added status/review and removed status/WIP status/planning labels Aug 7, 2018

mvdbeek changed the title ~~RFC: allow selecting subset of collection by group tag~~ Allow selecting subset of collection by group tag Aug 7, 2018

jmchilton approved these changes Aug 7, 2018

galaxybot added this to the 18.09 milestone Aug 7, 2018

jmchilton merged commit 55436c3 into galaxyproject:dev Aug 7, 2018

nsoranzo added the kind/feature label Aug 7, 2018

nsoranzo deleted the select_tag_parameter branch August 7, 2018 21:53

mvdbeek mentioned this pull request Mar 31, 2023

[Feature Request] TAG_FROM_FILE needs to support looking at parent element identifiers in addition #15880

Open
