Rename output in workflow fails on paired dataset collection #1675
I would like to hijack this issue and raise the general dataset naming issue again. The problem has become even more complicated now that we have started to use collections. Lines such as https://github.com/galaxyproject/tools-devteam/blob/master/tools/bowtie2/bowtie2_wrapper.xml#L485 are of only limited use in single-input mode, and with collections they are nearly useless. In the workflow editor we can rename datasets with #{input_1} or similar constructs; this is doable but not user-friendly, and more importantly we have no such mechanism in analysis mode. Downloading datasets is another problem: the downloads often end up with unusable filenames. I think it's time to discuss this issue once and for all and finally fix it. Maybe the Galaxy team can make a start during the retreat and discuss possibilities?
Maybe the resolution of In general, I think
Yes, yes, yes! This is the single largest complaint I get from users. Could things please be named using both the tool and the original input?

Brad

(In reply to Björn Grüning, Thursday, February 18, 2016.)
Dataset naming is the single biggest issue I have now. Here is an attempt to collect the various related issues. Perhaps someone on the Galaxy team would like to create one large ticket that collects the dataset naming issues (and add it to the roadmap, #1928)? @jmchilton, @martenson? Enhancements:
Bug Fixes:
+1 to get this fixed in 16.10 (and backported?)
I'm going to skip the middle comments here. They raise serious issues that need to be addressed; it is just that we don't really know how to address them, and there isn't agreement across the team or the community on how to. That is too big for this particular issue. The issue here is that the GUI isn't showing you the "name" of the dataset; it is showing you the element identifier of that element in the collection. I don't consider this to be a bug: in most cases you want the element identifier, and the "name" of collection items is irrelevant. If there is a rename post-job action on a collection mapping step, it should probably usually rename the collection itself rather than the items in the collection. There is a feature request issue I created for that: #1680. In my opinion there should also be a way for people who want it to see the dataset name in the GUI, but I doubt @carlfeberhard agrees, and I can see the case against it pretty easily. tl;dr: the names have changed; we just aren't showing them.
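A minimal sketch of the name versus element identifier distinction described above, assuming a deliberately simplified, hypothetical data model: the `Dataset` and `Collection` classes and `rename_elements` are illustrative only, not Galaxy's actual model classes. Elements are keyed by their element identifier, and a rename applied to the elements changes each dataset's `name` while leaving the identifiers, which are what the GUI lists, untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    # Hypothetical, simplified stand-in for a history dataset.
    hid: int
    name: str
    ext: str


@dataclass
class Collection:
    # Elements are keyed by element identifier, which is what the GUI lists.
    name: str
    elements: dict[str, Dataset] = field(default_factory=dict)


def rename_elements(collection: Collection, new_name: str) -> None:
    """Rename every element's dataset; element identifiers are left untouched."""
    for dataset in collection.elements.values():
        dataset.name = new_name


mapped = Collection(
    name="Bowtie2 on collection 3",  # illustrative collection name
    elements={
        "sample_A": Dataset(hid=12, name="Bowtie2 on data 2 and data 1", ext="bam"),
        "sample_B": Dataset(hid=13, name="Bowtie2 on data 4 and data 3", ext="bam"),
    },
)
rename_elements(mapped, "aligned.bam")

print(sorted(mapped.elements))  # ['sample_A', 'sample_B']: identifiers unchanged
print([d.name for d in mapped.elements.values()])  # ['aligned.bam', 'aligned.bam']
```

In this model, renaming the collection itself (the #1680 request) would mean setting `mapped.name` instead of touching the elements.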
I'm not concerned with what is shown by the GUI; my problem is that the rename does not work for paired collections at the file name level. Rename just drops everything, resulting in filenames like "Galaxy12-[].bam". With that, there's no way to know what the data was or where it came from. The current alternative, not using the rename action, results in filenames like "Galaxy9-[Bowtie2_on_data_2_and_data_1__aligned_reads_(sorted_BAM)].bam"; here, too, the files cannot be matched to the elements shown by the GUI. Being able to show the dataset name in the GUI would allow this, but it wouldn't be pretty :) I have no issue with the GUI using element identifiers. My concern is being able to identify which set belongs to which input, and that works nicely when using only the GUI.
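To make the two filenames above concrete, here is a sketch that rebuilds them. The `Galaxy<hid>-[<name>].<ext>` pattern and the rule of replacing spaces and punctuation with underscores are inferred only from the example filenames quoted in this thread; `download_filename` is a hypothetical helper, not Galaxy's download code, and the long dataset name passed in is an assumed input chosen to reproduce the quoted filename.

```python
import re


def download_filename(hid: int, name: str, ext: str) -> str:
    """Build a download name of the form Galaxy<hid>-[<name>].<ext>.

    Pattern and character substitution are inferred from the filenames quoted
    in this thread; this is not Galaxy's actual implementation.
    """
    # Assumption: anything other than word characters, '.', '(', ')' and '-'
    # becomes '_' (so ": " turns into "__", as in the quoted example).
    safe = re.sub(r"[^\w.()-]", "_", name)
    return f"Galaxy{hid}-[{safe}].{ext}"


# Rename post-job action that produced an empty name:
print(download_filename(12, "", "bam"))
# No rename: a default tool-generated name (assumed input text).
print(download_filename(9, "Bowtie2 on data 2 and data 1: aligned reads (sorted BAM)", "bam"))
```

An empty name gives the uninformative "Galaxy12-[].bam". In this sketch, passing an element identifier such as "sample_A" as `name` (the approach requested in #2023 / #2140) would give "Galaxy12-[sample_A].bam", which keeps the link to the element shown in the GUI.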
…ollections. xref galaxyproject#1675

This is of limited utility since we don't really expose the name, and intentionally so. Related open bugs/enhancements that still need to be addressed are:
- Applying rename to the collection (in addition to the elements): galaxyproject#1680.
- Download of collection elements with the element identifier instead of the name: galaxyproject#2023 / galaxyproject#2140.
#3985 fixes the downloaded name, so hopefully this whole issue is now moot. As such, I'm going to close this as a duplicate of #2140. (If this proves not quite enough and what is actually desired is for the PJA to rename the collection itself, there is another open issue, #1680. Hopefully #3985 is good enough, though.)
Just to clarify: after #3985, will the post-job action on paired collections actually rename the individual (usually hidden) history elements? My current issue is related to what @dmaticzka reported, though in my case trying to use a post-job rename action results in the following error:
The major annoyance is that without post-job renaming, while everything is still nicely labeled inside collections, the element identifier isn't actually changed. So feeding a collection of mapped BAM files into multiBamSummary (an example of a tool that uses element identifiers to label samples) still results in everything being labeled "bowtie2 on data 6 and 2" or something like that.
@dpryan79 The element identifier shouldn't be "bowtie2 on data 6 and 2"; that would be really odd. In most cases the element identifier should be preserved from the beginning of the workflow throughout. Is this the newest multiBamSummary, the one that includes deeptools/deepTools#500?
Yes, this is the most recent version, so that is what the element identifier is actually being set to (it also matches the names of the hidden history items). I'm running Galaxy 17.01, so if this is changed in the upcoming 17.05, consider me already happy :)
@dpryan79 I'm running an up-to-date release_17.05 (as of yesterday) and got the same error you posted above when trying a PJA rename on a paired collection.
@chambm :(
Changing post.py:150 from:
The "Rename dataset" workflow feature fails for paired dataset collections. When trying to rename the output using e.g. #{input_1} or #{library}.bam, the filenames generated on download contain an empty string in place of the input name, e.g. "Galaxy3-[.bam].bam". The naming of the pairs in the output dataset collection displayed by Galaxy is fine, however.
This happens for fastq-join, bowtie2 and hisat2, so it does not seem to be tool-related. For bowtie2 it was also reported on Biostars: https://biostar.usegalaxy.org/p/14911/
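The reported symptom can be reproduced with a small, hypothetical model of placeholder substitution. It assumes that each `#{name}` placeholder is replaced by the name of the dataset bound to that workflow input, and that a paired collection leaves nothing to substitute; `apply_rename` and its fallback to an empty string are illustrative assumptions, not Galaxy's post-job action code.

```python
import re

# Placeholders look like #{input_1} or #{library}: a workflow input name in braces.
PLACEHOLDER = re.compile(r"#\{(\w+)\}")


def apply_rename(template: str, input_names: dict[str, str]) -> str:
    """Substitute each #{input} placeholder with the name bound to that input.

    Hypothetical model: when the input is a paired collection there is no single
    dataset name to look up, so the placeholder falls back to an empty string.
    """
    return PLACEHOLDER.sub(lambda m: input_names.get(m.group(1), ""), template)


# Single dataset input: the name is substituted as expected.
print(apply_rename("#{library}.bam", {"library": "sample_A"}))  # sample_A.bam

# Paired collection input: nothing is found, so only ".bam" survives...
stem = apply_rename("#{library}.bam", {})
# ...and wrapping it in the download pattern gives the filename from the report.
print(f"Galaxy3-[{stem}].bam")  # Galaxy3-[.bam].bam
```

The same fallback explains the "Galaxy12-[].bam" case: a bare #{input_1} template collapses to an empty name.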