Rule Builder / Uploader - Future Steps #5381
I see you've got a point here called "Apply Collection Builder to a Whole History", which sounds very exciting. Exciting, because the vast majority of our users' data is in Shared Data, as that's where the data from our sequencers goes, e.g.: Shared Data > fastq > user.name > analysis.type > run.id > samples. So currently our users typically need to send the sample fastqs from the run.id folder to a history and then build them into collections. Building each collection can be a bit tedious when there are many groups, so I'm wondering whether this rule builder could soon be used for that instead? Or whether there'll be another way to create collections from Shared Data, e.g. the "Integrate with library import folder" item I see mentioned here: #5822
@mblue9 So the data is loaded into data libraries already, right? You should be able to export collections right from library folders to histories in 18.01 (#5080) - have you tried that yet? Seems that would be even easier in simple cases - or am I not understanding what you are hoping for? What I mentioned here is loading folders not yet in Galaxy but on disk (e.g. …)
Yes
No I hadn't (had missed that option). Our production instance with the Shared Data is still on 17.09, and we're working on upgrading to 18.01. But I did just try it out on https://test.galaxyproject.org
I don't think so, or am I missing the 'quick' way here? Even with the ability to send to Collections from the Shared Data, that still looks like a lot of clicks, many more than the Rule Builder. For example, if I have 6 groups with 3 reps each (not an uncommon scenario, and some people have many more, e.g. multiple treatments and timepoints), then in the Shared Data I have to tick the 3 reps (3 clicks), then click "To History", click "as a Collection", and click "Continue". Then repeat that for every group, so ~36 clicks in total just to create the collections (and that's not including adding the hashtags). Whereas with the Rule Builder, I can paste in a samplesheet, select "Upload data as Collections", click "Build", add the few column definitions, then click "Upload" and..... voila.... all collections created automatically (and it looks as simple to create collections for 50 groups as for 6). That's a much easier and nicer way imho. And after trying the amazing new Rule Builder, how can I go back to all the clicking 😞
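For concreteness, the samplesheet I paste in for that kind of scenario looks roughly like this (column names, sample names, and URLs are just illustrative, not anything the rule builder requires):

```
sample    group    url
WT_rep1   WT       https://example.org/WT_rep1.fastq.gz
WT_rep2   WT       https://example.org/WT_rep2.fastq.gz
WT_rep3   WT       https://example.org/WT_rep3.fastq.gz
KO_rep1   KO       https://example.org/KO_rep1.fastq.gz
...
```

One column definition maps `group` to the list identifier and another maps `url` to the data source, and every group becomes its own collection in one upload.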
I see, I had not considered creating many collections at once. Thanks for the clarifications. I'll try to get the rule builder hooked into the library stuff - it shouldn't be too bad, I just need to find some time.
Mostly done - I'll create a new issue for "Apply Rules" to a history. |
Smaller tweaks and issues are covered by #3916. This outlines some bigger directions I'd like to take the uploader / collection builder.
This request is probably going to come very quickly as people need to deal with nested collections. We need to be able to reshape them during an analysis - pool things in one part and unpool them in another, filter on identifiers in different parts of the collection, split the nested collections into sub-collections in various ways. I had originally planned to do this as a generalization of the expression-based grouping tool rejected during the original collection operation PR (953cdf2), but I think this rule-based language and GUI could do all of those things very quickly based on what is already there.
We could load all the datasets of the collection into the builder, with columns for every layer of list identifier and the order index, and grab extra HDA metadata that is already available (and future collection metadata once we collect it). Researchers could then interactively re-shape the collection, merge and split identifiers, filter, break it up into separate collections, etc....
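To make the idea concrete, here's a minimal sketch (not Galaxy's actual API - the element structure and names are assumptions) of flattening a nested `list:list` collection into rows with one column per layer of list identifier plus the order index, then re-grouping on a different identifier layer:

```python
# Hypothetical sketch: flatten a nested collection into tabular rows,
# one per dataset, then "unpool" by re-grouping on the inner identifier.
from itertools import groupby

def flatten(collection, path=()):
    """Yield (identifier_path, dataset) pairs, depth first."""
    for element in collection:
        ids = path + (element["identifier"],)
        if isinstance(element["object"], list):
            yield from flatten(element["object"], ids)
        else:
            yield ids, element["object"]

# A tiny list:list collection: two groups of two datasets each.
nested = [
    {"identifier": "treated", "object": [
        {"identifier": "rep1", "object": "t1.fastq"},
        {"identifier": "rep2", "object": "t2.fastq"},
    ]},
    {"identifier": "control", "object": [
        {"identifier": "rep1", "object": "c1.fastq"},
        {"identifier": "rep2", "object": "c2.fastq"},
    ]},
]

# Each row: (outer_identifier, inner_identifier, order_index, dataset).
rows = [ids + (i, obj) for i, (ids, obj) in enumerate(flatten(nested))]

# Re-group the flat rows by the *inner* identifier instead, turning a
# list of groups-of-reps into a list of reps-across-groups.
key = lambda row: row[1]
regrouped = {
    inner: [(r[0], r[3]) for r in grp]
    for inner, grp in groupby(sorted(rows, key=key), key=key)
}
```

Filtering, splitting into separate collections, or merging identifiers are all just further operations over the same flat rows, which is why the rule builder's tabular model fits so naturally.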
To use this collection reorganization application in a workflow, we could allow dumping the JSON describing the rules and re-using it from a stand-alone collection operation tool.
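A rough sketch of what such a dumped rule description might look like (the rule type names and overall schema here are assumptions for illustration, not the builder's actual serialization format):

```python
# Hypothetical exported rule set: a round-trippable JSON payload that a
# stand-alone collection operation tool could consume.
import json

rules = {
    "rules": [
        # Illustrative rule types: pull an identifier column, derive a
        # group column via regex, then sort on it.
        {"type": "add_column_metadata", "value": "identifier0"},
        {"type": "add_column_regex", "target_column": 0,
         "expression": r"(.*)_rep\d+", "group_count": 1},
        {"type": "sort", "target_column": 1, "numeric": False},
    ],
    "mapping": [
        # Column 1 becomes the outer list identifier, column 0 the inner.
        {"type": "list_identifiers", "columns": [1, 0]},
    ],
}

payload = json.dumps(rules, indent=2)   # what the GUI would dump
restored = json.loads(payload)          # what the tool would load
```

Because the payload is plain JSON, the same description works for the interactive builder, a workflow step, and the API.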
We could track these operations in the database and capture the seemingly interactive re-organizations as tool steps using this new tool when applied to workflows during history extraction. What feels like interactive re-organization of the collection will actually be the user writing a batch program executable on the backend.