Sample Sheet Import of Datasets and Collections #4733
Comments
So I guess my tentative plan is to make this my main focus of the 18.01 development cycle: get through as many of the use cases as I can and open a PR when I feel something useful is ready to go. This could potentially be the most significant GUI development I've done on the project. Is it worth doing all of this before we make the move to a reactive UX framework? Should I delay work on this until we have made a decision and try to work within that framework with this as one of our first use cases, or should I just model it off what is in upload now?
To me this sounds like a new standalone app/component. Maybe you can use either Vue or React and create a single app that can be hooked up later into Galaxy?
Is this project part of any roadmap item?
@martenson Good question indeed! To me this half of
the other half is either #4707 or has to follow from what we learn implementing #4707 (and applying it to the workflow editor). The roadmap, I'll admit, is a bit flat: it doesn't weight different things with different priorities with respect to collections. There is clearly lower-hanging fruit with respect to collections on the roadmap. But I have had out-of-band correspondence with the PIs where we discussed this, and I think we all agreed the two biggest issues with scaling collections toward usable large-scale analyses are dealing with deeply nested collections and increasing the size of collections. I think that has been the case for a while, and we used to think scaling the size of collections was the bigger problem. But over 2016 you have the pairing of Moe's very nice presentation about how well it scales with James' frustrations with ChipSeq and Anton's discussion with Rotterdam, where complexity and nesting were the bigger issues. So I think the tide has turned a bit, and the PIs and I feel that increasing the complexity of analyses that can be represented with collections is now the priority over scaling the size. And while it may seem like the presentation of the issue here restricts the usage of this to users who have sample sheets from their sequencing core or whatever, I think in fact some of the later user stories are important:
So you can imagine taking any single, arbitrary directory structure of inputs and treating it as a starting column with this component. Then you can apply these rules to visualize and create nested structures from that. It would be a different view, but you'd be able to apply regex rules like with the paired list builder and then go well beyond that: nested structures, multiple collections, etc. I think this really will have general purpose utility.
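To make the "directory listing as a starting column" idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the file names, the `RULE` regex, and the helpers `apply_rule` and `build_nested` are all hypothetical and are not part of Galaxy or of any proposed API. A single regex rule splits the starting column into new columns, and those columns are then used as nesting levels.

```python
import re

# Hypothetical input: a flat listing of files, treated as the starting column.
paths = [
    "run1/sampleA_R1.fastq",
    "run1/sampleA_R2.fastq",
    "run1/sampleB_R1.fastq",
    "run1/sampleB_R2.fastq",
    "run2/sampleC_R1.fastq",
    "run2/sampleC_R2.fastq",
]

# Hypothetical rule: each named group of the regex becomes a new column.
RULE = re.compile(r"^(?P<run>[^/]+)/(?P<sample>[^_/]+)_R(?P<read>[12])\.fastq$")

# Map the read number onto the identifiers used for paired data.
DIRECTIONS = {"1": "forward", "2": "reverse"}


def apply_rule(paths, rule):
    """Return one row (dict of columns) per path that matches the rule."""
    rows = []
    for path in paths:
        match = rule.match(path)
        if match is None:
            continue  # paths that do not match are simply skipped here
        rows.append({"path": path, **match.groupdict()})
    return rows


def build_nested(rows, levels):
    """Nest rows by the given columns, outermost first; leaves are paths."""
    root = {}
    for row in rows:
        node = root
        for level in levels[:-1]:
            node = node.setdefault(row[level], {})
        node[row[levels[-1]]] = row["path"]
    return root


rows = apply_rule(paths, RULE)
for row in rows:
    row["direction"] = DIRECTIONS[row["read"]]

# Three levels of nesting: run, then sample, then forward/reverse.
nested = build_nested(rows, ["run", "sample", "direction"])
```

Choosing a different list of levels (for example only `["sample", "direction"]`) would give a flatter structure from the same rows, which is the sort of flexibility the rules are meant to offer.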
I don't hate the idea - clearly if it was a big Python thing I'd definitely be game for that. But there is some complexity with this being a Galaxy client thing. I want it to reuse Galaxy's look and feel, and the way I'm imagining it there is some interaction with the server, since the data needs to be previewed and such. I don't know that we provide a clear path for dealing with either of these things from external apps yet. This also requires significant backend development, since we probably want to be able to upload straight into collections (we are sort of hacking around that currently by putting them into the history first) and we want to build up this language of rules to apply to "sample sheets" to build collections from them. That will probably be a complex API. In order to scale, I'm hoping to avoid just loading all the data onto the client and building an explicit structure for the collection the way the collection APIs currently work. So I do appreciate the idea and I wish that we provided better mechanisms for doing that, but I'm not convinced we do currently, and so I'll probably build it directly into Galaxy. Having it still be, say, a stand-alone component within the framework, using Vue for instance, sounds appealing. I'm not sure how to implement that, but I can try?
User Stories
This section describes user stories that progressively build up a new GUI component for creating collections from "sample sheet" inputs. This would be a two-step modal (avoiding the word wizard) that would allow importing sheets of tabular data into collections of arbitrary complexity. This would allow biologists to use information potentially generated by cores directly, or to build structured views of their data using tools such as Excel, which they are potentially most comfortable with.
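As a rough illustration of what "importing a sheet of tabular data into a collection" could mean, the sketch below reads a small tab-separated sheet and turns each row into one paired element of a `list:paired` structure. Everything specific here is an assumption made for the example: the column names, the example.org URLs, the `to_list_paired` helper, and the shape of the resulting dictionary are hypothetical and are not the Galaxy API.

```python
import csv
import io

# Hypothetical sample sheet, as it might be pasted from Excel (tab-separated).
SHEET = (
    "sample\tforward_url\treverse_url\n"
    "treated_1\thttps://example.org/t1_R1.fastq\thttps://example.org/t1_R2.fastq\n"
    "control_1\thttps://example.org/c1_R1.fastq\thttps://example.org/c1_R2.fastq\n"
)


def parse_sheet(text):
    """Step one of the modal: read the sheet into rows keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def to_list_paired(rows, name_column, forward_column, reverse_column):
    """Step two: map columns onto a list of paired elements.

    The returned dictionary layout is illustrative only.
    """
    elements = []
    for row in rows:
        elements.append(
            {
                "name": row[name_column],
                "collection_type": "paired",
                "elements": [
                    {"name": "forward", "url": row[forward_column]},
                    {"name": "reverse", "url": row[reverse_column]},
                ],
            }
        )
    return {"collection_type": "list:paired", "elements": elements}


rows = parse_sheet(SHEET)
collection = to_list_paired(rows, "sample", "forward_url", "reverse_url")
```

In this sketch the mapping from columns to roles is passed in explicitly; in the GUI that choice would presumably be what the user makes in the second step of the modal.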
User Story 1
User Story 2
User Story 3
User Story 4
User Story 5
User Story 6
User Story 7
User Story 8
User Story 9
User Story 10
Future Directions:
Record Dataset Collection Types
The way paired data is described above could be extended for use with record collection types. I see the path forward as merging the record dataset collection commit from CWL, allowing tools to describe the collection types they consume, allowing users to fetch these type descriptions during import here, and applying rules to the columns and rows in some structured way. This would also be a way to consume certain metadata from the sheet, since the record descriptions allow non-data parameters the way they do in CWL.
xref #3834
xref common-workflow-lab#71
Metadata
I think we need to come up with ways to think about user-supplied metadata in the context of collections and outside of records. I say we get this practical piece done first and then start working toward that if it is a priority.
EtherCalc
There would be a couple of potential uses for a Supervisor setup that always ran an EtherCalc server beside Galaxy, with some permanent bridge connecting them. This could allow users to work with sample sheet data in a more "Excel-y" way before it is even imported. The GUI described here could then follow those imports and transformations.
Other Related Issues of Interest