Support custom/federated data #132

jsstevenson · 2024-01-18T15:49:49Z

Is your feature request related to a problem? Please describe.
Our group has been using SeqRepo and other Biocommons/related tools to develop VRS mappings for MaveDB submissions. Part of this has involved adding individual experiment target sequences to SeqRepo so that they can be reused later during the VRS translation process.

Now, there are some methods to support this, but I think it's an underdeveloped use case relative to how we've been using it otherwise (i.e., syncing against a set of main snapshot sequences maintained at biocommons.org). At minimum, this process might be a little under-documented, but there's probably room for more explicit data management tooling. If I, for example, wanted to roll back to a previous checkpoint of data, I think I'd need to do so manually. Ditto for removing a specific chunk of added data (not just the most recent set of additions).

Broadly, though, @ahwagner has suggested there could be an interest in other branches of main snapshots (e.g. the Japanese reference genome), or perhaps a set of custom sequences used internally at a lab. A user might want to be able to select which reference genomes are stored in their local seqrepo and access all of them simultaneously.

Describe the solution you'd like
My very naive solution would include:

- Support for multiple simultaneous data images/snapshots under a single SeqRepo object (particularly from additional sources)
- A little bit more support (documentation, or additional explicit code if necessary/more ergonomic) for adding and maintaining custom sequences and aliases, beyond what's given in the existing store() method
- Management tooling for the above. E.g., deleting a chunk of data based on some provenance condition, like who submitted it, or building an exportable subset of currently-held data to be shared in a central repository.
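
To make the first and third items a bit more concrete, here is a minimal sketch of the idea. It does not use the actual SeqRepo API: `SequenceSource`, `FederatedSources`, `delete_where`, and `export_subset` are hypothetical names, the storage is a plain in-memory dict, and the aliases and submitters are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceSource:
    """Hypothetical in-memory stand-in for one snapshot or custom sequence set."""

    name: str
    # Maps alias -> (sequence, submitter); the submitter is the provenance field.
    records: dict = field(default_factory=dict)

    def store(self, seq, alias, submitter):
        self.records[alias] = (seq, submitter)

    def delete_where(self, predicate):
        """Delete records for which predicate(alias, seq, submitter) is true.

        Returns the number of records removed.
        """
        doomed = [a for a, (s, who) in self.records.items() if predicate(a, s, who)]
        for alias in doomed:
            del self.records[alias]
        return len(doomed)


class FederatedSources:
    """Look up an alias across several sources, in priority order."""

    def __init__(self, sources):
        self.sources = list(sources)

    def fetch(self, alias):
        # The first source that knows the alias wins.
        for src in self.sources:
            if alias in src.records:
                return src.records[alias][0]
        raise KeyError(alias)

    def export_subset(self, predicate):
        """Collect matching records from every source into a shareable dict."""
        out = {}
        for src in self.sources:
            for alias, (seq, who) in src.records.items():
                if predicate(alias, seq, who) and alias not in out:
                    out[alias] = {"sequence": seq, "submitter": who, "source": src.name}
        return out


# Made-up data: one "main" snapshot plus one lab-internal set.
main = SequenceSource("main-snapshot")
main.store("ACGT", "refseq:NM_EXAMPLE.1", "biocommons")

custom = SequenceSource("lab-custom")
custom.store("TTGACC", "custom:target-1", "alice")
custom.store("GGGCCC", "custom:target-2", "bob")

repo = FederatedSources([main, custom])
```

With something like this, `repo.fetch("custom:target-1")` resolves against whichever source holds the alias, and `custom.delete_where(lambda alias, seq, who: who == "bob")` removes one submitter's additions without touching the rest. A real version would need to sit on top of SeqRepo's on-disk storage rather than a dict.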

Describe alternatives you've considered
A lot of this is possible with manual scripting on top of the existing library. We've done this already for our current MaveDB project, but - assuming we aren't the only ones interested in this kind of use case - it might be better to solidify these functions and build them into the core library.

Additional context
This is quite vague and aspirational. Happy to hear input from others.

jsstevenson · 2024-02-18T13:09:54Z

I think a lot of this is more in the "could be documented more" bucket than "needs new code". Closing this issue now, may try to come up with more specific subtopics later.

reece · 2024-02-19T03:08:39Z

Okay. Also note that this is already on the roadmap for #61 and #136.

jsstevenson · 2024-02-19T15:18:53Z

@reece right, I progressively realized a lot of this was redundant and/or unnecessary. Thanks!

jsstevenson added the enhancement New feature or request label Jan 18, 2024

jsstevenson closed this as completed Feb 18, 2024
