Support custom/federated data #132

Closed
jsstevenson opened this issue Jan 18, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@jsstevenson
Contributor

jsstevenson commented Jan 18, 2024

Is your feature request related to a problem? Please describe.
Our group has been using SeqRepo and other Biocommons/related tools to develop VRS mappings for MaveDB submissions. Part of this has involved adding individual experiment target sequences to SeqRepo so that they can be reused later during the VRS translation process.

There are some methods to support this today, but I think it's an underdeveloped use case compared with the usual workflow (i.e., syncing against the set of main snapshot sequences maintained at biocommons.org). At minimum, this process might be a little under-documented, and there's probably room for more explicit data management tooling. For example, if I wanted to roll back to a previous checkpoint of data, I think I'd need to do so manually. The same goes for removing a specific chunk of added data (not just the most recent set of additions).

Broadly, though, @ahwagner has suggested there could be an interest in other branches of main snapshots (e.g. the Japanese reference genome), or perhaps a set of custom sequences used internally at a lab. A user might want to be able to select which reference genomes are stored in their local seqrepo and access all of them simultaneously.
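
As a rough illustration of "select several sources and access all of them simultaneously", here is a minimal sketch in plain Python. Everything in it is hypothetical: `FederatedSequences`, the source names, and the toy aliases are made up for this example, and plain dicts stand in for real sequence stores. It is not SeqRepo's API, just one way the lookup behavior could work.

```python
from typing import List, Mapping, Optional, Sequence


class FederatedSequences:
    """Hypothetical read-only view over several named sequence sources.

    Each source is any mapping from alias to sequence string. Plain dicts
    are used here as stand-ins; this is not part of the SeqRepo API.
    """

    def __init__(self, sources: Mapping[str, Mapping[str, str]]):
        # Insertion order of `sources` defines lookup priority.
        self._sources = dict(sources)

    def fetch(
        self,
        alias: str,
        start: Optional[int] = None,
        end: Optional[int] = None,
        only: Optional[Sequence[str]] = None,
    ) -> str:
        """Return alias[start:end] from the first source that holds it.

        `only` optionally restricts the search to the named sources.
        """
        names = list(only) if only is not None else list(self._sources)
        for name in names:
            source = self._sources[name]
            if alias in source:
                return source[alias][start:end]
        raise KeyError(alias)

    def where(self, alias: str) -> List[str]:
        """List the names of every source that holds `alias`."""
        return [name for name, src in self._sources.items() if alias in src]


# Toy data: a "main" snapshot plus a lab-specific set of target sequences.
main_snapshot = {"toy:shared": "ACGTACGT"}
lab_targets = {"toy:shared": "ACGTACGT", "lab:target_1": "TTGACC"}

repo = FederatedSequences({"main": main_snapshot, "lab": lab_targets})
print(repo.fetch("lab:target_1", 0, 3))
print(repo.where("toy:shared"))
```

The priority-order lookup is only one possible design; a real implementation would also need a policy for aliases that resolve to different sequences in different sources.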

Describe the solution you'd like
My very naive solution would include

  • Support for multiple simultaneous data images/snapshots under a single SeqRepo object (particularly from additional sources)
  • A little bit more support (documentation, or additional explicit code if necessary/more ergonomic) for adding and maintaining custom sequences and aliases, beyond what's given in the existing store() method
  • Management tooling for the above. E.g., deleting a chunk of data based on some provenance condition, like who submitted it, or building an exportable subset of currently-held data to be shared in a central repository.
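
To make the management-tooling bullet a bit more concrete, here is a small sketch of provenance-aware bookkeeping: deleting by submitter, rolling back to an earlier batch of additions, and exporting a subset. All names (`Record`, `ProvenanceStore`, the `submitter` and `batch` fields) are assumptions invented for this example. It uses an in-memory dict rather than SeqRepo's actual storage, so treat it as a description of the desired behavior and not as an implementation proposal.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    """One stored sequence plus hypothetical provenance metadata."""

    alias: str
    sequence: str
    submitter: str
    batch: int  # increasing label for each set of additions


class ProvenanceStore:
    """Hypothetical in-memory store keyed by alias (not SeqRepo's API)."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def add(self, record: Record) -> None:
        self._records[record.alias] = record

    def delete_where(self, predicate: Callable[[Record], bool]) -> int:
        """Remove every record matching `predicate`; return how many."""
        doomed = [a for a, r in self._records.items() if predicate(r)]
        for alias in doomed:
            del self._records[alias]
        return len(doomed)

    def rollback_to(self, batch: int) -> int:
        """Drop everything added after the given batch (a 'checkpoint')."""
        return self.delete_where(lambda r: r.batch > batch)

    def export_where(self, predicate: Callable[[Record], bool]) -> List[dict]:
        """Return matching records as plain dicts, ready to serialize."""
        return [asdict(r) for r in self._records.values() if predicate(r)]

    def aliases(self) -> List[str]:
        return sorted(self._records)


store = ProvenanceStore()
store.add(Record("lab:target_1", "TTGACC", submitter="alice", batch=1))
store.add(Record("lab:target_2", "GGCATA", submitter="bob", batch=2))
store.add(Record("lab:target_3", "CCATGG", submitter="alice", batch=3))

shareable = store.export_where(lambda r: r.submitter == "bob")
store.rollback_to(2)  # removes only the batch-3 addition
print(store.aliases())
```

Note that `delete_where` can remove any chunk of data, not just the most recent additions, which is the gap described earlier in this issue.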

Describe alternatives you've considered
A lot of this is possible with manual scripting on top of the existing library. We've already done this for our current MaveDB project, but assuming we aren't the only ones interested in this kind of use case, it might be better to solidify these functions and build them into the core library.

Additional context
This is quite vague and aspirational. Happy to hear input from others.

@jsstevenson jsstevenson added the enhancement New feature or request label Jan 18, 2024
@jsstevenson
Contributor Author

I think a lot of this is more in the "could be documented more" bucket than "needs new code". Closing this issue for now; I may try to come up with more specific subtopics later.

@reece
Member

reece commented Feb 19, 2024

Okay. Also note that this is already on the roadmap for #61 and #136.

@jsstevenson
Contributor Author

@reece right, I progressively realized a lot of this was redundant and/or unnecessary. Thanks!
