Is your feature request related to a problem? Please describe.
Our group has been using SeqRepo and other Biocommons/related tools to develop VRS mappings for MaveDB submissions. Part of this has involved adding individual experiment target sequences to SeqRepo so that they can be reused later during the VRS translation process.
Now, there are some methods to support this, but I think it's an underdeveloped use case relative to how we've been using it otherwise (i.e., syncing against a set of main snapshot sequences maintained at biocommons.org). At minimum, this process might be a little under-documented, but there's probably room for more explicit data management tooling. If I, for example, wanted to roll back to a previous checkpoint of data, I think I'd need to do so manually. Ditto for removing a specific chunk of added data (not just the most recent set of additions).
Broadly, though, @ahwagner has suggested there could be an interest in other branches of main snapshots (e.g. the Japanese reference genome), or perhaps a set of custom sequences used internally at a lab. A user might want to be able to select which reference genomes are stored in their local seqrepo and access all of them simultaneously.
Describe the solution you'd like
My very naive solution would include:
- Support for multiple simultaneous data images/snapshots under a single SeqRepo object (particularly from additional sources)
- A little bit more support (documentation, or additional explicit code if necessary/more ergonomic) for adding and maintaining custom sequences and aliases, beyond what's given in the existing store() method
- Management tooling for the above. E.g., deleting a chunk of data based on some provenance condition, like who submitted it, or building an exportable subset of currently-held data to be shared in a central repository.
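To make the idea a bit more concrete, here is a rough, standard-library-only sketch of what "several snapshots behind one object, with provenance-based cleanup" could look like. It is not SeqRepo code and does not use the SeqRepo API: `Layer`, `LayeredStore`, `drop_by_submitter`, and `export` are hypothetical names invented for illustration, and the aliases and submitters are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Layer:
    """One named set of sequences, e.g. a main snapshot or a lab's own additions.

    Hypothetical structure for illustration only; not part of SeqRepo.
    """

    name: str
    # alias -> (sequence, submitter); the submitter is the provenance we track
    records: dict[str, tuple[str, str]] = field(default_factory=dict)

    def store(self, seq: str, alias: str, submitter: str) -> None:
        self.records[alias] = (seq, submitter)

    def drop_by_submitter(self, submitter: str) -> int:
        """Delete every record added by `submitter`; return how many were removed."""
        doomed = [a for a, (_, who) in self.records.items() if who == submitter]
        for alias in doomed:
            del self.records[alias]
        return len(doomed)

    def export(self, name: str, keep: Callable[[str, str], bool]) -> Layer:
        """Build a new layer from records where keep(alias, submitter) is true."""
        subset = Layer(name)
        for alias, (seq, who) in self.records.items():
            if keep(alias, who):
                subset.store(seq, alias, who)
        return subset


class LayeredStore:
    """Read through several layers at once; earlier layers take precedence."""

    def __init__(self, layers: list[Layer]) -> None:
        self.layers = list(layers)

    def fetch(self, alias: str) -> str:
        for layer in self.layers:
            if alias in layer.records:
                return layer.records[alias][0]
        raise KeyError(alias)
```

With something along these lines, a main snapshot and a set of custom target sequences could sit in separate layers, be queried through one `LayeredStore`, and the custom layer could be pruned or exported without touching the snapshot. Whether this belongs in the core library or in a thin wrapper is an open question.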
Describe alternatives you've considered
A lot of this is possible with manual scripting on top of the existing library. We've done this already for our current MaveDB project, but - assuming we aren't the only ones interested in this kind of use case - it might be better to solidify these functions and build them into the core library.
Additional context
This is quite vague and aspirational. Happy to hear input from others.
I think a lot of this is more in the "could be documented more" bucket than "needs new code". Closing this issue now; I may try to come up with more specific subtopics later.