Automatically create pybidsdb #256

Open
pvandyken opened this issue Feb 22, 2023 · 0 comments
Labels
enhancement New feature or request plugin Feature suitable for, or an issue with, a snakebids plugin

Comments

Contributor

pvandyken commented Feb 22, 2023

In recent versions of snakemake, the app starts sub-instances of snakemake under many different circumstances, including cluster submission, use of the run directive, use of shadow, and probably others. Every time one of these sub-instances starts, generate_inputs is called and, if no database exists, the dataset is indexed again. For large datasets this is a huge problem: even if indexing takes just a few minutes, that cost is multiplied over the hundreds of times generate_inputs could potentially be called. Thus, for large datasets and complex workflows, a pybids database is essentially mandatory.

We could just expect users to make their own pybids database, but this is inconvenient for downstream users. I thus propose a mechanism for automatically creating a database when none is provided.

I would probably conceive of this as a snakebids "extra" (see #255), so it's something app developers would need to opt into. Basically, if no database is provided, then every time run.py is run, it will use pybids to index the dataset and put the database into a .snakebids folder (which we'd need to create) in the output dir. The path to this database will then be passed on to snakemake and generate_inputs().

Importantly, every time run.py is called, the database will be regenerated, regardless of whether one already exists. This preserves the current expectation that the dataset is re-indexed every time the app is run.

Thus, if users want a persistent database across runs, they'll still need to make their own, but within a single run, we can take advantage of the speed boosts of a database.
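
The behaviour described above could be sketched roughly as follows. This is only an illustration, not an existing snakebids API: `resolve_pybidsdb`, `index_with_pybids`, and the `pybidsdb` subfolder name are hypothetical, and the only library call assumed is pybids' `BIDSLayout` with its `database_path` and `reset_database` arguments.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional, Union

# Hypothetical indexer signature: (bids_dir, db_dir) -> None.
Indexer = Callable[[Path, Path], None]


def index_with_pybids(bids_dir: Path, db_dir: Path) -> None:
    """Index bids_dir with pybids, writing the database into db_dir."""
    from bids import BIDSLayout  # lazy import; requires pybids

    BIDSLayout(bids_dir, database_path=db_dir, reset_database=True)


def resolve_pybidsdb(
    user_db: Optional[Union[str, Path]],
    output_dir: Union[str, Path],
    bids_dir: Union[str, Path],
    index: Indexer = index_with_pybids,
) -> Path:
    """Return the database path to hand to snakemake and generate_inputs()."""
    if user_db is not None:
        # A user-supplied database is persistent: use it untouched.
        return Path(user_db)

    # Assumed location: <output_dir>/.snakebids/pybidsdb
    db_dir = Path(output_dir) / ".snakebids" / "pybidsdb"
    if db_dir.exists():
        # Always regenerate, so each run of run.py re-indexes the dataset.
        shutil.rmtree(db_dir)
    db_dir.parent.mkdir(parents=True, exist_ok=True)
    index(Path(bids_dir), db_dir)
    return db_dir
```

With this shape, run.py would index once up front, and every snakemake sub-instance would receive the same path rather than triggering its own indexing.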

One concern is where exactly to build the db. On network-attached storage (NAS), indexing can be extremely slow. On AllianceCan, the /tmp dir is mounted in memory, and I would presume that /tmp dirs in general sit on faster, more readily accessible storage. So it might be enough to always build the database in /tmp and then copy it to .snakebids/.
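
A minimal sketch of the build-in-/tmp-then-copy idea, using only the standard library. The function name `build_db_via_tmp`, the `pybidsdb` folder name, and the `index` callable (any function that writes a database into the directory it is given, for instance a thin wrapper around pybids' `BIDSLayout`) are assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Union


def build_db_via_tmp(
    bids_dir: Union[str, Path],
    output_dir: Union[str, Path],
    index: Callable[[Path, Path], None],
) -> Path:
    """Build the database in the system temp dir, then copy it to .snakebids/."""
    # Assumed final location: <output_dir>/.snakebids/pybidsdb
    dest = Path(output_dir) / ".snakebids" / "pybidsdb"

    # TemporaryDirectory uses the system temp location (typically /tmp on
    # Linux, or whatever TMPDIR points to) and cleans up on exit.
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "pybidsdb"
        # The indexer is expected to create and populate the staging dir.
        index(Path(bids_dir), staging)

        # Replace any database left over from a previous run.
        if dest.exists():
            shutil.rmtree(dest)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(staging, dest)
    return dest
```

Whether this is actually faster depends on where the temp dir lives on a given system, so it may be worth letting users override the staging location through the usual TMPDIR environment variable, which `tempfile` already honours.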

@pvandyken pvandyken added the enhancement New feature or request label Feb 22, 2023
@kaitj kaitj mentioned this issue Feb 28, 2023
7 tasks
@pvandyken pvandyken added the plugin Feature suitable for, or an issue with, a snakebids plugin label Dec 13, 2023