Automatically create pybidsdb #256

pvandyken · 2023-02-22T20:48:37Z

In recent versions of snakemake, the app will start sub-instances of snakemake under many different circumstances, including cluster submission, use of the run directive, use of shadow, and probably others. Every time one of these sub-instances starts, generate_inputs is called and, if no database exists, the dataset is indexed. For large datasets, this is a huge problem: even if indexing takes just a few minutes, that cost is multiplied over the hundreds of times generate_inputs could potentially be called. Thus, for large datasets and complex workflows, a pybids database is essentially mandatory.

We could just expect users to make their own pybids database, but for downstream users, this is inconvenient. I thus propose a mechanism for automatically making a database in the case no database is provided.

I would probably conceive of this as a snakebids "extra" (see #255), so it's something app developers would need to opt into. Basically, if no database is provided, then every time run.py is run, it will use pybids to index the dataset and put the database into the .snakebids folder (which we'd need to create) in the output dir. This database will be passed on to snakemake and generate_inputs().

Importantly, every time run.py is called, the database will be regenerated, regardless of whether it already exists. This preserves the current expectation that the dataset is re-indexed every time the app is run.

Thus, if users want a persistent database across runs, they'll still need to make their own, but within a single run, we can take advantage of the speed boosts of a database.
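As a rough illustration of the regenerate-on-every-run behaviour, here is a minimal sketch. Everything named here is an assumption made for illustration: `prepare_pybidsdb`, the `pybidsdb` sub-folder name, and the `index_fn` hook are hypothetical and not part of snakebids. The default indexer relies on pybids' `BIDSLayout` with its `database_path` and `reset_database` arguments.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def index_with_pybids(bids_dir: Path, db_dir: Path) -> None:
    """Index bids_dir with pybids, storing the database in db_dir."""
    # Imported lazily so the rest of the sketch works without pybids installed.
    from bids import BIDSLayout

    BIDSLayout(str(bids_dir), database_path=str(db_dir), reset_database=True)


def prepare_pybidsdb(
    bids_dir: Path,
    output_dir: Path,
    user_db: Optional[Path] = None,
    index_fn: Callable[[Path, Path], None] = index_with_pybids,
) -> Path:
    """Return the database path to pass on to snakemake and generate_inputs()."""
    # A user-supplied database is the persistent case: use it untouched.
    if user_db is not None:
        return user_db

    # Hypothetical location inside the output dir; the folder name is assumed.
    db_dir = output_dir / ".snakebids" / "pybidsdb"

    # Always rebuild, even if a database from a previous run is present.
    if db_dir.exists():
        shutil.rmtree(db_dir)
    db_dir.mkdir(parents=True)

    index_fn(bids_dir, db_dir)
    return db_dir
```

Called once from run.py, the returned path would be the only thing the sub-instances need, so none of them would have to index the dataset again.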

One concern is how exactly to make the db. On NAS, this can be extremely slow. On AllianceCan, the /tmp dir is mounted in memory, and I would presume in general that /tmp dirs are on more readily accessible storage locations. So it might be enough to always make the database on /tmp then copy it to .snakebids/.
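The build-on-/tmp-then-copy idea could be sketched as follows. This is only a sketch under stated assumptions: `build_db_via_tmp` and `build_fn` are hypothetical names, and `build_fn` stands in for whatever actually writes the database (for example, a pybids indexing call). The standard library's `tempfile` module picks the scratch location, which follows `TMPDIR` when it is set.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def build_db_via_tmp(build_fn: Callable[[Path], None], final_dir: Path) -> Path:
    """Build a database in a scratch dir, then copy it to final_dir."""
    with tempfile.TemporaryDirectory(prefix="pybidsdb-") as scratch:
        scratch_db = Path(scratch) / "db"
        scratch_db.mkdir()

        # The write-heavy indexing step runs against the scratch location.
        build_fn(scratch_db)

        # Replace any database left in the final location by a previous run.
        if final_dir.exists():
            shutil.rmtree(final_dir)
        final_dir.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(scratch_db, final_dir)

    # The scratch dir is removed when the with-block exits.
    return final_dir
```

One trade-off with this approach is that the copy itself still writes to the slow filesystem, but as a few large sequential writes rather than the many small ones made during indexing. Whether that is a net win on a given NAS would need to be measured.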


pvandyken added the enhancement New feature or request label Feb 22, 2023

kaitj mentioned this issue Feb 28, 2023

BIDS validation #259

Closed

7 tasks

pvandyken mentioned this issue Mar 3, 2023

Basic Plugin Architecture #260

Closed

kaitj mentioned this issue Mar 30, 2023

Major updates to documentation #267

Merged

pvandyken added the plugin Feature suitable for, or an issue with, a snakebids plugin label Dec 13, 2023
