In recent versions of snakemake, the app will start sub-instances of snakemake under many different circumstances, including cluster submission, use of the `run` directive, use of `shadow`, and probably others. Every time one of these sub-instances starts, `generate_inputs` will be called and, if no database exists, the dataset will be re-indexed. For large datasets this is a huge problem: indexing can take an immense amount of time (even just a few minutes is still multiplied over the hundreds of times `generate_inputs` could potentially be called). Thus, for large datasets and complex workflows, a pybids database is essentially mandatory.
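The cost model above can be sketched in plain Python. This is a toy stand-in, not pybids itself: `index_dataset` and `load_database` are hypothetical placeholders for the expensive indexing step and for loading a saved database, with `lru_cache` playing the role of the on-disk database:

```python
import functools

CALLS = {"index": 0}

def index_dataset(root):
    """Stand-in for pybids' expensive indexing step (hypothetical)."""
    CALLS["index"] += 1
    return {"root": root, "files": ["sub-01/anat/sub-01_T1w.nii.gz"]}

@functools.lru_cache(maxsize=None)
def load_database(root):
    """Stand-in for a saved pybids database: index once, reuse thereafter."""
    return index_dataset(root)

# Without a database, every snakemake sub-instance pays the indexing cost:
for _ in range(100):
    index_dataset("/data/bids")
assert CALLS["index"] == 100

# With a database, only the first call indexes; the rest reuse it:
CALLS["index"] = 0
for _ in range(100):
    load_database("/data/bids")
assert CALLS["index"] == 1
```

The point is only that the per-invocation cost collapses from O(number of sub-instances) to a one-time cost.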
We could simply expect users to make their own pybids database, but this is inconvenient for downstream users. I therefore propose a mechanism for automatically creating a database when none is provided.
I would probably conceive of this as a snakebids "extra" (see #255), so it's something app developers would need to opt into. Basically, if no database is provided, then every time `run.py` runs, it will use pybids to index the dataset and put the database into a `.snakebids` folder (which we'd need to create) in the output dir. This database would then be passed on to snakemake and `generate_inputs()`.
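The proposed `run.py` logic could look roughly like the sketch below. The function name and argument layout are assumptions for illustration, not the actual snakebids API; the indexing call itself is left as a comment since it would be delegated to pybids:

```python
import shutil
from pathlib import Path

def ensure_pybids_database(output_dir, user_db=None):
    """Sketch of the proposed run.py behavior (hypothetical names):
    honor a user-supplied database if given, otherwise build one under
    <output_dir>/.snakebids/pybids_db."""
    if user_db is not None:
        # The user manages their own database; pass it through untouched.
        return Path(user_db)
    db_dir = Path(output_dir) / ".snakebids" / "pybids_db"
    if db_dir.exists():
        # Per the proposal, regenerate on every run.py invocation.
        shutil.rmtree(db_dir)
    db_dir.mkdir(parents=True)
    # Here the real implementation would invoke pybids to index the
    # dataset into db_dir (e.g. via BIDSLayout's database_path option).
    return db_dir
```

The returned path would then be handed to snakemake and `generate_inputs()`.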
Importantly, every time `run.py` is called, the database will be regenerated, regardless of whether it already exists. This preserves the current expectation that the dataset is re-indexed every time the app is run.
Thus, if users want a database that persists across runs, they'll still need to make their own; but within a single run, we can take advantage of the speed boost a database provides.
One concern is how exactly to build the database. On NAS storage, this can be extremely slow. On AllianceCan clusters, the `/tmp` dir is mounted in memory, and I would presume that, in general, `/tmp` dirs sit on faster, more readily accessible storage. So it might be enough to always build the database in `/tmp` and then copy it to `.snakebids/`.
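The tmp-then-copy idea can be sketched with the standard library. Again the function name is hypothetical, and the placeholder file stands in for whatever pybids writes into the database directory:

```python
import shutil
import tempfile
from pathlib import Path

def build_db_on_fast_storage(output_dir):
    """Sketch: index into a temp dir (often tmpfs-backed, hence fast),
    then copy the finished database to .snakebids/ in the output dir,
    which may live on slow NAS storage."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_db = Path(tmp) / "pybids_db"
        tmp_db.mkdir()
        # Real implementation: have pybids index the dataset into tmp_db.
        # Placeholder artifact standing in for the database files:
        (tmp_db / "index.sqlite").write_bytes(b"")
        dest = Path(output_dir) / ".snakebids" / "pybids_db"
        dest.parent.mkdir(parents=True, exist_ok=True)
        if dest.exists():
            shutil.rmtree(dest)
        shutil.copytree(tmp_db, dest)
    return dest
```

Using `tempfile.TemporaryDirectory` also guarantees the scratch copy is cleaned up even if the copy to the output dir fails partway.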