Motivations
During initialization, the caller needs to specify the path to a data file. This can be tedious and annoying, especially when testing pygrackle or any downstream simulation code.
Description
Proposal: Introduce a feature that lets users tell grackle to search a data-directory for these data files. For concreteness,
maybe we default to ~/.grackle/tables
maybe we allow people to override this choice with the environment variable GRACKLE_DATA_HOME
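The lookup order described above could be sketched in plain Python as follows. This is a minimal sketch that assumes only the two proposed rules (the GRACKLE_DATA_HOME environment variable wins, otherwise fall back to ~/.grackle/tables); the function name `grackle_data_dir` is hypothetical and not an existing pygrackle API.

```python
import os
from pathlib import Path

# Environment variable proposed above for overriding the default location.
_ENV_VAR = "GRACKLE_DATA_HOME"


def grackle_data_dir() -> Path:
    """Return the data-directory (hypothetical helper, not a pygrackle API)."""
    override = os.environ.get(_ENV_VAR)
    if override:  # an unset or empty variable falls through to the default
        return Path(override).expanduser()
    # Proposed default location.
    return Path.home() / ".grackle" / "tables"
```

Treating an empty GRACKLE_DATA_HOME as unset is an extra assumption; it avoids accidentally resolving data files relative to the current working directory.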
We have 2 options for this feature:
Option 1: Make this feature exclusive to pygrackle (or at least start out that way).
To support installing pygrackle from PyPI (or conda), we would realistically need to provide some kind of routine to download the data files to a directory.
While we could have that routine download the data to an arbitrary, user-defined location, I think it would be more ergonomic to support writing it to a data-directory that pygrackle knows how to check.
Option 2: Support this feature in both grackle and pygrackle.
To support it in grackle, we might just add a parameter called data_file_in_data_dir with a default value of 0, which preserves the existing behavior. If the parameter has a value of 1, then we treat grackle_data_file as a path relative to the data-directory.
In this case, we would need to support "installation" of data-files to the data-directory at build-time and/or provide a vanilla Python script that can be used for this purpose.[1]
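To make the semantics of the proposed data_file_in_data_dir parameter concrete, here is a Python sketch of how grackle_data_file might be resolved. It only illustrates the logic the C library would need and is not grackle code: `resolve_data_file` is a hypothetical name, `data_dir` stands for whatever directory the lookup rule produces, and rejecting absolute paths and `..` components is an extra assumption not stated in the proposal.

```python
from pathlib import Path


def resolve_data_file(grackle_data_file: str, data_file_in_data_dir: int,
                      data_dir: Path) -> Path:
    """Map grackle_data_file to an actual path (hypothetical sketch).

    data_file_in_data_dir == 0 -> existing behavior: use the path as given
    data_file_in_data_dir == 1 -> treat it as relative to data_dir
    """
    if data_file_in_data_dir == 0:
        return Path(grackle_data_file)
    if data_file_in_data_dir != 1:
        raise ValueError("data_file_in_data_dir must be 0 or 1")
    relative = Path(grackle_data_file)
    # Assumed safety check: keep lookups inside the data-directory.
    if relative.anchor or ".." in relative.parts:
        raise ValueError("expected a path relative to the data-directory")
    return data_dir / relative
```

With data_file_in_data_dir left at its default of 0, callers see no change, which is the backward-compatibility property the proposal relies on.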
Considerations
I think this feature could significantly improve grackle's ergonomics. But there are 2 key considerations:
Policies of datafile-versioning and compatibility across grackle versions. Grackle's data files have been very stable over the years, so I'm not terribly worried, but it's worth considering.
For example, what do we do if we want to update an existing datafile? (Say we uncover a bug, or we want to modify the format for a newer version of grackle.)
Do we replace the file? Do we replace the file and retain the old version under a different name? Or do we introduce the new version under a new name?
I think we already adopt the last option, which avoids most issues.
What would be our policy for user-defined datafiles? Namely, what happens if our Python script for downloading/installing datafiles to the data-directory encounters a name collision?
I know that such data files are currently rare, but they do exist, and we definitely don't want to destructively overwrite someone's work.
Do we simply forbid placing custom datafiles in this directory?
Or do we promise to never install a datafile with the prefix "user"?
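The collision questions above could be prototyped with a small, dependency-free script. This sketch assumes one particular answer: it refuses datafile names that start with the reserved prefix "user", and it never overwrites an existing file whose contents differ (an identical file is treated as already installed). The function name, its return values and the SHA-256 comparison are hypothetical choices, not existing grackle or pygrackle behavior; the download uses only `urllib.request` from the standard library, in keeping with the goal of a vanilla-Python script.

```python
import hashlib
import shutil
import tempfile
import urllib.request
from pathlib import Path

# Assumed policy: grackle never ships a datafile with this name prefix.
RESERVED_USER_PREFIX = "user"


def _sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def install_datafile(url: str, name: str, data_dir: Path) -> str:
    """Download url into data_dir/name without clobbering existing files.

    Returns "installed" or "already-present"; raises on a name collision.
    """
    if name.startswith(RESERVED_USER_PREFIX):
        raise ValueError(f"{name!r} uses the reserved prefix {RESERVED_USER_PREFIX!r}")
    data_dir.mkdir(parents=True, exist_ok=True)
    dest = data_dir / name
    # Download into a temporary file in the same directory, so a failed
    # download never leaves a partial file under the final name.
    tmp = tempfile.NamedTemporaryFile(dir=data_dir, delete=False)
    tmp_path = Path(tmp.name)
    try:
        with tmp, urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, tmp)
        if dest.exists():
            if _sha256(dest) == _sha256(tmp_path):
                return "already-present"  # identical file: nothing to do
            raise FileExistsError(f"{dest} exists with different contents; not overwriting")
        tmp_path.replace(dest)  # rename within the same directory
        return "installed"
    finally:
        tmp_path.unlink(missing_ok=True)  # remove the temp file if still present
```

The existence check and the rename are not atomic together, so this sketch ignores races with a concurrent writer; a real script might want a stricter no-clobber mechanism.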
Of course, there is also the question of whether the maintenance burden is worthwhile.
Feedback
I would greatly appreciate any amount of feedback! (Especially on whether to support this only in pygrackle or also in grackle.)
Footnotes
[1] To avoid maintaining logic in 2 places (which must stay consistent), this could be the same Python file that is shipped as part of pygrackle (though it probably needs to be vanilla Python without any external dependencies).