Describe the problem
TDC caches downloaded data to disk for future use, but by default it writes this data to the relative local directory ./data. If I then use TDC from a different directory on the same machine without specifying the previous location, it downloads the data again, unnecessarily filling disk space with duplicate copies.
Describe the solution you'd like
Use a "global" cache directory that is a fixed, absolute path for each user. It's standard practice for applications to cache downloaded data in a hidden directory such as $HOME/.cache/PACKAGE by default (cf. wandb, pip, huggingface, black, etc.). A user who wants a different location can then override this default through an environment variable (see: huggingface).
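As a rough illustration of the per-user convention, the default location can be built from the user's home directory rather than the current working directory, so it resolves to the same absolute path wherever TDC is invoked. The helper name `default_cache_dir` and the `tdc` subdirectory name are hypothetical, not part of TDC's API.

```python
import os


def default_cache_dir(package: str = "tdc") -> str:
    """Return a per-user cache path such as ~/.cache/<package>.

    Hypothetical helper: the ".cache/<package>" layout follows the
    convention used by pip, wandb, etc.; "tdc" is an assumed name.
    """
    # Anchor on the home directory instead of os.getcwd(), so the result
    # does not change when the working directory changes.
    return os.path.join(os.path.expanduser("~"), ".cache", package)


if __name__ == "__main__":
    print(default_cache_dir())
```

Unlike ./data, this path does not depend on where the script is launched from, so a second run from another directory finds the earlier download.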
I currently have this manually implemented in my TDC client code like so:
but this is cumbersome to do everywhere. It would be nice for TDC to do this by default.
This could be done by changing the type of the `path` parameter from `str` to `Optional[str]` with a default value of `None`. A value of `None` would mean "use `TDC_DATASETS_CACHE` from the environment", which lets a user (1) configure the default location of TDC downloads globally through the environment and (2) avoid re-downloading datasets every time they change directories.
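A minimal sketch of the proposed resolution logic, assuming a hypothetical `resolve_data_path` helper that a loader would call on its `path` argument. The `TDC_DATASETS_CACHE` name comes from the proposal; the `~/.cache/tdc` fallback used when the variable is unset is an assumption, not current TDC behavior.

```python
import os
from typing import Optional

# Environment variable named in the proposal above.
ENV_VAR = "TDC_DATASETS_CACHE"


def resolve_data_path(path: Optional[str] = None) -> str:
    """Choose the directory where datasets are cached.

    Assumed precedence (hypothetical, not TDC's API):
      1. an explicit ``path`` argument,
      2. the TDC_DATASETS_CACHE environment variable (if non-empty),
      3. a per-user default under ~/.cache/tdc.
    """
    if path is None:
        # An empty variable is treated the same as an unset one.
        path = os.environ.get(ENV_VAR) or os.path.join(
            os.path.expanduser("~"), ".cache", "tdc"
        )
    # Normalize to an absolute path so the cache location stays the same
    # even if the working directory changes later.
    return os.path.abspath(os.path.expanduser(path))
```

Under this sketch, callers that still pass an explicit path (for example "./data") keep today's behavior, while callers that omit it share one cache that `TDC_DATASETS_CACHE` can redirect globally.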