This repository has been archived by the owner on Jan 12, 2024. It is now read-only.
Automatically disable caching of local data catalog sources #3
Labels
inframundo
intake: Intake data catalogs
performance: Make data go faster by using less memory, disk, network, compute, etc.
Reading Parquet files that are stored on the local filesystem through the current PUDL catalog still results in caching. This slows things down dramatically and quickly uses an enormous amount of disk space. Especially in development, when we've got data that we've just generated locally, it would be nice to work with it using the same mechanism as remote data (the data catalog), but not if we end up with a bunch of unnecessary caching happening continuously in the background.
Identify a way to disable caching when we're working with local data. Ideally this would be done automatically without the user having to think about it. Maybe it's as simple as making the `simplecache::` prefix to `urlpath` conditional based on the value of `PUDL_INTAKE_PATH` using Jinja templating features? If that's not possible then maybe caching can be turned off with an argument that's passed to the data source by the user.
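As a rough illustration of the conditional-prefix idea, here is a minimal Python sketch that adds `simplecache::` only when the base path looks remote. It is not the catalog's actual implementation and it does not use Jinja. The helper names (`is_remote`, `build_urlpath`), the set of schemes treated as remote, and the example bucket and file names are all assumptions made for illustration.

```python
import os
from urllib.parse import urlparse

# Assumption: URL schemes that count as "remote" and therefore benefit from caching.
REMOTE_SCHEMES = {"gs", "gcs", "s3", "http", "https"}


def is_remote(path: str) -> bool:
    """Return True if the path uses one of the schemes we treat as remote."""
    return urlparse(path).scheme.lower() in REMOTE_SCHEMES


def build_urlpath(base_path: str, filename: str) -> str:
    """Join base_path and filename, adding the simplecache:: prefix only for remote data."""
    prefix = "simplecache::" if is_remote(base_path) else ""
    return f"{prefix}{base_path.rstrip('/')}/{filename}"


if __name__ == "__main__":
    # Hypothetical file name; PUDL_INTAKE_PATH is read from the environment,
    # falling back to the current directory if it is unset.
    base = os.environ.get("PUDL_INTAKE_PATH", ".")
    print(build_urlpath(base, "example.parquet"))
```

With this approach a local `PUDL_INTAKE_PATH` (a bare path or a `file://` URL) yields an uncached `urlpath`, while a remote one keeps the caching prefix, so the user does not have to choose explicitly.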