This repository has been archived by the owner on Jan 12, 2024. It is now read-only.

Automatically disable caching of local data catalog sources #3

Open
zaneselvans opened this issue Apr 6, 2022 · 1 comment
Labels
inframundo intake Intake data catalogs performance Make data go faster by using less memory, disk, network, compute, etc.

Comments

@zaneselvans
Member

Reading Parquet files stored on the local filesystem through the current PUDL catalog still triggers caching. This slows things down dramatically and quickly consumes an enormous amount of disk space. In development especially, when we've just generated data locally, it would be nice to work with it through the same mechanism we use for remote data (the data catalog), but not if it means a bunch of unnecessary caching happening continuously in the background.

Identify a way to disable caching when we're working with local data. Ideally this would happen automatically, without the user having to think about it. Maybe it's as simple as using Jinja templating to make the simplecache:: prefix on urlpath conditional on the value of PUDL_INTAKE_PATH?

If that's not possible, then maybe caching can be turned off with an argument that the user passes to the data source.
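
To make the two options above concrete, here is a minimal sketch in plain Python rather than in the catalog's Jinja templating. It is not how the PUDL catalog currently works: the helper names (default_cache_method, build_urlpath) and the set of protocols treated as remote are assumptions made up for illustration. The idea is to add the simplecache:: prefix only when the base path (for example, the value of PUDL_INTAKE_PATH) points at a remote protocol, and to let the user override that choice with an explicit cache_method argument.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: the protocols we consider "remote" and therefore worth caching.
REMOTE_SCHEMES = {"gs", "gcs", "s3", "http", "https"}


def default_cache_method(base_path: str) -> str:
    """Return "simplecache::" for remote base paths and "" for local ones."""
    scheme = urlparse(base_path).scheme
    return "simplecache::" if scheme in REMOTE_SCHEMES else ""


def build_urlpath(
    base_path: str, filename: str, cache_method: Optional[str] = None
) -> str:
    """Build a urlpath, prepending a caching prefix only where it helps.

    cache_method=None picks the prefix automatically from base_path.
    cache_method="" disables caching explicitly, whatever the base path is.
    """
    if cache_method is None:
        cache_method = default_cache_method(base_path)
    return f"{cache_method}{base_path.rstrip('/')}/{filename}"
```

With this sketch, a local base path such as /home/user/pudl yields a urlpath with no caching prefix, while a gs:// base path gets simplecache:: unless the caller passes cache_method="". Whether the same condition can be expressed inside the catalog's templating is the open question.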

@zaneselvans zaneselvans added the intake Intake data catalogs label Apr 6, 2022
@zaneselvans zaneselvans added the performance Make data go faster by using less memory, disk, network, compute, etc. label Apr 7, 2022
@zaneselvans zaneselvans self-assigned this Apr 21, 2022
@zaneselvans
Member Author

It's unclear whether or how we can do this automatically, and letting the user specify cache_method="" is working okay, so I'm going to toss this in the icebox.

@jdangerx jdangerx moved this to 🆕 New in Catalyst Megaproject Feb 7, 2023
@jdangerx jdangerx moved this from 🆕 New to 📋 Backlog in Catalyst Megaproject Feb 7, 2023
Projects
Status: Icebox
Development

No branches or pull requests

2 participants