This repository contains a collection of Jupyter notebooks with examples of how to use the data and software distributed by Catalyst Cooperative's Public Utility Data Liberation (PUDL) project.
The easiest way to get up and running with these examples and a fresh copy of all the PUDL data is on Kaggle.
Kaggle offers substantial free computing resources and convenient data storage, so you can start playing with the PUDL data without needing to set up any software or download any data.
- PUDL Data on Kaggle
- 01 PUDL Data Access
- 02 State Hourly Electricity Demand
- 03 EIA-930 Sanity Checks
- 04 Renewable Generation Profiles
You'll find the PUDL data dictionary helpful for interpreting the data.
If you're already familiar with git, Python environments, filesystem paths, and running Jupyter notebooks locally, you can also work with these notebooks and the PUDL data on your own machine:
- Create a Python environment that includes common data science packages. We like to use the mamba package manager and the conda-forge channel.
- Clone this repository.
- Download the PUDL dataset from Kaggle (it's ~20 GB!) and unzip it somewhere conveniently accessible from the notebooks in the cloned repo.
- Start your JupyterLab or Jupyter Notebook server and navigate to the notebooks in the cloned repo.
- You'll need to adjust the file paths in the notebooks to point at the directory where you put the PUDL data, and might need to adjust the packages installed in your Python environment to work with the notebooks.
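
As a rough sketch of the path adjustment described in the last step, the helper below lists the tables available in a local copy of the data. It assumes the unzipped download contains Apache Parquet files, one per table. The `PUDL_DATA_DIR` location and the `list_pudl_tables` helper are hypothetical placeholders for illustration, not part of the PUDL software.

```python
from pathlib import Path

# Hypothetical location: change this to wherever you unzipped the Kaggle download.
PUDL_DATA_DIR = Path.home() / "data" / "pudl"


def list_pudl_tables(data_dir: Path) -> list[str]:
    """Return the sorted names (without extension) of Parquet files under data_dir."""
    data_dir = Path(data_dir).expanduser()
    if not data_dir.is_dir():
        raise FileNotFoundError(f"PUDL data directory not found: {data_dir}")
    # Assumption: each table is stored as its own .parquet file.
    return sorted(p.stem for p in data_dir.rglob("*.parquet"))


if __name__ == "__main__":
    try:
        for name in list_pudl_tables(PUDL_DATA_DIR):
            print(name)
    except FileNotFoundError as err:
        # Printed when PUDL_DATA_DIR has not been updated to a real directory.
        print(err)
```

Once you know a table's name, and if the files sit directly in that directory, something like `pandas.read_parquet(PUDL_DATA_DIR / f"{name}.parquet")` should load it, provided `pandas` and `pyarrow` are installed in your environment.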
See the PUDL documentation for other data access methods.
If you're familiar with cloud services, you can check out:
- PUDL in the AWS Open Data Registry: s3://pudl.catalyst.coop (free access)
- Google Cloud Storage: gs://pudl.catalyst.coop (requester pays)
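
For the free AWS bucket, a minimal sketch of anonymous access from Python might look like the following. The bucket name comes from the list above; everything else is an assumption. In particular, the `prefix` and `table` arguments are hypothetical placeholders, the helper names are invented for illustration, and the layout of one Parquet file per table is assumed. Check the PUDL documentation for the real paths. Reading requires the `pandas`, `pyarrow`, and `s3fs` packages.

```python
# Bucket name taken from the AWS Open Data Registry entry listed above.
PUDL_S3_BUCKET = "pudl.catalyst.coop"


def pudl_s3_uri(table: str, prefix: str) -> str:
    """Build an s3:// URI for a Parquet table under the given prefix.

    Both arguments are placeholders: the actual prefixes and table names
    are not specified here and must come from the PUDL documentation.
    """
    return f"s3://{PUDL_S3_BUCKET}/{prefix.strip('/')}/{table}.parquet"


def read_pudl_table(table: str, prefix: str, columns=None):
    """Read one table anonymously from S3 into a pandas DataFrame."""
    import pandas as pd  # imported lazily; also needs pyarrow and s3fs installed

    return pd.read_parquet(
        pudl_s3_uri(table, prefix),
        columns=columns,  # optionally read only a subset of columns
        storage_options={"anon": True},  # unauthenticated access via s3fs
    )
```

Passing `columns` limits how much data is transferred, which can matter for the larger hourly tables.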
- https://catalyst.coop
- Email: [email protected]
- Mastodon: @[email protected]
- BlueSky: @catalyst.coop
- GitHub
- Kaggle
- HuggingFace
- Twitter: @CatalystCoop