Remove outliers and impute missing values in EIA 930 hourly demand #3894

Open
10 tasks
zaneselvans opened this issue Oct 2, 2024 · 1 comment
Labels
  • data-repair: Interpolating or extrapolating data that we don't actually have.
  • eia930: Related to the EIA Form 930
  • epic: Any issue whose primary purpose is to organize other issues into a group.

Comments

@zaneselvans
Member

zaneselvans commented Oct 2, 2024

Description

Apply the same outlier / error detection heuristics and correlated time series imputation methods that we currently use on the FERC-714 hourly demand data to produce a complete and plausible version of the EIA 930 hourly demand data.

Motivation

  • In our processing of the FERC 714 hourly demand data we use some outlier / error detection heuristics originally developed by @truggles for application to the EIA 930 hourly demand data.
  • In the spring of 2024 we integrated the EIA 930 data, but we currently publish only a relatively untouched version of it, because folks using it for resource adequacy (RA) analysis want to do their own outlier detection (outliers drive RA requirements and failures).
  • Now @awongel and others are interested in having a cleaned / imputed version of the EIA 930 for other modeling purposes.
  • Catalyst has resources from NSF POSE to support open source contributors, and this seems like a great opportunity for collaboration and incremental improvement of PUDL that will create a resource of lasting value for a wider base of users.

Scope

In Scope

  • Apply Tyler Ruggles' heuristics to the EIA-930 data to identify outliers and errors, and null them out.
  • Run the correlated time series imputation code that we currently apply to the FERC 714 hourly demand data on all 3 of the hourly demand / production time series that come from the EIA 930, producing 3 new tables derived from:
    • core_eia930__hourly_net_generation_by_energy_source
    • core_eia930__hourly_operations
    • core_eia930__hourly_subregion_demand
  • The correlated time series imputation probably won't be applicable to the hourly interchange data, since interchange isn't necessarily correlated with anything else; it can vary totally independently of actual demand. Also, we know the reported interchange doesn't reconcile the generation / demand differences well everywhere, and the modifications from the time series imputation should be pretty minor, so we might not want to touch the interchange data at all.
  • Ideally, we should end up with some generic code that can be used on both the EIA 930 and the FERC 714, with minimal duplication. The trick seems like it'll be organizing the EIA 930 data appropriately to be processed by the existing code (which is already somewhat modular and should be reusable... but we'll see!)
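A minimal sketch of the first step, assuming a long-format table with hypothetical columns `datetime_utc`, `balancing_authority_code_eia`, and `demand_mwh`: pivot to one column per balancing authority, then replace implausible values with nulls so a downstream imputation step can fill them. The two checks shown (non-positive values, and global outliers more than 10 median absolute deviations from each series' median) are illustrative stand-ins loosely in the spirit of those heuristics, not a reproduction of them, and the threshold is an arbitrary assumption.

```python
import pandas as pd


def to_wide(df: pd.DataFrame, value_col: str) -> pd.DataFrame:
    """Pivot a long table to one column per balancing authority.

    Column names are hypothetical; adapt them to the real EIA 930 tables.
    """
    return df.pivot(
        index="datetime_utc",
        columns="balancing_authority_code_eia",
        values=value_col,
    )


def null_outliers(wide: pd.DataFrame, max_mads: float = 10.0) -> pd.DataFrame:
    """Return a copy of `wide` with implausible values replaced by NaN.

    Two illustrative per-column checks (assumptions, not PUDL's heuristics):
    1. Non-positive demand is treated as an error.
    2. Values more than `max_mads` median absolute deviations (MAD) from the
       column median are treated as global outliers.
    """
    # Check 1: keep only strictly positive values; everything else becomes NaN.
    out = wide.astype(float).where(wide > 0)
    # Check 2: robust spread per column, ignoring values already nulled.
    median = out.median()
    mad = (out - median).abs().median()
    too_far = (out - median).abs() > max_mads * mad
    return out.mask(too_far)


# Usage: two balancing authorities, four hours; AAA has one negative value
# and one wildly large value, both of which should be nulled.
long = pd.DataFrame({
    "datetime_utc": pd.date_range("2024-01-01", periods=4, freq="h").repeat(2),
    "balancing_authority_code_eia": ["AAA", "BBB"] * 4,
    "demand_mwh": [100, 200, -5, 210, 105, 205, 5000, 215],
})
cleaned = null_outliers(to_wide(long, "demand_mwh"))
```

Keeping the nulling step separate from imputation mirrors the two-stage structure described above, and working on a wide timestamps × respondents matrix is what lets a correlated imputation step see all the series at once.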

Out of Scope

  • This issue is not related to applying @jdechalendar's reconciliation code to the EIA 930 data. We still might want to do that later, but I think it would build upon the outputs being generated here.

Tasks

@zaneselvans zaneselvans added data-repair Interpolating or extrapolating data that we don't actually have. eia930 Related to the EIA Form 930 labels Oct 2, 2024
@zaneselvans zaneselvans added community epic Any issue whose primary purpose is to organize other issues into a group. labels Oct 2, 2024
@zaneselvans
Member Author

  • The out_ferc714__hourly_planning_area_demand table that we produce has not been cleaned up.
  • All the time series cleaning work is currently done in the preparation of the out_ferc714__hourly_estimated_state_demand output table.
  • So we don't currently have an exact FERC 714 analog of the cleaned EIA 930 time series, at least not one that's being written out and distributed.
  • The ephemeral _out_ferc714__hourly_imputed_demand asset looks like the closest thing.
  • So the high-level machinery that does the nulling and imputation is currently FERC 714 specific, but it relies on lower-level functions from the timeseries_cleaning.py module. The FERC 714 functions are pretty short, so if there are differences between the FERC 714 and EIA 930 data (like the non-demand columns that need to be preserved), it may make sense to create analogous EIA 930 specific functions.
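A thin EIA 930 specific wrapper around shared helpers could look something like the sketch below. Everything here is hypothetical: the column names, the `impute_eia930_demand` wrapper, the `demand_imputed_mwh` output column, and especially the imputation itself, which is a deliberately simple stand-in (fill each gap by linear regression on the most strongly correlated other balancing authority, then interpolate in time) rather than the method implemented in `timeseries_cleaning.py`. The point is the shape: pivot only the demand column to a wide matrix, impute it, reshape back, and merge onto the original rows so the non-demand columns survive untouched. It assumes outliers have already been nulled.

```python
import numpy as np
import pandas as pd

KEYS = ["datetime_utc", "balancing_authority_code_eia"]  # hypothetical key columns


def impute_from_correlated(wide: pd.DataFrame) -> pd.DataFrame:
    """Fill NaNs in each column using its most strongly correlated neighbor.

    Stand-in technique: single-predictor linear regression fit on the hours
    where both series are present, followed by linear interpolation in time
    for any gaps the neighbor cannot cover.
    """
    corr = wide.corr()  # pairwise Pearson correlation; NaNs excluded pairwise
    filled = wide.copy()
    for col in wide.columns:
        gaps = wide[col].isna()
        if not gaps.any():
            continue
        others = corr[col].drop(col).dropna()
        if not others.empty:
            best = others.abs().idxmax()  # strongest linear relationship
            both = wide[col].notna() & wide[best].notna()
            if both.sum() >= 2:
                slope, intercept = np.polyfit(wide.loc[both, best], wide.loc[both, col], 1)
                fillable = gaps & wide[best].notna()
                filled.loc[fillable, col] = intercept + slope * wide.loc[fillable, best]
        # Fall back to interpolation for anything still missing.
        filled[col] = filled[col].interpolate(limit_direction="both")
    return filled


def impute_eia930_demand(df: pd.DataFrame, value_col: str = "demand_mwh") -> pd.DataFrame:
    """Add an imputed demand column while preserving every other column."""
    wide = df.pivot(index=KEYS[0], columns=KEYS[1], values=value_col)
    imputed = (
        impute_from_correlated(wide)
        .reset_index()
        .melt(id_vars=KEYS[0], var_name=KEYS[1], value_name="demand_imputed_mwh")
    )
    # Left merge keeps the original rows and all non-demand columns intact.
    return df.merge(imputed, on=KEYS, how="left", validate="one_to_one")


# Usage: AAA is exactly half of BBB, with one missing hour.
hours = pd.date_range("2024-01-01", periods=5, freq="h")
df = pd.DataFrame({
    "datetime_utc": hours.repeat(2),
    "balancing_authority_code_eia": ["AAA", "BBB"] * 5,
    "demand_mwh": [100, 200, 110, 220, np.nan, 240, 130, 260, 140, 280],
    "interchange_mwh": range(10),  # stands in for a non-demand column to preserve
})
out = impute_eia930_demand(df)
```

Writing the result to a new column, rather than overwriting the reported demand, keeps the original values available for users who want to do their own outlier detection.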

@zaneselvans zaneselvans changed the title Remove outliers and impute missing values in EIA 930 hourly electricity demand time series Remove outliers and impute missing values in EIA 930 hourly demand Oct 2, 2024
Projects
Status: Backlog
Development

No branches or pull requests

1 participant