Integrate Eia923 Q2 2024 Data #3768

Merged
merged 7 commits into from
Aug 12, 2024
2 changes: 2 additions & 0 deletions docs/release_notes.rst
Expand Up @@ -44,6 +44,8 @@ EIA 860
EIA 923
~~~~~~~
* Added EIA 923 early release data from 2023. See :issue:`3719` and PR :pr:`3721`.
* Added EIA 923 monthly data through May as part of the Q2 quarterly release. See
:issue:`3760` and :pr:`3768`.

EPA CEMS
~~~~~~~~
34 changes: 34 additions & 0 deletions docs/templates/eia923_child.rst.jinja
Expand Up @@ -91,4 +91,38 @@ Data Estimates
Plants that did not respond or reported unverified data were recorded as estimates
rolled in with the state/fuel aggregates values reported under the plant id 99999.

Boiler Fuel Primary Keys
------------------------
The :ref:`core_eia923__monthly_boiler_fuel` table has several sneaky primary keys and duplicate rows.
The main primary keys for the table are: ``plant_id_eia, boiler_id, energy_source_code, prime_mover_code,
report_date``. Some rows also differ on ``associated_combined_heat_power``, due
to mid-year retirement of units that are associated with combined heat and power systems, and on
``operator_name``, due to lenient standards for string columns (they all have the same ``operator_id``
value). We drop both the ``associated_combined_heat_power`` and ``operator_name`` fields from the final
normalized table, which creates duplicate rows. Luckily, these rows don't provide any conflicting information.
Because they describe the same plant, when one row contains an NA value, the other contains a numeric value.
We can therefore drop duplicates based on which rows contain NA values, with no reconciliation of
duplicate values necessary.

There are still more duplicate rows with identical qualitative plant information. Luckily, none of these
duplicates contain conflicting information either. Every set of duplicate rows has at least one row
containing solely NA and 0 values.

To address both issues at once, we drop all the duplicate rows with NA or 0 values in the non-primary-key
columns. One side effect of this is that when both rows in a duplicate pair contain only NA and 0 values,
both get dropped. This leads to gaps in the data where certain months are missing. These values can be
assumed to be 0 or NA.
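
The dropping rule described above can be sketched in pandas roughly as follows. This is a minimal illustration, not the code in ``src/pudl/transform/eia923.py``: the helper name ``drop_empty_duplicates``, the single value column ``fuel_consumed_units``, and the example rows are all assumptions made for the sketch. Only the primary key columns come from the text.

```python
import pandas as pd

# Primary key columns as listed in the documentation above.
PRIMARY_KEY = [
    "plant_id_eia",
    "boiler_id",
    "energy_source_code",
    "prime_mover_code",
    "report_date",
]


def drop_empty_duplicates(df: pd.DataFrame, pk: list[str] = PRIMARY_KEY) -> pd.DataFrame:
    """Drop rows that share a primary key with another row and hold only NA or 0 values.

    Hypothetical helper for illustration; not the PUDL implementation.
    """
    value_cols = [col for col in df.columns if col not in pk]
    # True for every row whose primary key appears more than once.
    shares_key = df.duplicated(subset=pk, keep=False)
    # True for rows where every non-key column is NA or 0.
    is_empty = df[value_cols].fillna(0).eq(0).all(axis=1)
    return df.loc[~(shares_key & is_empty)].reset_index(drop=True)


# Made-up rows: "fuel_consumed_units" stands in for the real value columns.
example = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2, 2, 3],
        "boiler_id": ["B1"] * 5,
        "energy_source_code": ["NG"] * 5,
        "prime_mover_code": ["ST"] * 5,
        "report_date": pd.to_datetime(["2024-01-01"] * 5),
        "fuel_consumed_units": [100.0, None, 0.0, None, None],
    }
)

result = drop_empty_duplicates(example)
```

In this example plant 1 keeps its numeric row, plant 2 loses both rows because both are empty (the gap described above), and plant 3 keeps its NA row because it is not a duplicate.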

Boiler Fuel Years
-----------------
The :ref:`core_eia923__monthly_boiler_fuel` table reports all months in a given year, even if there is
no data. At present, we haven't truncated the data after the most recently integrated month, so you will
see all months.

Fluctuations in row count between each quarterly update are therefore due to changes in primary key
quirks as described above.
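
To see why new months do not add rows, here is a small sketch of reshaping a wide raw record (one column per month) into monthly rows. The column names ``fuel_consumed_<month>`` and the values are assumptions for illustration, not the actual raw EIA 923 column names. The final truncation step is also hypothetical: it shows one way the table could be cut off after the most recently integrated month, which this PR does not do.

```python
import pandas as pd

MONTHS = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]
MONTH_NUMBERS = {name: number for number, name in enumerate(MONTHS, start=1)}

# One made-up raw record for 2024 with data reported through May only.
values = [10.0, 12.0, 9.0, 11.0, 8.0] + [float("nan")] * 7
wide = pd.DataFrame(
    {
        "plant_id_eia": [1],
        "boiler_id": ["B1"],
        "report_year": [2024],
        **{f"fuel_consumed_{month}": [value] for month, value in zip(MONTHS, values)},
    }
)

# Reshape to one row per month. Every month column becomes a row,
# whether or not it holds data, so the row count is always 12 per record.
long = wide.melt(
    id_vars=["plant_id_eia", "boiler_id", "report_year"],
    var_name="month_column",
    value_name="fuel_consumed_units",
)
month_number = (
    long["month_column"].str.replace("fuel_consumed_", "", regex=False).map(MONTH_NUMBERS)
)
long["report_date"] = pd.to_datetime(
    pd.DataFrame({"year": long["report_year"], "month": month_number, "day": 1})
)
long = long.drop(columns=["month_column", "report_year"])

# Hypothetical truncation at the last integrated month (May 2024 here).
last_integrated_month = pd.Timestamp("2024-05-01")
truncated = long[long["report_date"] <= last_integrated_month]
```

Here ``long`` has 12 rows even though only five months carry data, while ``truncated`` keeps just the five months through May.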




{%- endblock %}
20 changes: 10 additions & 10 deletions src/pudl/package_data/eia923/file_map.csv

Large diffs are not rendered by default.

Binary file modified src/pudl/package_data/glue/pudl_id_mapping.xlsx
Binary file not shown.
33 changes: 33 additions & 0 deletions src/pudl/package_data/glue/utility_id_pudl.csv
Expand Up @@ -16342,3 +16342,36 @@ utility_id_pudl,utility_id_ferc1,utility_name_ferc1,utility_id_eia,utility_name_
16384,,,66290,NSF Energy One LLC
16385,,,66291,Portage Solar Plant
16386,,,66292,Desert Willow Energy Storage
16387,,,66317,"Kola Energy Storage, LLC"
16388,,,66336,"Wild Plains Wind Project, LLC"
16389,,,66352,SMT Ironman BESS LLC
16390,,,66345,"Sebree Solar, LLC"
16391,,,66354,"Anole Energy Storage, LLC"
16392,,,66351,Citadel BESS LLC
16393,,,66348,"Silver State South Storage, LLC"
16394,,,65860,"Madison Fields Solar Project, LLC"
16395,,,66346,"Silver Peak Solar, LLC"
16396,,,66314,JGT2 Energy LLC
16397,,,66318,"Zeta Solar, LLC"
16398,,,66334,Twin Lakes Solar LLC
16399,,,66319,"Heartwood Solar, LLC"
16400,,,66360,Reliability Design & Development LLC
16401,,,66320,"White Tail Solar, LLC"
16402,,,66350,Wigeon Whistle BESS LLC
16403,,,66338,Al Pastor BESS LLC
16404,,,66316,"Northumberland Solar I, LLC"
16405,,,66331,"Birch Creek Power, LLC"
16406,,,66347,"Placid Solar II, LLC"
16407,,,66335,REV Renewables LLC
16408,,,66294,"NSF Torrey Site 2, LLC"
16409,,,66300,NY CDG Genesee 1 LLC
16410,,,66321,NY CDG Montgomery 1 LLC
16411,,,66293,"NSF Torrey Site 3, LLC"
16412,,,66295,"NSF Torrey Site 1, LLC"
16413,,,66301,NY CDG Genesee 4 LLC
16414,,,66342,"Catalyze Joliet 1101 Cherry Hill Road Microgrid, LLC"
16415,,,66305,"Rio Vista Executive Boat & RV Storage, LLC"
16416,,,66304,PFMD LL Baltimore LLC
16417,,,66303,PFMD LL Jessup LLC
16418,,,66306,Town Of Cary
16419,,,66343,"Catalyze Rochelle Wiscold Drive Microgrid, LLC"
5 changes: 5 additions & 0 deletions src/pudl/transform/eia923.py
Expand Up @@ -833,6 +833,11 @@ def _core_eia923__boiler_fuel(raw_eia923__boiler_fuel: pd.DataFrame) -> pd.DataF
* Create a fuel_type_code_pudl field that organizes fuel types into clean,
distinguishable categories.
* Combine year and month columns into a single date column.
* Drop duplicate rows with NA or 0 in all value columns.

Eventually we should truncate this table by the last year-month that was integrated.
Right now all months get integrated for a given year, regardless of whether there's
data for them.

Args:
raw_eia923__boiler_fuel: The raw ``raw_eia923__boiler_fuel`` dataframe.
2 changes: 1 addition & 1 deletion src/pudl/workspace/datastore.py
Expand Up @@ -193,7 +193,7 @@ class ZenodoDoiSettings(BaseSettings):
eia860: ZenodoDoi = "10.5281/zenodo.11662381"
eia860m: ZenodoDoi = "10.5281/zenodo.11110602"
eia861: ZenodoDoi = "10.5281/zenodo.10204708"
eia923: ZenodoDoi = "10.5281/zenodo.12656894"
eia923: ZenodoDoi = "10.5281/zenodo.12721286"
eia930: ZenodoDoi = "10.5281/zenodo.10840078"
eiawater: ZenodoDoi = "10.5281/zenodo.10806016"
eiaaeo: ZenodoDoi = "10.5281/zenodo.10838488"
14 changes: 7 additions & 7 deletions test/validate/eia_test.py
Expand Up @@ -46,17 +46,17 @@ def test_no_null_cols_eia(pudl_out_eia, live_dbs, cols, df_name):
@pytest.mark.parametrize(
"df_name,raw_rows,monthly_rows,annual_rows",
[
("bf_eia923", 1_642_829, 1_642_829, 135_980),
("bf_eia923", 1_642_806, 1_642_806, 135_980),
("bga_eia860", 153_487, 153_487, 153_487),
("boil_eia860", 89_051, 89_051, 89_051),
("boil_eia860", 89_050, 89_050, 89_050),
Comment on lines -49 to +51
Member

do you have a sense of why these went down? the bf in particular here seems wrong because there should be a few more months of data in there for the raw and monthly tables

Member Author

The new months of 923 data show up as data inputs into existing columns. Once we hit January there will be rows (transformed from the raw table columns) for every month, regardless of whether there is data in them. Row fluctuations are more likely due to retroactive data changes which can lead to positive and negative changes.

I dug a little deeper into the bf table and uncovered some unexpected cruft that explains some of the row count changes. See my comment below!

("frc_eia923", 673_343, 274_479, 26_709),
("gen_eia923", None, 5_494_932, 459_711),
("gens_eia860", 590_881, 590_881, 590_881),
("gf_eia923", 3_064_042, 3_064_042, 260_842),
("gens_eia860", 591_256, 591_256, 591_256),
("gf_eia923", 3_064_045, 3_064_045, 260_842),
Comment on lines -54 to +55
Member

i would also assume that these gf rows would go up by a few months of data

("own_eia860", 95_104, 95_104, 95_104),
("plants_eia860", 215_884, 215_884, 215_884),
("pu_eia860", 214_965, 214_965, 214_965),
("utils_eia860", 147_877, 147_877, 147_877),
("plants_eia860", 216_206, 216_206, 216_206),
("pu_eia860", 215_288, 215_288, 215_288),
("utils_eia860", 147_922, 147_922, 147_922),
("emissions_control_equipment_eia860", 62_102, 62_102, 62_102),
("denorm_emissions_control_equipment_eia860", 62_102, 62_102, 62_102),
("boiler_emissions_control_equipment_assn_eia860", 83_977, 83_977, 83_977),