Retain all harvestable fields during EIA transforms #509
Comments
@cmgosnell and I are going to help get @knordback working on this issue as a way to become more familiar with the harvesting process, working with our code, Jupyter, etc.
… field may or may not actually want to change
@cmgosnell while talking over some of these fields with @knordback yesterday, I noticed that the Are these different attributes? Should there be a CHP field at both the generator and the plant level? Should this really be a permanent attribute, or is it another one that changes slowly? Does the generator field really just indicate that the generator is part of a plant that does CHP? Or that it's part of a generation unit that does CHP? Could the plant or plant-prime-fuel level CHP status be inferred from the generator-level CHP attributes? Right now we're discarding the CHP column reported in
@grgmiller or @gschivley do either of you have more context on the relationship between these two different CHP fields?
I don't know exactly.
It seems like we should probably do an exhaustive check of all the currently "permanent" generator attributes on the pre-harvested dataframes... and see how permanent they actually are.
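One way such a check could look, as a minimal sketch: for each generator, count the distinct non-null values reported for an attribute across years, then report the share of generators that only ever report one value. The helper name `attribute_consistency`, the toy dataframe, and the choice of `combined_heat_power` as the example column are assumptions for illustration, not existing PUDL code.

```python
import pandas as pd


def attribute_consistency(df, id_cols, attr_cols):
    """Share of entities that report exactly one distinct value per attribute.

    Hypothetical helper: entities that never report a value for an
    attribute are left out of the denominator for that attribute.
    """
    # nunique() ignores NaN by default, so missing years don't count as changes.
    n_unique = df.groupby(id_cols)[attr_cols].nunique()
    reported = n_unique > 0
    consistent = n_unique == 1
    return consistent.sum() / reported.sum()


# Toy pre-harvest generator records (made-up values).
gens = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 1],
        "generator_id": ["a", "a", "b", "b"],
        "report_year": [2019, 2020, 2019, 2020],
        "combined_heat_power": [True, True, False, True],
    }
)

shares = attribute_consistency(
    gens,
    id_cols=["plant_id_eia", "generator_id"],
    attr_cols=["combined_heat_power"],
)
print(shares)
```

A share well below 1 for a supposedly permanent attribute would suggest it belongs with the slowly changing (annual) attributes instead.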
I do not have any context on these two fields.
I'll hold off on this one for now.
I think this is mostly done. Based on the notes above, I left in the code that drops some of the fields in clean_generation_fuel_eia923() and clean_fuel_receipts_costs_eia923(), but I'm not certain I'm interpreting the notes correctly. There's also implicit dropping in plants_eia923(), and I don't know whether that's as desired.
In many of our older EIA transformation functions, we preemptively drop columns from the tables that are being processed, in order to produce normalized tables. However, many of these columns contain information about the entities (plants, generators, utilities) that should be integrated into the entity harvesting and resolution process, which happens after the transform step.
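To make the difference concrete, here is a toy sketch of the two patterns. Everything in it is hypothetical: the function names, the `ENTITY_COLS` list, and the "most common value wins" rule are stand-ins, and the real harvesting and resolution process is more involved than this.

```python
import pandas as pd

# Hypothetical set of entity columns, loosely based on the lists in this issue.
ENTITY_COLS = ["plant_name_eia", "plant_state", "nerc_region"]


def transform_dropping(raw: pd.DataFrame) -> pd.DataFrame:
    """Old pattern: discard entity columns during the transform step."""
    return raw.drop(columns=ENTITY_COLS)


def transform_retaining(raw: pd.DataFrame) -> pd.DataFrame:
    """Proposed pattern: leave entity columns in place for later harvesting."""
    return raw.copy()


def _most_common(values: pd.Series):
    modes = values.mode()  # NaN is ignored by default
    return modes.iloc[0] if len(modes) else None


def harvest_plant_attributes(tables):
    """Toy harvest: most frequently reported value per plant and column."""
    frames = []
    for df in tables:
        present = [c for c in ENTITY_COLS if c in df.columns]
        frames.append(df[["plant_id_eia"] + present])
    stacked = pd.concat(frames, ignore_index=True)
    return stacked.groupby("plant_id_eia").agg(_most_common)


# Made-up raw records; plant 1 reports a conflicting state once.
raw = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 2],
        "plant_name_eia": ["A", "A", "A", "B"],
        "plant_state": ["CO", "CO", "CA", "TX"],
        "nerc_region": ["WECC", "WECC", "WECC", "TRE"],
        "net_generation_mwh": [10.0, 20.0, 30.0, 40.0],
    }
)

harvested = harvest_plant_attributes([transform_retaining(raw)])
print(harvested)
```

With `transform_dropping`, the harvest step never sees the conflicting `plant_state` report at all, which is the information loss this issue is about.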
Discarded Columns
pudl.metadata.fields
column_map.csv under src/pudl/package_data/{data source}/ so that it matches the DB schema.
total_fuel_consumption_mmbtu is an annual total of monthly values that are retained, and so we don't need it.
EIA-860
pudl.transform.eia860.ownership()
pudl.transform.eia860.generators()
pudl.transform.eia860.plants()
pudl.transform.eia860.utilities()
EIA-923
pudl.transform.eia923.plants()
pudl.transform.eia923.generation_fuel()
combined_heat_power
plant_name_eia
operator_name (probably utility_name_eia)
operator_id (probably utility_id_eia)
plant_state
census_region
nerc_region
naics_code
fuel_unit (should probably be dropped, since unit is implied by fuel type)
total_fuel_consumption_quantity (annual total?)
electric_fuel_consumption_quantity (annual total?)
total_fuel_consumption_mmbtu (annual total?)
elec_fuel_consumption_mmbtu (annual total?)
net_generation_megawatthours (annual total?)
early_release
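The "(annual total?)" questions above could be settled empirically. A minimal sketch, assuming the data has already been reshaped to one row per plant, year, and month, with the candidate annual column repeated on every monthly row. The helper name `looks_like_annual_total` and the monthly column name `net_generation_mwh` are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def looks_like_annual_total(df, id_cols, monthly_col, annual_col, rtol=1e-6):
    """Return True if annual_col equals the per-group sum of monthly_col.

    Hypothetical helper: also requires annual_col to be constant within
    each group, as a repeated annual total would be.
    """
    grouped = df.groupby(id_cols)
    constant = grouped[annual_col].nunique().le(1).all()
    monthly_sum = grouped[monthly_col].sum().to_numpy()
    annual = grouped[annual_col].first().to_numpy()
    return bool(constant and np.isclose(monthly_sum, annual, rtol=rtol).all())


# Made-up records: three months for one plant-year.
gen_fuel = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1],
        "report_year": [2020, 2020, 2020],
        "report_month": [1, 2, 3],
        "net_generation_mwh": [1.0, 2.0, 3.0],
        "net_generation_megawatthours": [6.0, 6.0, 6.0],
    }
)

is_total = looks_like_annual_total(
    gen_fuel,
    id_cols=["plant_id_eia", "report_year"],
    monthly_col="net_generation_mwh",
    annual_col="net_generation_megawatthours",
)
print(is_total)
```

If the check holds across all years, the column is redundant with the retained monthly values and can safely be dropped; if not, it probably carries information worth keeping.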
pudl.transform.eia923.boiler_fuel()
This one may give you trouble. See #1847 and #1836.
combined_heat_power
plant_name_eia
operator_name (probably utility_name_eia)
operator_id (probably utility_id_eia)
plant_state
census_region
nerc_region
naics_code
fuel_unit (should probably be dropped, since unit is implied by fuel type)
total_fuel_consumption_quantity (annual total?)
balancing_authority_code_eia
early_release
reporting_frequency_code
data_maturity (we add this field in the extraction... getting dropped b/c of aggregations. See "enable non-data columns in aggregated boiler_fuel_eia923 table" #1847)
pudl.transform.eia923.generation()
combined_heat_power
plant_name_eia
operator_name (probably utility_name_eia)
operator_id (probably utility_id_eia)
plant_state
census_region
nerc_region
naics_code
early_release
pudl.transform.eia923.coalmine()
pudl.transform.eia923.fuel_receipts_costs()
plant_name_eia
plant_state
operator_name (probably utility_name_eia)
operator_id (probably utility_id_eia)
mine_id_msha (should be dropped)
mine_type_code (should be dropped)
state (of the mine?)
county_id_fips (of the mine?)
state_id_fips (of the mine?)
mine_name (should be dropped)
regulated (mine or plant?)
early_release