-
-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proof of concept for GH bug #509 #2169
Conversation
… field may or may not actually want to change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the new combined_heat_power
field that you've added is data that we were entirely losing before, and that should be associated with the boilers directly (which seems totally possible) then this looks generally correct.
However, I think this case is separate from the issue that #509 is attempting to address, which is that there are many fields that are reported independently in various EIA tables, which we want to deduplicate through the harvesting process. Unfortunately in many of the table-level data cleaning functions we are currently preemptively dropping them, so they never make it to the harvesting process at all. This point of this issue, IIRC, is to go find all of those cases where columns we actually want to harvest and preserve in the appropriate table are getting dropped in their transform step. In many cases, that will just mean... not dropping them. But in some cases not dropping them will cause problems later on in the transform (e.g. the one you found in _aggregate_duplicate_boiler_fuel_keys()
) and those problems will have to be chased down and addressed.
In the case of existing values that we're dropping, which already have a home table & metadata in the database schema, no changes to the fields or resource (table) definitions should be required -- it's just that additional information will be getting fed into the harvesting process to populate those existing fields.
@@ -263,6 +263,10 @@ | |||
"description": "Average monthly coincident peak (CP) demand (for requirements purchases, and any transactions involving demand charges). Monthly CP demand is the metered demand during the hour (60-minute integration) in which the supplier's system reaches its monthly peak. In megawatts.", | |||
"unit": "MW", | |||
}, | |||
"combined_heat_power": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really a different field than associated_combined_heat_power
or do they represent the same characteristic applied to generators vs. boilers?
@@ -15,6 +15,7 @@ | |||
"fuel_mmbtu_per_unit", | |||
"sulfur_content_pct", | |||
"ash_content_pct", | |||
"combined_heat_power", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is truly a new field that we were inappropriately dropping (and losing entirely) before, and we want to have it independently associated with the boilers (in addition to the associated_combined_heat_power
field that is applied to generators) then this seems appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think the point is that #509 is specific to generators. I hadn't been thinking about generators vs. boilers specifically. I'll do the analogous thing just for generators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't sound right to me. #509 (the issue) is very generic, even if it provides an example from the generators.
We're doing dropping potentially informational columns across a bunch of the EIA table transforms, and would prefer to feed them into the harvesting process so they can be used to more appropriately set the boiler / generator / utility / plant / balancing authority attributes (static or annual).
Addressing #509 should not have any effect on the database schema -- if there are columns in the wrong place, or columns we have failed to integrate entirely, those should probably be their own separate issues. This issue should just stop dropping columns prior to the harvesting process, so they can be used within it.
@knordback I'm going to close this draft PR since it's off a fork, and I think we mostly learned what the prototypical solution will look like in #2200. |
PR Overview
PR Checklist
dev
).