Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of concept for GH bug #509 #2169

Closed
wants to merge 2 commits into from

Conversation

knordback
Copy link
Collaborator

@knordback knordback commented Dec 31, 2022

PR Overview

PR Checklist

  • Merge the most recent version of the branch you are merging into (probably dev).
  • All CI checks are passing. Run tests locally to debug failures
  • Make sure you've included good docstrings.
  • For major data coverage & analysis changes, run data validation tests
  • Include unit tests for new functions and classes.
  • Defensive data quality/sanity checks in analyses & data processing functions.
  • Update the release notes and reference reference the PR and related issues.
  • Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.

… field may or may not actually want to change
@zaneselvans zaneselvans linked an issue Jan 5, 2023 that may be closed by this pull request
50 tasks
@zaneselvans zaneselvans changed the title Proof of concept for GH bug #509. This particular field may or may no… Proof of concept for GH bug #509. Jan 5, 2023
@zaneselvans zaneselvans changed the title Proof of concept for GH bug #509. Proof of concept for GH bug #509 Jan 5, 2023
@zaneselvans zaneselvans self-requested a review January 5, 2023 17:51
Copy link
Member

@zaneselvans zaneselvans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the new combined_heat_power field that you've added is data that we were entirely losing before, and that should be associated with the boilers directly (which seems totally possible) then this looks generally correct.

However, I think this case is separate from the issue that #509 is attempting to address, which is that there are many fields that are reported independently in various EIA tables, which we want to deduplicate through the harvesting process. Unfortunately in many of the table-level data cleaning functions we are currently preemptively dropping them, so they never make it to the harvesting process at all. This point of this issue, IIRC, is to go find all of those cases where columns we actually want to harvest and preserve in the appropriate table are getting dropped in their transform step. In many cases, that will just mean... not dropping them. But in some cases not dropping them will cause problems later on in the transform (e.g. the one you found in _aggregate_duplicate_boiler_fuel_keys()) and those problems will have to be chased down and addressed.

In the case of existing values that we're dropping, which already have a home table & metadata in the database schema, no changes to the fields or resource (table) definitions should be required -- it's just that additional information will be getting fed into the harvesting process to populate those existing fields.

@@ -263,6 +263,10 @@
"description": "Average monthly coincident peak (CP) demand (for requirements purchases, and any transactions involving demand charges). Monthly CP demand is the metered demand during the hour (60-minute integration) in which the supplier's system reaches its monthly peak. In megawatts.",
"unit": "MW",
},
"combined_heat_power": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really a different field than associated_combined_heat_power or do they represent the same characteristic applied to generators vs. boilers?

@@ -15,6 +15,7 @@
"fuel_mmbtu_per_unit",
"sulfur_content_pct",
"ash_content_pct",
"combined_heat_power",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is truly a new field that we were inappropriately dropping (and losing entirely) before, and we want to have it independently associated with the boilers (in addition to the associated_combined_heat_power field that is applied to generators) then this seems appropriate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think the point is that #509 is specific to generators. I hadn't been thinking about generators vs. boilers specifically. I'll do the analogous thing just for generators.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't sound right to me. #509 (the issue) is very generic, even if it provides an example from the generators.

We're doing dropping potentially informational columns across a bunch of the EIA table transforms, and would prefer to feed them into the harvesting process so they can be used to more appropriately set the boiler / generator / utility / plant / balancing authority attributes (static or annual).

Addressing #509 should not have any effect on the database schema -- if there are columns in the wrong place, or columns we have failed to integrate entirely, those should probably be their own separate issues. This issue should just stop dropping columns prior to the harvesting process, so they can be used within it.

@zaneselvans zaneselvans added the harvest Normalization of poorly normalized inputs and reconciliation of internal inconsistencies label Feb 13, 2023
@zaneselvans
Copy link
Member

@knordback I'm going to close this draft PR since it's off a fork, and I think we mostly learned what the prototypical solution will look like in #2200.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
harvest Normalization of poorly normalized inputs and reconciliation of internal inconsistencies inframundo
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants