Bug 509 #2333

knordback · 2023-02-27T04:29:52Z

This is a proposed way of dealing with fields that are in the input data but we're no longer dropping in the eia923 transform() functions, but aren't in the corresponding SQLite tables. Basically, I'm just stripping out the columns after the harvesting step.

In ToT code, the fields are dropped so they never appear in the DataFrames. If we don't drop them, then leaving the corresponding columns in results in a crash when trying to populate the SQLite DB.

In this code, the relevant field I'm dealing with is utility_name_eia, which comes in as operator_name. This is in the generation_fuel_eia923 table, but gets duplicated in generation_fuel_nuclear_eia923. So it needs to be removed from both.

This approach feels very special-case-y. But maybe that's the nature of the beast. If the general approach seems okay and other currently-dropped fields need to be handled in the same way, then I'd probably at least pull the new code out into a separate function.

…el()

…neration_fuel()

For more information, see https://pre-commit.ci

codecov · 2023-02-27T05:01:58Z

Codecov Report

Patch and project coverage have no change.

Comparison is base (62eee76) 88.4% compared to head (eaa7615) 88.4%.

Additional details and impacted files

@@          Coverage Diff          @@
##             dev   #2333   +/-   ##
=====================================
  Coverage   88.4%   88.4%           
=====================================
  Files         87      87           
  Lines      10139   10139           
=====================================
  Hits        8971    8971           
  Misses      1168    1168

Impacted Files	Coverage Δ
src/pudl/transform/eia923.py	`85.2% <ø> (ø)`

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

zaneselvans · 2023-02-27T14:42:36Z

src/pudl/transform/eia.py

+    eia_transformed_dfs["generation_fuel_eia923"] = eia_transformed_dfs[
+        "generation_fuel_eia923"
+    ].drop(columns=["utility_name_eia"])
+    eia_transformed_dfs["generation_fuel_nuclear_eia923"] = eia_transformed_dfs[
+        "generation_fuel_nuclear_eia923"
+    ].drop(columns=["utility_name_eia"])
+


I'm a little surprised that this is necessary. I thought that by default the harvesting process dropped any column that was harvested from the table it was found in (unless we specifically tell it not to rip the column out). But I could be wrong about this @cmgosnell would probably know better.

One option to deal with the issue of leftover columns that need to be harvested, but shouldn't stick around in a generic way is to use the pudl.metadata.classes.Resource.format_df() method, as we are in pudl.transform.classes.AbstractTableTransformer.enforce_schema().

In fact, copying enforce_schema() almost verbatim into a new function might be the way to go. I think the only big difference is you'd need to look up the appropriate resource using the table name that is the key in that dictionary of dataframes, rather than having self.table_id to work with.

Eventually the plan is to refactor the EIA transforms to use the same class design that we've applied to FERC 1, but for now just iterating over all the finished dataframes at the very end of the EIA transformation process would avoid the need to special case out which columns should be present or not, and do some other basic checks that will cause the DB to complain (no null PKs, unique PKs, etc).

…source, so it can be used from elsewhere

…into bug-509

For more information, see https://pre-commit.ci

…Frames before converting to SQLite

…into bug-509

For more information, see https://pre-commit.ci

…into bug-509

zaneselvans · 2023-03-01T05:36:30Z

src/pudl/transform/eia.py

+    for cat in eia_transformed_dfs:
+        resource = (
+            pudl.metadata.classes.Package.from_resource_ids().
+            get_resource(cat)
+        )
+        eia_transformed_dfs[cat] = resource.enforce_schema(
+            eia_transformed_dfs[cat])


Is there any reason not to wait to do this until the very end (after the balancing authority fix below)?

Is there any reason not to call enforce_schema() on the entity tables too?

Yes, I had already moved the code to the very end of the function.

And I added code to do the same thing for the entity tables. That doesn't seem to cause any distress.

Wow that's great! I'm slightly surprised. Do you want to try running tox -e nuke overnight and see if stays happy?

Seems not successful:
nuke: exit -9 (641.56 seconds) /home/knordback/Kurt/pudl> bash -c 'coverage run --append src/pudl/cli.py --logfile tox-nuke.log --clobber src/pudl/package_data/settings/etl_full.yml' pid=793582
.pkg: _exit> python /home/knordback/.conda/envs/pudl-dev/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta
nuke: FAIL code -9 (7080.40=setup[98.70]+cmd[0.12,0.54,0.48,0.39,0.41,0.42,0.39,0.53,0.24,6.33,6.21,1.24,0.03,1.72,1.23,161.77,33.89,25.29,3980.21,0.01,2118.71,641.56] seconds)
evaluation failed :( (7081.04 seconds)

All of those tests run successfully. So does tox with no arguments (which I thought was the commit criterion). So it seems to be something specific to the "nuke".

I am indeed getting copious output, but it's not (to my eye) indicating what the problem is. As mentioned above, I get some warnings but nothing obviously connected to this.

I'll be away next week. I'll will look into this more when I get back.

Hi Zane. I'm back, for a little while, and have some time to work on this again. I can continue un-dropping fields, but I'm not sure how worried to be about not being able to run tox -e nuke successfully. As noted, the other tests run fine for me. Output is attached in case you want to take a look.
tox-out.txt

Have you tried running the full ETL with the pudl_etl command line tool? The place that it's failing in your logs is the most memory intensive part of the ETL so my guess is it's running out of memory and crashing? I thought you'd been able to do the full ETL previously? How much memory + swap do you have? I think it may take up to 25GB as it is now.

tox with no arguments is what gets run in CI on GitHub. It only processes 1 year of data. For changes like the one you're making that will affect all years of data we try and run the full ETL locally before it gets merged into dev.

tox -e nuke is a blunt instrument that not only runs the full ETL, but also all the unit + integration tests on all the data + data validation tests. You probably want to be trying to run the full ETL, but only for the EIA data, to check whether everything is working. The devtools/eia-etl-debug.ipynb notebook is the easiest / most efficient way to do that right now, since you don't have to run the extract step over and over again. It'll also avoid this (FERC 1) related memory intensive step.

I have 16GB ram + 15GB swap. I thought I could run the full ETL but apparently not that either. I was able to run devtools/eia-etl-debug.ipynb (first time using Jupyter -- it's kinda slick).

Jupyter is great for working with data. Lots more human-friendly tools for seeing what's going on in there! Definitely worth getting familiar with.

review-notebook-app · 2023-03-20T17:10:16Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

zaneselvans

Other than removing the null df.drop() I think this looks great!

Do you want to go ahead and merge dev in and try to run the full ETL + data validations to see if anything is broken?

zaneselvans · 2023-05-24T19:12:59Z

src/pudl/transform/eia923.py

-                "net_generation_mwh_year_to_date",
-                "early_release",
-            ],
+            [],


We should just remove this .drop() altogether.

zaneselvans · 2023-05-25T01:17:00Z

If you've got the full DB already populated locally, you can run the full data tests and validations in parallel in two different windows with these commands from the main repo directory:

pytest --live-dbs test/validate
pytest --live-dbs --etl-settings src/pudl/package_data/settings/etl_full.yml test/integration

knordback · 2023-05-26T21:08:08Z

With the current code I get this:

===================================================================================================== short test summary info ======================================================================================================
FAILED test/validate/eia_test.py::test_minmax_rows[eia_raw-bf_eia923-1427692-1427692-119611] - ValueError: bf_eia923: found 1428112 rows, expected 1427692. Off by 0.029%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_raw-frc_eia923-597000-244415-24065] - ValueError: frc_eia923: found 608494 rows, expected 597000. Off by 1.925%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_raw-gf_eia923-2687345-2687345-230149] - ValueError: gf_eia923: found 2690175 rows, expected 2687345. Off by 0.105%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_raw-pu_eia860-184745-184745-184745] - ValueError: pu_eia860: found 184831 rows, expected 184745. Off by 0.047%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_raw-utils_eia860-119388-119388-119388] - ValueError: utils_eia860: found 121464 rows, expected 119388. Off by 1.739%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_annual-bf_eia923-1427692-1427692-119611] - ValueError: bf_eia923: found 119646 rows, expected 119611. Off by 0.029%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_annual-frc_eia923-597000-244415-24065] - ValueError: frc_eia923: found 24285 rows, expected 24065. Off by 0.914%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_annual-gen_eia923-None-5171497-432570] - ValueError: gen_eia923: found 428497 rows, expected 432570. Off by -0.942%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_annual-gf_eia923-2687345-2687345-230149] - ValueError: gf_eia923: found 230344 rows, expected 230149. Off by 0.085%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_annual-pu_eia860-184745-184745-184745] - ValueError: pu_eia860: found 184831 rows, expected 184745. Off by 0.047%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_annual-utils_eia860-119388-119388-119388] - ValueError: utils_eia860: found 121464 rows, expected 119388. Off by 1.739%, allowed margin of 0.000%
FAILED test/validate/mcoe_test.py::test_minmax_rows_mcoe[eia_annual-hr_by_unit-362381-30340] - ValueError: hr_by_unit: found 30342 rows, expected 30340. Off by 0.007%, allowed margin of 0.000%
FAILED test/validate/mcoe_test.py::test_minmax_rows_mcoe[eia_annual-hr_by_gen-555119-46408] - ValueError: hr_by_gen: found 46410 rows, expected 46408. Off by 0.004%, allowed margin of 0.000%
FAILED test/validate/mcoe_test.py::test_minmax_rows_mcoe[eia_annual-fuel_cost-555119-46408] - ValueError: fuel_cost: found 46410 rows, expected 46408. Off by 0.004%, allowed margin of 0.000%
FAILED test/validate/mcoe_test.py::test_minmax_rows_mcoe[eia_annual-capacity_factor-5171497-432570] - ValueError: capacity_factor: found 428497 rows, expected 432570. Off by -0.942%, allowed margin of 0.000%
FAILED test/validate/mcoe_test.py::test_minmax_rows_mcoe[eia_annual-mcoe-5171881-432602] - ValueError: mcoe: found 428529 rows, expected 432602. Off by -0.942%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_monthly-bf_eia923-1427692-1427692-119611] - ValueError: bf_eia923: found 1428112 rows, expected 1427692. Off by 0.029%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_monthly-frc_eia923-597000-244415-24065] - ValueError: frc_eia923: found 246368 rows, expected 244415. Off by 0.799%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_monthly-gen_eia923-None-5171497-432570] - ValueError: gen_eia923: found 5122777 rows, expected 5171497. Off by -0.942%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_monthly-gf_eia923-2687345-2687345-230149] - ValueError: gf_eia923: found 2690175 rows, expected 2687345. Off by 0.105%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_monthly-pu_eia860-184745-184745-184745] - ValueError: pu_eia860: found 184831 rows, expected 184745. Off by 0.047%, allowed margin of 0.000%
FAILED test/validate/eia_test.py::test_minmax_rows[eia_monthly-utils_eia860-119388-119388-119388] - ValueError: utils_eia860: found 121464 rows, expected 119388. Off by 1.739%, allowed margin of 0.000%
FAILED test/validate/mcoe_test.py::test_minmax_rows_mcoe[eia_monthly-hr_by_unit-362381-30340] - ValueError: hr_by_unit: found 362405 rows, expected 362381. Off by 0.007%, allowed margin of 0.000%
FAILED test/validate/mcoe_test.py::test_minmax_rows_mcoe[eia_monthly-hr_by_gen-555119-46408] - ValueError: hr_by_gen: found 555143 rows, expected 555119. Off by 0.004%, allowed margin of 0.000%
FAILED test/validate/mcoe_test.py::test_minmax_rows_mcoe[eia_monthly-fuel_cost-555119-46408] - ValueError: fuel_cost: found 555143 rows, expected 555119. Off by 0.004%, allowed margin of 0.000%
FAILED test/validate/mcoe_test.py::test_minmax_rows_mcoe[eia_monthly-capacity_factor-5171497-432570] - ValueError: capacity_factor: found 5122777 rows, expected 5171497. Off by -0.942%, allowed margin of 0.000%
FAILED test/validate/mcoe_test.py::test_minmax_rows_mcoe[eia_monthly-mcoe-5171881-432602] - ValueError: mcoe: found 5123161 rows, expected 5171881. Off by -0.942%, allowed margin of 0.000%
================================================================================ 27 failed, 35 passed, 6 skipped, 47 warnings in 2606.97s (0:43:26) ================================================================================
minmax_rows: exit 1 (2625.05 seconds) /home/knordback/Kurt/pudl> pytest --color=yes --live-dbs test/validate/epacamd_eia_test.py::test_minmax_rows test/validate/ferc1_test.py::test_minmax_rows test/validate/eia_test.py::test_minmax_rows test/validate/mcoe_test.py::test_minmax_rows_mcoe pid=3047649

Does this make sense ? Is there an easy way to look at the diffs?

zaneselvans · 2023-05-29T15:23:40Z

Most of these changes are small increases in the number of rows which seems pretty reasonable. Changes that sounds out as maybe funny:

Nearly 2% increase in the number of records in the fuel receipts and costs table (frc_eia923)
Nearly 2% increase in the number of utility records (utils_eia860)
Loss of almost 1% of the records in the generation table (gen_eia923) and the tables that depend on it directly.

@cmgosnell do you have any thoughts on why these tables would have such big changes just because we're not dropping "extra" columns before harvesting? The utilities make some sense, but the FRC and generation seem surprising.

To diff the tables, there's sqldiff but it looks like that may only be a Windows utility. I think @rousik or @zschira were doing some DB table diffing, maybe with another utility?

I would probably read the two tables into pandas and set the index to the primary key and see what records exist in one table but not the other by taking the difference between the two indexes. But that won't work well on the FRC table since it has no natural primary key.

zaneselvans · 2023-06-02T00:54:57Z

@knordback it looks like @rousik has been doing a bunch of work diffing SQL tables and this might be another good test of his little toolkit too.

knordback · 2023-06-02T03:06:12Z

@knordback it looks like @rousik has been doing a bunch of work diffing SQL tables and this might be another good test of his little toolkit too.

Okay, that would be good. I get a lot of output just running sqldiff, and I don't know how to extract the essential difference.

rousik · 2023-06-02T14:04:04Z

Do you have the outputs on gcs, or can you tell me how to repro the etl run in question? I could test my scripts against this case.

…

On Fri, Jun 2, 2023, 05:06 knordback ***@***.***> wrote: @knordback <https://github.com/knordback> it looks like @rousik <https://github.com/rousik> has been doing a bunch of work diffing SQL tables and this might be another good test of his little toolkit too. Okay, that would be good. I get a lot of output just running sqldiff, and I don't know how to extract the essential difference. — Reply to this email directly, view it on GitHub <#2333 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABYSCGGHM45YV6HFPBLDI5DXJFKC5ANCNFSM6AAAAAAVI5CXOE> . You are receiving this because you were mentioned.Message ID: ***@***.***>

knordback · 2023-06-02T15:24:51Z

Do you have the outputs on gcs, or can you tell me how to repro the etl run in question? I could test my scripts against this case.
…
On Fri, Jun 2, 2023, 05:06 knordback @.> wrote: @knordback https://github.com/knordback it looks like @rousik https://github.com/rousik has been doing a bunch of work diffing SQL tables and this might be another good test of his little toolkit too. Okay, that would be good. I get a lot of output just running sqldiff, and I don't know how to extract the essential difference. — Reply to this email directly, view it on GitHub <#2333 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABYSCGGHM45YV6HFPBLDI5DXJFKC5ANCNFSM6AAAAAAVI5CXOE . You are receiving this because you were mentioned.Message ID: @.>

I'm wanting to compare the pudl.sqlite produced on dev against that produced in the bug-509 branch. (I just merged from dev to the branch, so the diffs should only correspond to the branch changes.) I also have both locally and could put them on Google. Is there a particular place this kind of thing should go?

rousik · 2023-06-02T18:09:47Z

All right, I have ran bug-509 branch vs dev and here's the output diff report:

RowCount(pudl.sqlite/denorm_boiler_fuel_eia923): 108
RowCount(pudl.sqlite/denorm_boiler_fuel_monthly_eia923): 108
RowCount(pudl.sqlite/denorm_boiler_fuel_yearly_eia923): 9
RowCount(pudl.sqlite/denorm_fuel_receipts_costs_eia923): 1390
RowCount(pudl.sqlite/denorm_fuel_receipts_costs_monthly_eia923): 248
RowCount(pudl.sqlite/denorm_fuel_receipts_costs_yearly_eia923): 32
RowCount(pudl.sqlite/denorm_generation_eia923): 24
RowCount(pudl.sqlite/denorm_generation_fuel_combined_eia923): 1300
RowCount(pudl.sqlite/denorm_generation_fuel_combined_monthly_eia923): 1300
RowCount(pudl.sqlite/denorm_generation_fuel_combined_yearly_eia923): 117
RowCount(pudl.sqlite/denorm_generation_monthly_eia923): 24
RowCount(pudl.sqlite/denorm_generation_yearly_eia923): 2
RowCount(pudl.sqlite/denorm_plants_utilities_eia): 65
RowCount(pudl.sqlite/denorm_utilities_eia): 7
RowCount(pudl.sqlite/utilities_eia860): 7
RowCount(pudl.sqlite/utilities_entity_eia): 1

The positive number means that that many rows were added in the branch bug-509 (right side of the comparison).

The relevant piece of code you can use to fetch the differing rows would be something along the lines of:

def read_table_as_df(db_path, table_name):
  con = create_engine(db_path)
  return pd.concat([
    df for df in pd.read_sql_table(table_name, con, chunksize=100_000
  ])

df_left = read_table_as_df(left_db, table) 
df_right = read_table_as_df(right_db, table)

df_merge = df_left.merge(df_right, how="outer", indicator=True)
# Then, df_merge["_merge"] has values "left_only", "right_only" or "both" depending on where it occurs

knordback · 2023-06-02T18:16:50Z

Ah, this is super helpful. More rows in bug-509 is what we would expect. Which version of the ETL did you run to get this?

rousik · 2023-06-05T08:36:52Z

Latest dev available at the time and latest commit of bug-509. rousik-output-diff branch has a tool "output_diff" you could point at two directories and it will produce the report for you. I'll be working on automating this.

zaneselvans · 2023-06-05T15:57:32Z

Huh, those differences in the number of rows seem pretty different from what @knordback got when he ran the minmax_rows tests. I wonder why?

knordback · 2023-06-05T16:07:35Z

It depends on the ETL version used to create the pudl.sqlite, right? So I'm running modified and reference versions of etl_full_no_cems and will try @rousik 's tool on the output.

zaneselvans · 2023-06-10T02:19:20Z

If you've got two different DBs side by side and you want to see differences between the outputs they generate you can do something like the following in a Jupyter Notebook (here looking at the plant_in_service_ferc1 output table, which should have some differences between your DB and the last successful nightly build output from dev:

import pandas as pd
import sqlalchemy as sa

from pudl.metadata.classes import Resource
from pudl.output.pudltabl import PudlTabl

pk_cols = Resource.from_id("plant_in_service_ferc1").schema.primary_key

# Use the real paths to your 2 DBs obvs:
left_db_url = "sqlite:///" + "/Users/zane/code/catalyst/pudl-work/output/pudl.sqlite"
left_engine = sa.create_engine(left_db_url)
left_pudl_out = PudlTabl(left_engine)
left_df = pudl_out.plant_in_service_ferc1().set_index(pk_cols)

# Use the real paths to your 2 DBs obvs:
right_db_url = "sqlite:///" + "/Users/zane/code/catalyst/pudl-work/output/pudl.sqlite"
right_engine = sa.create_engine(right_db_url)
right_pudl_out = PudlTabl(right_engine)
right_df = pudl_out.plant_in_service_ferc1().set_index(pk_cols)

left_only_index = left_df.index.difference(right_df.index)
right_only_index = right_df.index.difference(left_df.index)

left_only_df = left_df.loc[left_only_index]
right_only_df = right_df.loc[right_only_index]

knordback · 2023-06-26T18:47:40Z

Remarkably, I finally got back to this. I ran the bug-509 code and compared output against dev. To compare, I first ran Jan's output_diff tool to find pudl.sqlite tables that differ. It produced the following:

2023-06-25 17:37:04 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/advanced_metering_infrastructure_eia861.
2023-06-25 17:37:05 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/alembic_version.
2023-06-25 17:37:05 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/averaging_periods_eia.
2023-06-25 17:37:05 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/balance_sheet_assets_ferc1.
2023-06-25 17:37:06 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/balance_sheet_liabilities_ferc1.
2023-06-25 17:37:07 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/balancing_authorities_eia.
2023-06-25 17:37:07 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/balancing_authority_assn_eia861.
2023-06-25 17:37:07 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/balancing_authority_eia861.
2023-06-25 17:37:07 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/boiler_cooling_assn_eia860.
2023-06-25 17:37:07 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/boiler_emissions_control_equipment_assn_eia860.
2023-06-25 17:37:07 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/boiler_fuel_eia923.
2023-06-25 17:37:10 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/boiler_generator_assn_eia860.
2023-06-25 17:37:10 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/boiler_generator_assn_types_eia.
2023-06-25 17:37:10 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/boiler_stack_flue_assn_eia860.
2023-06-25 17:37:10 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/boiler_status_eia.
2023-06-25 17:37:10 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/boiler_types_eia.
2023-06-25 17:37:10 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/boilers_eia860.
2023-06-25 17:37:10 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/boilers_entity_eia.
2023-06-25 17:37:10 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/cash_flow_ferc1.
2023-06-25 17:37:12 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/coalmine_eia923.
2023-06-25 17:37:12 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/coalmine_types_eia.
2023-06-25 17:37:12 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/contract_types_eia.
2023-06-25 17:37:12 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/data_maturities.
2023-06-25 17:37:12 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/datasources.
2023-06-25 17:37:12 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/demand_hourly_pa_ferc714.
2023-06-25 17:37:49 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/demand_response_eia861.
2023-06-25 17:37:49 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/demand_response_water_heater_eia861.
2023-06-25 17:37:49 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/demand_side_management_ee_dr_eia861.
2023-06-25 17:37:49 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/demand_side_management_misc_eia861.
2023-06-25 17:37:49 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/demand_side_management_sales_eia861.
2023-06-25 17:37:49 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_balance_sheet_assets_ferc1.
2023-06-25 17:37:50 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_balance_sheet_liabilities_ferc1.
2023-06-25 17:37:51 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_boiler_fuel_eia923.
RowCount(pudl.sqlite/denorm_boiler_fuel_eia923): 420
2023-06-25 17:37:54 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_boiler_fuel_monthly_eia923.
RowCount(pudl.sqlite/denorm_boiler_fuel_monthly_eia923): 420
2023-06-25 17:37:57 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_boiler_fuel_yearly_eia923.
RowCount(pudl.sqlite/denorm_boiler_fuel_yearly_eia923): 35
2023-06-25 17:37:58 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_boilers_eia.
2023-06-25 17:37:58 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_cash_flow_ferc1.
2023-06-25 17:37:59 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_depreciation_amortization_summary_ferc1.
2023-06-25 17:38:00 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_electric_energy_dispositions_ferc1.
2023-06-25 17:38:00 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_electric_energy_sources_ferc1.
2023-06-25 17:38:00 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_electric_operating_expenses_ferc1.
2023-06-25 17:38:02 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_electric_operating_revenues_ferc1.
2023-06-25 17:38:03 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_electric_plant_depreciation_changes_ferc1.
2023-06-25 17:38:04 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_electric_plant_depreciation_functional_ferc1.
2023-06-25 17:38:04 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_electricity_sales_by_rate_schedule_ferc1.
2023-06-25 17:38:04 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_emissions_control_equipment_eia860.
2023-06-25 17:38:04 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_fuel_by_plant_ferc1.
2023-06-25 17:38:04 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_fuel_ferc1.
2023-06-25 17:38:05 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_fuel_receipts_costs_eia923.
RowCount(pudl.sqlite/denorm_fuel_receipts_costs_eia923): 11494
2023-06-25 17:38:05 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_fuel_receipts_costs_monthly_eia923.
RowCount(pudl.sqlite/denorm_fuel_receipts_costs_monthly_eia923): 1953
2023-06-25 17:38:05 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_fuel_receipts_costs_yearly_eia923.
RowCount(pudl.sqlite/denorm_fuel_receipts_costs_yearly_eia923): 220
2023-06-25 17:38:05 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_generation_eia923.
RowCount(pudl.sqlite/denorm_generation_eia923): 264
2023-06-25 17:38:06 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_generation_fuel_combined_eia923.
RowCount(pudl.sqlite/denorm_generation_fuel_combined_eia923): 2794
2023-06-25 17:38:10 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_generation_fuel_combined_monthly_eia923.
RowCount(pudl.sqlite/denorm_generation_fuel_combined_monthly_eia923): 2794
2023-06-25 17:38:13 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_generation_fuel_combined_yearly_eia923.
RowCount(pudl.sqlite/denorm_generation_fuel_combined_yearly_eia923): 192
2023-06-25 17:38:13 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_generation_monthly_eia923.
RowCount(pudl.sqlite/denorm_generation_monthly_eia923): 264
2023-06-25 17:38:15 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_generation_yearly_eia923.
RowCount(pudl.sqlite/denorm_generation_yearly_eia923): 22
2023-06-25 17:38:15 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_generators_eia.
2023-06-25 17:38:15 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_income_statement_ferc1.
2023-06-25 17:38:17 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_other_regulatory_liabilities_ferc1.
2023-06-25 17:38:17 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_ownership_eia860.
2023-06-25 17:38:17 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_plant_in_service_ferc1.
2023-06-25 17:38:19 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_plants_all_ferc1.
2023-06-25 17:38:19 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_plants_eia.
2023-06-25 17:38:19 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_plants_hydro_ferc1.
2023-06-25 17:38:19 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_plants_pumped_storage_ferc1.
2023-06-25 17:38:19 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_plants_small_ferc1.
2023-06-25 17:38:19 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_plants_steam_ferc1.
2023-06-25 17:38:19 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_plants_utilities_eia.
RowCount(pudl.sqlite/denorm_plants_utilities_eia): 83
2023-06-25 17:38:20 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_plants_utilities_ferc1.
2023-06-25 17:38:20 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_purchased_power_ferc1.
2023-06-25 17:38:20 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_retained_earnings_ferc1.
2023-06-25 17:38:20 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_transmission_statistics_ferc1.
2023-06-25 17:38:21 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_utilities_eia.
RowCount(pudl.sqlite/denorm_utilities_eia): 2076
2023-06-25 17:38:21 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/denorm_utility_plant_summary_ferc1.
2023-06-25 17:38:22 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/depreciation_amortization_summary_ferc1.
2023-06-25 17:38:23 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/distributed_generation_fuel_eia861.
2023-06-25 17:38:23 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/distributed_generation_misc_eia861.
2023-06-25 17:38:23 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/distributed_generation_tech_eia861.
2023-06-25 17:38:23 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/distribution_systems_eia861.
2023-06-25 17:38:23 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/dynamic_pricing_eia861.
2023-06-25 17:38:23 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/electric_energy_dispositions_ferc1.
2023-06-25 17:38:23 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/electric_energy_sources_ferc1.
2023-06-25 17:38:23 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/electric_operating_expenses_ferc1.
2023-06-25 17:38:25 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/electric_operating_revenues_ferc1.
2023-06-25 17:38:25 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/electric_plant_depreciation_changes_ferc1.
2023-06-25 17:38:26 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/electric_plant_depreciation_functional_ferc1.
2023-06-25 17:38:27 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/electricity_sales_by_rate_schedule_ferc1.
2023-06-25 17:38:27 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/emissions_control_equipment_eia860.
2023-06-25 17:38:27 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/energy_efficiency_eia861.
2023-06-25 17:38:27 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/energy_sources_eia.
2023-06-25 17:38:27 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/environmental_equipment_manufacturers_eia.
2023-06-25 17:38:27 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/epacamd_eia.
2023-06-25 17:38:27 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/epacamd_eia_subplant_ids.
2023-06-25 17:38:27 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/ferc_accounts.
2023-06-25 17:38:27 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/firing_types_eia.
2023-06-25 17:38:27 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/fuel_ferc1.
2023-06-25 17:38:27 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/fuel_receipts_costs_aggs_eia.
2023-06-25 17:38:28 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/fuel_receipts_costs_eia923.
2023-06-25 17:38:28 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/fuel_transportation_modes_eia.
2023-06-25 17:38:28 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/fuel_types_aer_eia.
2023-06-25 17:38:28 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/generation_eia923.
2023-06-25 17:38:29 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/generation_fuel_eia923.
2023-06-25 17:38:35 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/generation_fuel_nuclear_eia923.
2023-06-25 17:38:35 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/generators_eia860.
2023-06-25 17:38:36 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/generators_entity_eia.
2023-06-25 17:38:36 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/green_pricing_eia861.
2023-06-25 17:38:36 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/income_statement_ferc1.
2023-06-25 17:38:38 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/mercury_compliance_strategies_eia.
2023-06-25 17:38:38 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/mergers_eia861.
2023-06-25 17:38:38 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/momentary_interruptions_eia.
2023-06-25 17:38:38 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/net_metering_customer_fuel_class_eia861.
2023-06-25 17:38:38 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/net_metering_misc_eia861.
2023-06-25 17:38:38 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/non_net_metering_customer_fuel_class_eia861.
2023-06-25 17:38:38 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/non_net_metering_misc_eia861.
2023-06-25 17:38:38 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/nox_compliance_strategies_eia.
2023-06-25 17:38:38 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/nox_control_status_eia.
2023-06-25 17:38:38 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/nox_units_eia.
2023-06-25 17:38:38 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/operational_data_misc_eia861.
2023-06-25 17:38:38 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/operational_data_revenue_eia861.
2023-06-25 17:38:38 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/operational_status_eia.
2023-06-25 17:38:38 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/other_regulatory_liabilities_ferc1.
2023-06-25 17:38:38 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/ownership_eia860.
2023-06-25 17:38:39 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/particulate_compliance_strategies_eia.
2023-06-25 17:38:39 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/particulate_units_eia.
2023-06-25 17:38:39 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/plant_in_service_ferc1.
2023-06-25 17:38:40 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/plants_eia.
2023-06-25 17:38:40 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/plants_eia860.
2023-06-25 17:38:40 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/plants_entity_eia.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/plants_ferc1.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/plants_hydro_ferc1.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/plants_pudl.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/plants_pumped_storage_ferc1.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/plants_small_ferc1.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/plants_steam_ferc1.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/political_subdivisions.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/power_purchase_types_ferc1.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/prime_movers_eia.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/purchased_power_ferc1.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/regulations_eia.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/reliability_eia861.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/reporting_frequencies_eia.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/respondent_id_ferc714.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/retained_earnings_ferc1.
2023-06-25 17:38:41 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/sales_eia861.
2023-06-25 17:38:43 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/sector_consolidated_eia.
2023-06-25 17:38:43 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/service_territory_eia861.
2023-06-25 17:38:43 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/so2_compliance_strategies_eia.
2023-06-25 17:38:43 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/so2_units_eia.
2023-06-25 17:38:43 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/steam_plant_types_eia.
2023-06-25 17:38:43 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/transmission_statistics_ferc1.
2023-06-25 17:38:44 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/utilities_eia.
2023-06-25 17:38:44 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/utilities_eia860.
RowCount(pudl.sqlite/utilities_eia860): 2076
2023-06-25 17:38:44 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/utilities_entity_eia.
RowCount(pudl.sqlite/utilities_entity_eia): 1
2023-06-25 17:38:44 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/utilities_ferc1.
2023-06-25 17:38:44 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/utilities_ferc1_dbf.
2023-06-25 17:38:44 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/utilities_ferc1_xbrl.
2023-06-25 17:38:44 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/utilities_pudl.
2023-06-25 17:38:44 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/utility_assn_eia861.
2023-06-25 17:38:44 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/utility_data_misc_eia861.
2023-06-25 17:38:44 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/utility_data_nerc_eia861.
2023-06-25 17:38:44 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/utility_data_rto_eia861.
2023-06-25 17:38:44 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/utility_plant_assn.
2023-06-25 17:38:44 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/utility_plant_summary_ferc1.
2023-06-25 17:38:45 [    INFO] catalystcoop.pudl.output_diff.cli:331 Analyzing rows of pudl.sqlite/wet_dry_bottom_eia.

(It also found differences in ferc2.sqlite, but I think that those occur because resource constraints on my machine meant that not everything could run.)

Anyhow, in the output above I chose table differences that I felt pretty confident I could identify in PudlTabl. I started with denorm_plants_utlities_eia, which appears in PudlTabl and apparently is accessed via pu_eia860(), and put them into the Jupyter code above. I then get the output below. First, the right-versus-left difference in rows is 83, which is the same thing output_diff gave, so that much is a good sign. But it means there are rows that appear in the bug-509 output that don't appear in dev. That is not what I would expect. Second, some of the rows have changed. E.g., the first row of left_only_df corresponds to the second row of right_only_df, with the same plant IDs and plant name. But the utility ID changes. Since utility_id_eia is one of the fields that's no longer dropped, it makes sense that this would change. But I would have expected it to take on some null value in dev code, which doesn't seem to be the case. Moreover, the utlity_name_eia changes (from American Electric Power Co Inc to Public Service Co of Oklahoma). A quick search indicates that the former is correct: https://en.wikipedia.org/wiki/Oklaunion_Power_Plant

So this is all worrisome and it appears maybe something is going wrong. It would be good to get a quick check of this to see if maybe this is better than it appears to me, of if I need to investigate more deeply.

zaneselvans · 2023-06-28T18:41:17Z

I'm not sure this is necessarily concerning. You're comparing the left_only and right_only dataframes which must, by definition, not match in some way. I don't think that the ordering is necessarily deterministic, unless you're sorting by index / value at some point.

I think it makes sense to start with one of the "base" tables rather than derived tables that are several steps downstream, since the tables that are the immediate outputs from the harvesting are going to be the first places that the consequences of the new process appear, and everything downstream will be a result of those changes.

The outputs of the harvesting process are the entity and annual tables for plants, generators, boilers, and utilities:

plants_entity_eia
plants_eia860
generators_entity_eia
generators_eia860
boilers_entity_eia
boilers_eia860
utilities_entity_eia
utilities_eia860

You might also want to pull the data directly from the database(s) rather than using PudlTabl just to make sure that there's nothing in that software layer that's causing a discrepancy.

knordback · 2023-07-11T17:42:46Z

Here's a selection of the differences I'm seeing. This is not comprehensive, as some of the tables have many differences and I haven't looked at all of them.

plants_entity_eia:
- mostly changes in plants_entity_eia: many from <NA> to something that looks valid; others seem like just different versions of the name
plants_eia860:
- utility_id_eia value 15143 replaced with <NA>
- nerc_region value SERC replaced with WECC
- in general, lots and lots of changes to nerc_region, some of which seem like improvements, some which seem like regressions
generators_entity_eia:
- no differences
generators_eia860:
- seems to all be changes in utility_id_eia value from <NA> to something credible-looking
boilers_entity_eia:
- no differences
boilers_eia860:
- no differences
utilities_entity_eia:
- one additional entry in the table, that looks generally credible
- also lots of changes to utility_name_eia
utilities_eia860:
- 2076 additional entries, all with what look like reasonable utility_id_eia and report_date values, and all other entries <NA>

The following files show the differences. (I should have named them better: *-left.txt come from dev; *-right.txt come from bug-509 branch.) These are generated with code that loads the created pudl.sqlite files, extracts the referenced tables, converts to two DataFrames, and then differences series-by-series and outputs as text the differences. The diffs are viewable by comparing foo-left.txt and foo-right.txt using a visual differencer (tkdiff, meld, etc.)
utilities_eia860-right.txt
utilities_eia860-left.txt
utilities_entity_eia-right.txt
utilities_entity_eia-left.txt
generators_eia860-right.txt
generators_eia860-left.txt
plants_entity_eia-right.txt
plants_entity_eia-left.txt
plants_eia860-right.txt
plants_eia860-left.txt

knordback and others added 5 commits February 13, 2023 18:18

Remove plant_name_eia in pudl.transform.eia923.fuel_receipts_costs()

c48584e

Don't discard plant_name_eia from pudl.transform.eia923.generation_fu…

13a0dc1

…el()

Don't drop operator_name/utility_name_eia in pudl.transform.eia923.ge…

b4a3c28

…neration_fuel()

Move newly added code to what seems like a more sensible location

35b6c60

[pre-commit.ci] auto fixes from pre-commit.com hooks

d6021ac

For more information, see https://pre-commit.ci

zaneselvans reviewed Feb 27, 2023

View reviewed changes

zaneselvans added data-cleaning Tasks related to cleaning & regularizing data during ETL. harvest Normalization of poorly normalized inputs and reconciliation of internal inconsistencies labels Feb 27, 2023

knordback and others added 3 commits February 27, 2023 18:25

Extract the core of AbstractTableTransformer.enforce_schema() into Re…

adc4503

…source, so it can be used from elsewhere

Merge branch 'bug-509' of https://github.com/catalyst-cooperative/pudl …

2bcd803

…into bug-509

[pre-commit.ci] auto fixes from pre-commit.com hooks

c44949b

For more information, see https://pre-commit.ci

zaneselvans changed the base branch from main to dev February 28, 2023 15:58

jdangerx added the inframundo label Feb 28, 2023

knordback and others added 5 commits February 28, 2023 17:00

Use Resource.endorse_schema() to clean up non-matching fields in Data…

731baef

…Frames before converting to SQLite

Merge branch 'bug-509' of https://github.com/catalyst-cooperative/pudl …

eef6cbf

…into bug-509

[pre-commit.ci] auto fixes from pre-commit.com hooks

fbe7b21

For more information, see https://pre-commit.ci

Move DataFrame cleaning to the end of transform()

e2768f9

Merge branch 'bug-509' of https://github.com/catalyst-cooperative/pudl …

46d7e66

…into bug-509

zaneselvans reviewed Mar 1, 2023

View reviewed changes

knordback and others added 3 commits March 2, 2023 14:11

Call enforce_schema() on entities DataFrames as well

508ba29

Merge branch 'dev' into bug-509

126102e

Merge branch 'dev' into bug-509

e489ed4

zaneselvans added eia923 Anything having to do with EIA Form 923 eia860 Anything having to do with EIA Form 860 labels Mar 16, 2023

knordback added 3 commits March 17, 2023 16:06

Don't drop combined_heat_power in boiler_fuel()

dc5068e

One more incremental change in boiler_fuel()

fdada61

Several more fields not dropped

40ef5f4

Merge in substantial changes from dev.

dd9982a

zaneselvans reviewed May 24, 2023

View reviewed changes

knordback added 2 commits May 24, 2023 16:55

Remove a useless function call

15cb0c6

Merge branch 'dev' into bug-509

97ff719

zaneselvans marked this pull request as ready for review May 25, 2023 01:14

Merge branch 'dev' into bug-509

a4f6234

zaneselvans added this to the 2023 Spring milestone Jun 5, 2023

zaneselvans assigned knordback Jun 5, 2023

knordback and others added 2 commits June 8, 2023 16:32

Merge branch 'dev' into bug-509

968267c

Merge branch 'dev' into bug-509

a6f7f9b

Merge branch 'dev' into bug-509

5d3a0e8

Merge branch 'dev' into bug-509

eaa7615

bendnorman added the requires-debug Things that have been worked on but hit an issue that requires debugging. label Jul 24, 2023

jdangerx added the community label Nov 11, 2023

Base automatically changed from dev to main January 5, 2024 04:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug 509 #2333

Bug 509 #2333

knordback commented Feb 27, 2023

codecov bot commented Feb 27, 2023 •

edited

Loading

zaneselvans Feb 27, 2023

zaneselvans Mar 1, 2023

knordback Mar 2, 2023

zaneselvans Mar 2, 2023

knordback Mar 2, 2023

knordback Mar 3, 2023

knordback Mar 4, 2023

knordback Mar 16, 2023

zaneselvans Mar 16, 2023

knordback Mar 17, 2023

zaneselvans Mar 17, 2023

review-notebook-app bot commented Mar 20, 2023

zaneselvans left a comment

zaneselvans May 24, 2023

zaneselvans commented May 25, 2023

knordback commented May 26, 2023 •

edited by zaneselvans

Loading

zaneselvans commented May 29, 2023

zaneselvans commented Jun 2, 2023

knordback commented Jun 2, 2023

rousik commented Jun 2, 2023 via email

knordback commented Jun 2, 2023

rousik commented Jun 2, 2023

knordback commented Jun 2, 2023

rousik commented Jun 5, 2023

zaneselvans commented Jun 5, 2023

knordback commented Jun 5, 2023

zaneselvans commented Jun 10, 2023

knordback commented Jun 26, 2023 •

edited by zaneselvans

Loading

zaneselvans commented Jun 28, 2023

knordback commented Jul 11, 2023

Bug 509 #2333

Are you sure you want to change the base?

Bug 509 #2333

Conversation

knordback commented Feb 27, 2023

codecov bot commented Feb 27, 2023 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

review-notebook-app bot commented Mar 20, 2023

zaneselvans left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zaneselvans commented May 25, 2023

knordback commented May 26, 2023 • edited by zaneselvans Loading

zaneselvans commented May 29, 2023

zaneselvans commented Jun 2, 2023

knordback commented Jun 2, 2023

rousik commented Jun 2, 2023 via email

knordback commented Jun 2, 2023

rousik commented Jun 2, 2023

knordback commented Jun 2, 2023

rousik commented Jun 5, 2023

zaneselvans commented Jun 5, 2023

knordback commented Jun 5, 2023

zaneselvans commented Jun 10, 2023

knordback commented Jun 26, 2023 • edited by zaneselvans Loading

zaneselvans commented Jun 28, 2023

knordback commented Jul 11, 2023

codecov bot commented Feb 27, 2023 •

edited

Loading

knordback commented May 26, 2023 •

edited by zaneselvans

Loading

knordback commented Jun 26, 2023 •

edited by zaneselvans

Loading