Extract 714 xbrl #3822

aesharpe · 2024-08-30T00:28:13Z

Overview

closes #3813
closes #3814
closes #3815

What problem does this address?

We don't currently extract XBRL tables for the FERC 714 data. This code adds the capability to do so.

What did you change?

pudl/extract/ferc714.py

Add _csv suffix to csv-specific extraction functions/variables.
Rename csv-specific encoding dictionary to FERC714_CSV_ENCODING
Add TABLE_NAME_MAP_FERC714 dictionary
Add a function called create_raw_ferc714_xbrl_assets to reference the raw xbrl tables in the FERC 714 SQLite db and call it.
Add a function called raw_ferc714_xbrl__metadata_json to make an asset for the metadata table

pudl/io_managers.py

Upgrade the FercXBRLSQLiteIOManager to accommodate more than just ferc1
Add a ferc714_xbrl_sqlite_io_manager function

pudl/settings.py

Add xbrl_years function to the Ferc714Settings class to be able to access XBRL years.
Remove comments about not processing the XBRL data.

pudl/metadata/sources.py

Update the FERC714 working partitions.
Remove comments about not processing XBRL data.
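
As a rough illustration of the TABLE_NAME_MAP_FERC714 and create_raw_ferc714_xbrl_assets items listed under pudl/extract/ferc714.py, here is a minimal sketch of how a table-name map could drive the raw asset names. The map entries, the "csv"/"xbrl" keys, and the helper raw_xbrl_asset_names are hypothetical stand-ins and not the contents of the PUDL module; only the raw_ferc714_xbrl__ prefix comes from this PR.

```python
# Hypothetical stand-in for TABLE_NAME_MAP_FERC714. The real keys and values
# live in pudl/extract/ferc714.py and are not reproduced in this PR description.
TABLE_NAME_MAP_SKETCH: dict[str, dict[str, str]] = {
    "example_pudl_table": {
        "csv": "example_file.csv",
        "xbrl": "example_xbrl_table",
    },
}


def raw_xbrl_asset_names(
    table_map: dict[str, dict[str, str]], form: str = "ferc714"
) -> list[str]:
    """Build one raw asset name per XBRL table listed in the map."""
    return [f"raw_{form}_xbrl__{names['xbrl']}" for names in table_map.values()]


print(raw_xbrl_asset_names(TABLE_NAME_MAP_SKETCH))
```

Keeping the CSV and XBRL names side by side in one dictionary is what lets a single loop create an asset per table, which appears to be the point of mirroring the FERC 1 map format.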

Testing

How did you make sure this worked? How can a reviewer verify this?

Load each of the raw assets via defs.load_asset_value and make sure they contain the expected years:

from pudl.etl import defs  # Dagster Definitions object for the PUDL ETL
defs.load_asset_value("raw_ferc714_xbrl__table_name")
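
One way to make the "expected years" check concrete is a small helper like the sketch below. It is not part of the PR: the helper name missing_report_years is made up, and the idea that a loaded table exposes a report_year column is an assumption.

```python
from collections.abc import Iterable


def missing_report_years(observed: Iterable[int], expected: Iterable[int]) -> list[int]:
    """Return the expected years that never appear among the observed values."""
    return sorted(set(expected) - set(observed))


# With a loaded table this might be called as (column name is an assumption):
#   missing_report_years(df["report_year"], expected_years)
observed = [2021, 2021, 2022, 2023]
print(missing_report_years(observed, [2021, 2022, 2023]))  # []
print(missing_report_years(observed, range(2021, 2025)))  # [2024]
```

An empty list means every expected year is present in the asset.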

To-do list

Update the release notes: reference the PR and related issues.
Run make pytest-coverage locally to ensure that the merge queue will accept your PR.
Review the PR yourself and call out any questions or issues you have
For minor ETL changes or data additions, once make pytest-coverage passes, make sure you have a fresh full PUDL DB downloaded locally, materialize new/changed assets and all their downstream assets and run relevant data validation tests using pytest and --live-dbs.
For bigger ETL or data changes run the full ETL locally and then run the data validations using make pytest-validate.
Alternatively, run the build-deploy-pudl GitHub Action manually.

aesharpe · 2024-09-04T23:11:24Z

Unit test failures have something to do with this snippet I changed in io_managers.py:

ferc_settings = context.resources.dataset_settings.get_datasets()[
    self.db_name.replace("_xbrl", "")
]

The pytest test/unit/io_managers_test.py::test_ferc_xbrl_sqlite_io_manager_dedupes function does not like this.

UPDATE:

I fixed this by changing this snippet to:

ferc_settings = getattr(
    context.resources.dataset_settings, self.db_name.replace("_xbrl", "")
)
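
The thread does not say why the dictionary lookup upset the unit test, so the sketch below only illustrates what the getattr form does: it strips the "_xbrl" suffix from the database name and reads the attribute of that name from the settings object. The two dataclasses and the helper settings_for_db are hypothetical stand-ins, not PUDL classes.

```python
from dataclasses import dataclass, field


@dataclass
class FormSettingsSketch:
    """Hypothetical per-form settings holding only a list of years."""

    years: list[int] = field(default_factory=list)


@dataclass
class DatasetSettingsSketch:
    """Hypothetical container with one attribute per FERC form."""

    ferc1: FormSettingsSketch
    ferc714: FormSettingsSketch


def settings_for_db(dataset_settings, db_name: str):
    """Look up a form's settings from its XBRL database name."""
    # "ferc714_xbrl" -> "ferc714", then read that attribute.
    return getattr(dataset_settings, db_name.replace("_xbrl", ""))


settings = DatasetSettingsSketch(
    ferc1=FormSettingsSketch(years=[2021, 2022]),
    ferc714=FormSettingsSketch(years=[2021, 2022, 2023]),
)
print(settings_for_db(settings, "ferc714_xbrl").years)
```

Attribute access only requires that the settings object expose an attribute named after the form, so it also works with lightweight test doubles that do not implement a get_datasets method.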

cmgosnell

Two requests for changes in the comments, plus the settings yml files need to be updated! Those seem like minor tweaks, though. Overall this is looking great imo, and I was able to materialize these xbrl raw assets locally.

cmgosnell

two blocking needs here:

to change
years = ", ".join(map(str, ferc714_settings.years))
to
years = ", ".join(map(str, ferc714_settings.csv_years))

in _extract_raw_ferc714_csv

and again to add the new years in the settings yml files.
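
To show why the years to csv_years change matters, here is a minimal sketch of a settings class that splits its years by format. The class name, the FIRST_XBRL_YEAR constant, and the 2021 cutoff are assumptions made for illustration; the real Ferc714Settings logic lives in src/pudl/settings.py.

```python
from dataclasses import dataclass

# Assumption for illustration: the first reporting year treated as XBRL.
FIRST_XBRL_YEAR = 2021


@dataclass
class Ferc714SettingsSketch:
    """Hypothetical settings object with one combined list of years."""

    years: list[int]

    @property
    def csv_years(self) -> list[int]:
        """Years that come from the older CSV files."""
        return [year for year in self.years if year < FIRST_XBRL_YEAR]

    @property
    def xbrl_years(self) -> list[int]:
        """Years that come from XBRL filings."""
        return [year for year in self.years if year >= FIRST_XBRL_YEAR]


settings = Ferc714SettingsSketch(years=[2019, 2020, 2021, 2022])
print(", ".join(map(str, settings.years)))  # 2019, 2020, 2021, 2022
print(", ".join(map(str, settings.csv_years)))  # 2019, 2020
```

Once XBRL years are added to the settings, a CSV extraction step that reads years would also report or request years that only exist as XBRL, which is what the requested change avoids.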

aesharpe · 2024-09-05T20:04:41Z

and again to add the new years in the settings yml files.

The reason I didn't do this is because I think it will break the transforms and I didn't want to merge it into main that way.

cmgosnell · 2024-09-05T23:44:55Z

The reason I didn't do this is because I think it will break the transforms and I didn't want to merge it into main that way.

hm this is a good point... but I think in this context because the only existing transforms are based on the csv extracted tables they will be fine. but still i hadn't put it in context that this was only the extraction so i will renege on this suggestion.

* Add _csv suffix to current FERC714 raw assets and functions
* Add create_raw_ferc714_xbrl_assets and raw_ferc714_xbrl__metadata_json functions to FERC 714 extract module and include ferc714 io_manager.
* Update XBRL table dict to mimic the TABLE_NAME_MAP_FERC1 format and fix functions to mimic those of FERC 1
* Add .get_datasets() to ferc_settings definition in FercXBRLSQLiteIOManager
* Update FERC714 working partitions and add xbrl_year function to Ferc714Settings
* Fix < to > in xbrl_years
* Change test output names to ferc1_xbrl from test_db
* Fix ferc_settings definition in FercXBRLSQLiteIOManager load_input function to fix unit tests
* Remove old FERC714_XBRL_FILES dict
* Add and update dictionary descriptions
* Update release notes
* Fix ferc io_managers and test
* Add csv_years property to Ferc714Settings
* add csv_years reference to raw_ferc714_csv_asset_factory function

zaneselvans · 2024-09-06T01:05:04Z

I think the integration test failures are probably because the tests don't know they need to run the FERC-714 XBRL to SQLite conversion before the main PUDL ETL can happen. You can follow the pattern that's used for the FERC Form 1 XBRL to SQLite conversion in test/conftest.py. These are the analogous fixtures:

@pytest.fixture(scope="session", name="ferc1_engine_xbrl")
def ferc1_xbrl_sql_engine(ferc1_xbrl_extract, dataset_settings_config) -> sa.Engine:
    """Grab a connection to the FERC Form 1 DB clone."""
    context = build_init_resource_context(
        resources={"dataset_settings": dataset_settings_config}
    )
    return ferc1_xbrl_sqlite_io_manager(context).engine


@pytest.fixture(scope="session", name="ferc1_xbrl_taxonomy_metadata")
def ferc1_xbrl_taxonomy_metadata(ferc1_engine_xbrl: sa.Engine):
    """Read the FERC 1 XBRL taxonomy metadata from JSON."""
    result = materialize_to_memory([raw_ferc1_xbrl__metadata_json])
    assert result.success

    return result.output_for_node("raw_ferc1_xbrl__metadata_json")

And then the PUDL IO Manager has a declared dependency on the FERC1 XBRL engine:

@pytest.fixture(scope="session")
def pudl_io_manager(
    ferc1_engine_dbf: sa.Engine,  # Implicit dependency
    ferc1_engine_xbrl: sa.Engine,  # Implicit dependency
    live_dbs: bool,
    pudl_datastore_config,
    dataset_settings_config,
    request,
) -> PudlMixedFormatIOManager:

aesharpe · 2024-09-06T17:13:39Z

@zaneselvans thank you for that suggestion! I just had to add one more fixture for ferc714_xbrl_extract and it was good to go.

aesharpe · 2024-09-06T17:29:26Z

mmm getting a unit test failure seemingly related to timeseries. Is timeseries cleaning part of the extraction or transformation phase @zaneselvans?

        for method in "tubal", "tnn":
            # Impute null values
            imputed0 = s.impute(mask=mask, method=method, rho0=1, maxiter=1)
            imputed = s.impute(mask=mask, method=method, rho0=1, maxiter=10)
            # Deviations between original and imputed values
            fit0 = s.summarize_imputed(imputed0, mask)
            fit = s.summarize_imputed(imputed, mask)
            # Mean MAPE (mean absolute percent error) is converging
>           assert fit["mape"].mean() < fit0["mape"].mean()
E           assert 0.030191300830775987 < 0.024084802339886784
E            +  where 0.030191300830775987 = mean()
E            +    where mean = 0    0.002769\n1    0.001304\n2    0.011850\n3    0.002106\n4    0.008777\n5    0.007833\n6    0.135690\n7    0.001992\n8    0.118109\n9    0.011482\nName: mape, dtype: float64.mean
E            +  and   0.024084802339886784 = mean()
E            +    where mean = 0    0.010499\n1    0.011460\n2    0.010525\n3    0.005897\n4    0.007852\n5    0.015753\n6    0.054820\n7    0.022598\n8    0.045421\n9    0.056023\nName: mape, dtype: float64.mean

test/unit/analysis/timeseries_cleaning_test.py:109: AssertionError

FAILED test/unit/analysis/timeseries_cleaning_test.py::test_flags_and_imputes_anomalies[5248964137-8991153078] - assert 0.030191300830775987 < 0.024084802339886784
 +  where 0.030191300830775987 = mean()
 +    where mean = 0    0.002769\n1    0.001304\n2    0.011850\n3    0.002106\n4    0.008777\n5    0.007833\n6    0.135690\n7    0.001992\n8    0.118109\n9    0.011482\nName: mape, dtype: float64.mean
 +  and   0.024084802339886784 = mean()
 +    where mean = 0    0.010499\n1    0.011460\n2    0.010525\n3    0.005897\n4    0.007852\n5    0.015753\n6    0.054820\n7    0.022598\n8    0.045421\n9    0.056023\nName: mape, dtype: float64.mean
===== 1 failed, 1645 passed, 1 skipped, 9 xfailed, 328 warnings in 40.74s ======
make: *** [Makefile:118: pytest-unit] Error 1

zaneselvans · 2024-09-06T20:57:03Z

Oh, the timeseries issue is probably a rare stochastic failure -- it happens once in a blue moon, probably having something to do with the seed being used for the random number generator. If you re-run the unit tests I suspect they'll pass just fine.

aesharpe · 2024-09-06T20:58:42Z

Oh, the timeseries issue is probably a rare stochastic failure -- it happens once in a blue moon, probably having something to do with the seed being used for the random number generator. If you re-run the unit tests I suspect they'll pass just fine.

That is wild and also strangely comforting. I'm glad I didn't try and preemptively figure this out...haha

zaneselvans · 2024-09-06T21:00:21Z

I spent a bunch of time at some point hunting down all the RNG initializations and tried to give them fixed seeds in the tests so we always get the same outputs, but I must have missed one somewhere, and it bites us every so often. It just didn't seem like it was worth continuing the hunt because it's so infrequent.
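
For readers unfamiliar with the fixed-seed approach described here, the sketch below shows the idea using only the standard library. It is not the PUDL test code (which is not shown in this thread); noisy_series is a hypothetical helper.

```python
import random
from typing import Optional


def noisy_series(n: int, seed: Optional[int] = None) -> list[float]:
    """Draw n Gaussian samples from a generator local to this call."""
    # A dedicated Random instance keeps the seed from leaking into, or being
    # disturbed by, other code that uses the module-level generator.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# The same seed always reproduces the same draws, so a test built on them
# sees identical inputs on every run.
assert noisy_series(5, seed=42) == noisy_series(5, seed=42)
```

A generator created without a seed is initialized from system entropy, so a single unseeded generator left in a test is enough to produce occasional failures like the one above.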

zaneselvans · 2024-09-06T21:04:02Z

Especially in the unit tests, whenever there's something like this that seems totally out of left field and unrelated to what's being changed my first instinct is just to re-run the tests to see if it's real. Doesn't take much time or effort, and often it works.

aesharpe added 2 commits August 28, 2024 16:02

Add _csv suffix to current FERC714 raw assets and functions

a16f5b9

Add create_raw_ferc714_xbrl_assets and raw_ferc714_xbrl__metadata_json functions to FERC 714 extract module and include ferc714 io_manager.

20441f2

aesharpe requested a review from cmgosnell August 30, 2024 00:44

aesharpe added the ferc714 Anything having to do with FERC Form 714 label Sep 3, 2024

aesharpe self-assigned this Sep 3, 2024

cmgosnell reviewed Sep 3, 2024

aesharpe added 4 commits September 3, 2024 17:40

Merge branch 'main' into extract-714-xbrl

72d89e8

Update XBRL table dict to mimic the TABLE_NAME_MAP_FERC1 format and fix functions to mimic those of FERC 1

3dd9732

Add .get_datasets() to ferc_settings definition in FercXBRLSQLiteIOManager

72744a8

Update FERC714 working partitions and add xbrl_year function to Ferc714Settings

375f952

zaneselvans reviewed Sep 4, 2024

aesharpe added 3 commits September 4, 2024 17:05

Fix < to > in xbrl_years

1b21b01

Change test output names to ferc1_xbrl from test_db

13e7252

Fix ferc_settings definition in FercXBRLSQLiteIOManager load_input function to fix unit tests

b5ab5ad

aesharpe added 4 commits September 4, 2024 17:12

Merge branch 'main' into extract-714-xbrl

38dcf91

Remove old FERC714_XBRL_FILES dict

665a448

Add and update dictionary descriptions

7198d7c

Update release notes

aa94b9c

aesharpe marked this pull request as ready for review September 5, 2024 05:17

aesharpe added data-update When fresh data is integrated into PUDL from quarterly or annual updates xbrl Related to the FERC XBRL transition labels Sep 5, 2024

cmgosnell requested changes Sep 5, 2024

aesharpe added 2 commits September 5, 2024 13:06

Merge branch 'main' into extract-714-xbrl

6cdade1

Fix ferc io_managers and test

d9009d4

Add csv_years property to Ferc714Settings

e45d8a9

cmgosnell requested changes Sep 5, 2024

add csv_years reference to raw_ferc714_csv_asset_factory function

75f6600

cmgosnell approved these changes Sep 5, 2024

cmgosnell added this pull request to the merge queue Sep 5, 2024

Merge branch 'main' into extract-714-xbrl

8627078

aesharpe removed this pull request from the merge queue due to a manual request Sep 5, 2024

aesharpe added 2 commits September 6, 2024 11:05

Add fixtures for FERC 714 XBRL tests

578c071

Merge branch 'main' into extract-714-xbrl

099e02f

aesharpe enabled auto-merge September 6, 2024 17:13

aesharpe added this pull request to the merge queue Sep 6, 2024

Merged via the queue into main with commit 3435de7 Sep 6, 2024
17 checks passed

aesharpe deleted the extract-714-xbrl branch September 6, 2024 22:26
