Extract 714 xbrl #3822
…n functions to FERC 714 extract module and include ferc714 io_manager.
…ix functions mimic those of FERC 1
Unit test failures have something to do with this snippet I changed in io_managers.py:
UPDATE: I fixed this by changing this snippet to:
Two requests for changes in comments, plus the settings yml files need to be updated! But those seem like minor tweaks. Overall this is looking great imo, and I was able to materialize these xbrl raw assets locally.
two blocking needs here:
to change
years = ", ".join(map(str, ferc714_settings.years))
to
years = ", ".join(map(str, ferc714_settings.csv_years))
in _extract_raw_ferc714_csv
and again to add the new years in the settings yml files.
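The difference between `years` and `csv_years` in that join can be sketched as follows. This is a minimal illustration, not the PUDL implementation: the `Ferc714SettingsSketch` class, its default years, and the `FIRST_XBRL_YEAR` cutoff are assumptions made up for the example. Only the property names `csv_years` and `xbrl_years` and the `", ".join(...)` expression come from this thread.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff: the first year assumed to be published as XBRL.
FIRST_XBRL_YEAR = 2021


@dataclass
class Ferc714SettingsSketch:
    """Stand-in for Ferc714Settings; not the real PUDL class."""

    years: list[int] = field(default_factory=lambda: [2019, 2020, 2021, 2022])

    @property
    def csv_years(self) -> list[int]:
        # Years before the cutoff are treated as CSV-only.
        return [year for year in self.years if year < FIRST_XBRL_YEAR]

    @property
    def xbrl_years(self) -> list[int]:
        # Years at or after the cutoff are treated as XBRL.
        return [year for year in self.years if year >= FIRST_XBRL_YEAR]


settings = Ferc714SettingsSketch()

# Joining all years would mention XBRL-only years in a CSV context.
all_years = ", ".join(map(str, settings.years))

# Joining csv_years restricts the string to the years the CSV extractor handles.
csv_only = ", ".join(map(str, settings.csv_years))
```

Under these assumptions, `csv_only` lists only the pre-cutoff years, which is the behavior the requested change is after.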
The reason I didn't do this is because I think it will break the transforms and I didn't want to merge it into |
hm this is a good point... but I think in this context because the only existing transforms are based on the csv extracted tables they will be fine. but still i hadn't put it in context that this was only the extraction so i will renege on this suggestion. |
* Add _csv suffix to current FERC714 raw assets and functions
* Add create_raw_ferc714_xbrl_assets and raw_ferc714_xbrl__metadata_json functions to FERC 714 extract module and include ferc714 io_manager.
* Update XBRL table dict to mimic the TABLE_NAME_MAP_FERC1 format and fix functions mimic those of FERC 1
* Add .get_datasets() to ferc_settings definition in FercXBRLSQLiteIOManager
* Update FERC714 working partitions and add xbrl_year function to Ferc714Settings
* Fix < to > in xbrl_years
* Change test output names to ferc1_xbrl from test_db
* Fix ferc_settings definition in FercXBRLSQLiteIOManager load_input function to fix unit tests
* Remove old FERC714_XBRL_FILES dict
* Add and update dictionary descriptions
* Update release notes
* Fix ferc io_managers and test
* Add csv_years property to Ferc714Settings
* Add csv_years reference to raw_ferc714_csv_asset_factory function
I think the integration test failures are probably because the tests don't know they need to run the FERC-714 XBRL to SQLite conversion before the main PUDL ETL can happen. You can follow the pattern that's used for the FERC Form 1 XBRL to SQLite conversion in

    @pytest.fixture(scope="session", name="ferc1_engine_xbrl")
    def ferc1_xbrl_sql_engine(ferc1_xbrl_extract, dataset_settings_config) -> sa.Engine:
        """Grab a connection to the FERC Form 1 DB clone."""
        context = build_init_resource_context(
            resources={"dataset_settings": dataset_settings_config}
        )
        return ferc1_xbrl_sqlite_io_manager(context).engine


    @pytest.fixture(scope="session", name="ferc1_xbrl_taxonomy_metadata")
    def ferc1_xbrl_taxonomy_metadata(ferc1_engine_xbrl: sa.Engine):
        """Read the FERC 1 XBRL taxonomy metadata from JSON."""
        result = materialize_to_memory([raw_ferc1_xbrl__metadata_json])
        assert result.success
        return result.output_for_node("raw_ferc1_xbrl__metadata_json")

And then the PUDL IO Manager has a declared dependency on the FERC1 XBRL engine:

    @pytest.fixture(scope="session")
    def pudl_io_manager(
        ferc1_engine_dbf: sa.Engine,  # Implicit dependency
        ferc1_engine_xbrl: sa.Engine,  # Implicit dependency
        live_dbs: bool,
        pudl_datastore_config,
        dataset_settings_config,
        request,
    ) -> PudlMixedFormatIOManager:
@zaneselvans thank you for that suggestion! I just had to add one more fixture for |
mmm getting a unit test failure seemingly related to timeseries. Is timeseries cleaning part of the extraction or transformation phase @zaneselvans?
Oh, the timeseries issue is probably a rare stochastic failure -- it happens once in a blue moon, probably having something to do with the seed being used for the random number generator. If you re-run the unit tests I suspect they'll pass just fine.
That is wild and also strangely comforting. I'm glad I didn't try and preemptively figure this out...haha
I spent a bunch of time at some point hunting down all the RNG initializations and tried to give them fixed seeds in the tests so we always get the same outputs, but I must have missed one somewhere, and it bites us every so often. It just didn't seem like it was worth continuing the hunt because it's so infrequent.
Especially in the unit tests, whenever there's something like this that seems totally out of left field and unrelated to what's being changed my first instinct is just to re-run the tests to see if it's real. Doesn't take much time or effort, and often it works.
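For readers unfamiliar with the fixed-seed approach mentioned above, here is a minimal sketch using only the standard library. The `noisy_series` helper is hypothetical and is not part of PUDL's timeseries code; it only illustrates why an explicitly seeded generator gives repeatable test inputs, while an unseeded one does not.

```python
import random


def noisy_series(n: int, rng: random.Random) -> list[float]:
    """Draw n Gaussian values from an explicitly supplied generator."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# Two generators built from the same seed produce identical sequences,
# so a test that depends on this data sees the same input on every run.
first = noisy_series(5, random.Random(42))
second = noisy_series(5, random.Random(42))

# A generator created without a seed is initialized from OS entropy or the
# clock, so its output differs between runs. A single such generator left
# in a test suite is enough to cause occasional, hard-to-reproduce failures.
unseeded = noisy_series(5, random.Random())
```

Passing the generator in as an argument, rather than calling module-level `random` functions, makes it easier to spot any call site that still lacks a fixed seed.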
Overview
closes #3813
closes #3814
closes #3815
What problem does this address?
We don't currently extract XBRL tables for the FERC 714 data. This code adds the capability to do so.
What did you change?
pudl/extract/ferc714.py
* _csv suffix to csv-specific extraction functions/variables.
* FERC714_CSV_ENCODING
* TABLE_NAME_MAP_FERC714 dictionary
* create_raw_ferc714_xbrl_assets to reference the raw xbrl tables in the FERC 714 SQLite db and call it.
* raw_ferc714_xbrl__metadata_json to make an asset for the metadata table
pudl/io_managers.py
* FercXBRLSQLiteIOManager to accommodate more than just ferc1
* ferc714_xbrl_sqlite_io_manager function
pudl/settings.py
* xbrl_years function to the Ferc714Settings class to be able to access XBRL years.
pudl/metadata/sources.py
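To illustrate what a table name map in the style of TABLE_NAME_MAP_FERC1 might look like, here is a hedged sketch. The nested layout (one entry per table, one key per source format) is an assumption about that format, and every table and file name below is invented for the example. The real entries live in pudl/extract/ferc714.py.

```python
# Hypothetical map: output table name -> raw source name per format.
# None of these names are taken from PUDL.
TABLE_NAME_MAP_SKETCH: dict[str, dict[str, str]] = {
    "example_respondent_table": {
        "csv": "example_respondents.csv",
        "xbrl": "example_respondents_xbrl",
    },
    "example_hourly_demand_table": {
        "csv": "example_hourly_demand.csv",
        "xbrl": "example_hourly_demand_xbrl",
    },
    # A table that only exists in one format simply omits the other key.
    "example_csv_only_table": {"csv": "example_csv_only.csv"},
}


def raw_names(source_format: str) -> dict[str, str]:
    """Return {output table name: raw source name} for one source format."""
    return {
        table: sources[source_format]
        for table, sources in TABLE_NAME_MAP_SKETCH.items()
        if source_format in sources
    }


xbrl_tables = raw_names("xbrl")
csv_tables = raw_names("csv")
```

Keeping both formats in one dictionary lets CSV and XBRL extraction code iterate over the same keys, which is presumably the appeal of mirroring the FERC 1 layout.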
Testing
How did you make sure this worked? How can a reviewer verify this?
Load each of the raw assets via and make sure they contain the expected years:
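A minimal sketch of such a year check, run directly against a SQLite database with the standard library rather than through any PUDL or Dagster loading call. The table name `example_raw_table`, the `report_year` column, and the expected years are all assumptions made for the example, and the in-memory database stands in for the real FERC 714 XBRL SQLite file.

```python
import sqlite3


def years_in_table(
    conn: sqlite3.Connection, table: str, year_column: str = "report_year"
) -> set[int]:
    """Return the distinct years found in one table."""
    # Identifiers cannot be bound as SQL parameters, so they are quoted
    # inline. Only pass trusted table and column names here.
    query = f'SELECT DISTINCT "{year_column}" FROM "{table}"'
    return {int(row[0]) for row in conn.execute(query)}


# Build a tiny in-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_raw_table (report_year INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO example_raw_table VALUES (?, ?)",
    [(2021, 1.0), (2022, 2.0), (2022, 3.0)],
)

expected_years = {2021, 2022}  # hypothetical expectation
found_years = years_in_table(conn, "example_raw_table")
missing_years = expected_years - found_years
```

An empty `missing_years` set means every expected year is present. The same comparison works on a DataFrame column once an asset has been loaded.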
To-do list