Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FK relationship between out_eia__yearly_plant_parts and out_pudl__yearly_assn_eia_ferc1_plant_parts #3818

Open
3 tasks
cmgosnell opened this issue Aug 27, 2024 · 0 comments
Labels
bug Things that are just plain broken. ppe Plant Parts EIA (formerly the EIA plant parts list)

Comments

@cmgosnell
Copy link
Member

cmgosnell commented Aug 27, 2024

Overview

What is the problem we're solving? For very simple items, this can be encapsulated in the success criteria.
When trying to add this FK relationship, we are getting this error after running pudl_check_fks:

pudl.etl.check_foreign_keys.ForeignKeyErrors: Foreign key error for table: out_pudl__yearly_assn_eia_ferc1_plant_parts -- out_eia__yearly_plant_parts (record_id_eia) -- on rows [37132 38128 39128 40148 41165 42194 43237 44269 45283 46261]

What I'm seeing right now is that a) it seems like I can trust that these rows from the error message is the index in out_pudl__yearly_assn_eia_ferc1_plant_parts. Why? because the numbers are exactly the same in the nightly build failure and when I reproduce this failure locally. All of these records are associated with the same plant that are all plant_part=="plant_match_ferc1". I keep finding that these "missing" records are actually in the out_eia__yearly_plant_parts. This FK rule was adding doing a different task and is out of scope to fix for that project (#3774)

To reproduce this - turn off the foreign_key_rules exclude, rerun pudl_check_fks to get the missing row's:

out_eia__yearly_plant_parts = defs.load_asset_value("out_eia__yearly_plant_parts")
out_pudl__yearly_assn_eia_ferc1_plant_parts = defs.load_asset_value("out_pudl__yearly_assn_eia_ferc1_plant_parts")

rows= [37132, 38128, 39128, 40148, 41165, 42194, 43237, 44269, 45283, 46261]
missing = list(out_pudl__yearly_assn_eia_ferc1_plant_parts.loc[rows, "record_id_eia"])
out_eia__yearly_plant_parts[out_eia__yearly_plant_parts.record_id_eia.isin(missing)]

For me, all of the record_id_eia's that are implicated in this have this structure: 2830_**_20**_plant_match_ferc1_total_3542

Success Criteria

How will we know that we're done?

  • we are able to remove the FK relationship exclusion between out_eia__yearly_plant_parts's record_id_eia and out_pudl__yearly_assn_eia_ferc1_plant_parts

Next steps

@cmgosnell cmgosnell added bug Things that are just plain broken. ppe Plant Parts EIA (formerly the EIA plant parts list) labels Aug 27, 2024
@cmgosnell cmgosnell changed the title Add FK relationship between out_eia__yearly_plant_parts and out_pudl__yearly_assn_eia_ferc1_plant_parts Add FK relationship between out_eia__yearly_plant_parts and out_pudl__yearly_assn_eia_ferc1_plant_parts Aug 27, 2024
@cmgosnell cmgosnell changed the title Add FK relationship between out_eia__yearly_plant_parts and out_pudl__yearly_assn_eia_ferc1_plant_parts Add FK relationship between out_eia__yearly_plant_parts and out_pudl__yearly_assn_eia_ferc1_plant_parts Aug 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Things that are just plain broken. ppe Plant Parts EIA (formerly the EIA plant parts list)
Projects
Status: New
Development

No branches or pull requests

1 participant