utility "owners" are not harvested into normalized EIA utility table #1393
Comments
I feel like we've come across something like this before. This seems like a specific case of a more general issue that I suspect we'll run into other places -- that it's possible to have multiple columns in a table which have foreign keys referring to the same other table -- because the same entities can have multiple different kinds of relationships with each other. So somehow in the normalization process we need to be able to indicate which columns are associated with what entities, even when they don't have the canonical column names. But then we'll also have to deal with the other attribute columns in those tables, and identifying which of the multiple PK columns they're supposed to be associated with. Also, doesn't the original |
I wonder if a simple renaming of the columns would fix this in the near term. The plant-operator relationship should already be captured in the Like if we're going to fail to harvest some data from the table, we should probably choose to fail on data we know we should be getting somewhere else already. |
I think this would be a fine - if somewhat janky - solution. It definitely would be outside of our current convention of what a I'm personally less concerned about this as a short-term issue... it is annoying to work around right now but not the end of the world. What I think would need to happen for a more durable solution in the new harvesting process: for every column that is to be harvested, the default behavior is that it collects columns with the exact same name from any table that shares the harvested table's primary keys. But another possibility is to collect columns with any number of names from tables that share the harvested table's primary keys. |
Maybe this is what you're suggesting, but it should be possible to harvest selectively based on name and the foreign key relationships. Like any column that is already identified as referring to an entity table column (like I thought the renaming was janky initially but now I'm not so sure. Does it make sense to have the operator |
Yes, that is what I was thinking. If the entity table column can be harvested from more than one column name, in this case, we could grab both the On the renaming.. it mostly feels janky because this would go against the naming convention for the utility that we have in all other tables. In any other table where a utility ID and a plant are present, the utility ID is the operator. In this table, that would be different. Right now, this table is only plant/generator IDs, utility (operator) ID/info and utility (owner) ID/info. I guess I could imagine just fully dropping the utility (operator) ID/info... but this goes against our desire to STOP dropping columns before they get harvested. Really both the operator and the owner are utilities and should be collected as such. |
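A rough sketch of the "harvest from more than one column name" idea discussed above, in plain Python rather than the actual PUDL harvesting code. The function `harvest_entity_ids` and the column names `utility_id_eia` (operator) and `owner_utility_id_eia` (owner) are assumptions made up for illustration; the real table may name these columns differently.

```python
def harvest_entity_ids(tables, source_columns):
    """Collect distinct entity IDs from every listed (table, column) pair.

    `tables` maps a table name to a list of row dicts. `source_columns`
    lists each (table name, column name) pair that refers to the entity.
    """
    ids = set()
    for table_name, column in source_columns:
        for row in tables[table_name]:
            value = row.get(column)
            if value is not None:
                ids.add(value)
    return ids


# Toy data. The column names are hypothetical stand-ins, not confirmed
# against the real ownership table.
tables = {
    "ownership_eia860": [
        {"plant_id_eia": 1, "utility_id_eia": 10, "owner_utility_id_eia": 10},
        {"plant_id_eia": 1, "utility_id_eia": 10, "owner_utility_id_eia": 20},
        {"plant_id_eia": 2, "utility_id_eia": 30, "owner_utility_id_eia": 40},
    ],
}

# Default behavior: only the canonical column name is harvested.
operator_only = harvest_entity_ids(
    tables, [("ownership_eia860", "utility_id_eia")]
)

# Proposed behavior: the owner column is treated as a utility ID too.
operator_and_owner = harvest_entity_ids(
    tables,
    [
        ("ownership_eia860", "utility_id_eia"),
        ("ownership_eia860", "owner_utility_id_eia"),
    ],
)

# Utilities that appear only as owners and would otherwise be dropped.
owner_only = operator_and_owner - operator_only
```

Listing the source columns explicitly keeps the naming convention in the table itself untouched, which sidesteps the renaming question, at the cost of maintaining a mapping per entity.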
@knordback This issue is adjacent to the changes you made recently on the #509 branch. |
@katie-lamb is this one donezo now? i think yes |
There are about 1000 utilities that are currently not captured in the `utilities_eia` table through the harvesting process because they only show up as "owners" in the `ownership_eia860` table.

This is causing me trouble because I am using the ownership table in the EIA plant-part list to generate plant-part records for each owner. This results in ~3% of the plant-part records not having `utility_id_pudl` because I was assuming that all of the utilities were in `utilities_eia`. I think these owner records do have `utility_id_pudl` because we are capturing them in the PUDL id mapping process - but they are not actually linked to them because this link happens through the `utilities_eia` table.
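To make the symptom concrete, here is a small self-contained sketch of how owner records can end up without a `utility_id_pudl` when the lookup goes through the utilities table. The data is invented, and the helper `attach_pudl_id` and the column name `owner_utility_id_eia` are assumptions for illustration only, not the real plant-part list code.

```python
# Toy stand-in for the harvested utilities table: only operators made it in.
utilities_eia = [
    {"utility_id_eia": 10, "utility_id_pudl": 1},
    {"utility_id_eia": 30, "utility_id_pudl": 2},
]

# Toy stand-in for the ownership table (hypothetical owner column name).
ownership = [
    {"plant_id_eia": 1, "owner_utility_id_eia": 10},
    {"plant_id_eia": 1, "owner_utility_id_eia": 20},
    {"plant_id_eia": 2, "owner_utility_id_eia": 30},
]


def attach_pudl_id(ownership_rows, utility_rows):
    """Left-join style lookup of utility_id_pudl for each owner record."""
    pudl_id_by_eia_id = {
        row["utility_id_eia"]: row["utility_id_pudl"] for row in utility_rows
    }
    return [
        {**row, "utility_id_pudl": pudl_id_by_eia_id.get(row["owner_utility_id_eia"])}
        for row in ownership_rows
    ]


linked = attach_pudl_id(ownership, utilities_eia)

# Owners that never appear in the utilities table come back with no PUDL ID.
missing_owners = sorted(
    row["owner_utility_id_eia"] for row in linked if row["utility_id_pudl"] is None
)
```

A check like `missing_owners` could serve as a quick diagnostic for how many owner-only utilities fall through, without changing how the harvesting itself works.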