-
-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make leafy balance sheet assets & liabilities data #2805
Conversation
For reasons I don't yet understand, I am getting non-deterministic behavior in the graph pruning process. For both Running the [
[
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_acquisition_adjustment', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='amortization_of_other_utility_plant_utility_plant_in_service', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='depreciation_utility_plant_in_service', utility_type='other1', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='construction_work_in_progress', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='construction_work_in_progress', utility_type='common', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='depreciation_utility_plant_in_service', utility_type='common', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_acquisition_adjustment', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='depreciation_utility_plant_in_service', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_held_for_future_use', utility_type='common', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_held_for_future_use', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_held_for_future_use', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='amortization_of_other_utility_plant_utility_plant_in_service', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='abandonment_of_leases', utility_type='common', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='amortization_of_other_utility_plant_utility_plant_in_service', utility_type='common', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='amortization_of_other_utility_plant_utility_plant_in_service', utility_type='other1', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='depreciation_utility_plant_in_service', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='abandonment_of_leases', utility_type='other1', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='abandonment_of_leases', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='construction_work_in_progress', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_acquisition_adjustment', utility_type='common', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='abandonment_of_leases', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA)
],
[
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_unclassified', utility_type='common', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_unclassified', utility_type='other3', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_unclassified', utility_type='other1', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='amortization_of_other_utility_plant_utility_plant_in_service', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_unclassified', utility_type='other2', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_unclassified', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_unclassified', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='amortization_of_plant_acquisition_adjustment', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='depreciation_utility_plant_in_service', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='abandonment_of_leases', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA)
],
[],
[
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_unclassified', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_plant_purchased_or_sold', utility_type='other1', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_unclassified', utility_type='other1', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='amortization_of_other_utility_plant_utility_plant_in_service', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='abandonment_of_leases', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_plant_purchased_or_sold', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_completed_construction_not_classified', utility_type='other1', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='depreciation_utility_plant_in_service', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_property_under_capital_leases', utility_type='other1', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_property_under_capital_leases', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='amortization_of_plant_acquisition_adjustment', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_completed_construction_not_classified', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA)
],
[
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_plant_purchased_or_sold', utility_type='other1', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_property_under_capital_leases', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_plant_purchased_or_sold', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_completed_construction_not_classified', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_completed_construction_not_classified', utility_type='other1', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_property_under_capital_leases', utility_type='other1', plant_status=pd.NA, plant_function=pd.NA)
],
[],
[
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='amortization_of_other_utility_plant_utility_plant_in_service', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='abandonment_of_leases', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='amortization_of_plant_acquisition_adjustment', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='depreciation_utility_plant_in_service', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA)
],
[
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='abandonment_of_leases', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='amortization_of_plant_acquisition_adjustment', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='depreciation_utility_plant_in_service', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='amortization_of_other_utility_plant_utility_plant_in_service', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA)
],
[
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_unclassified', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_unclassified', utility_type='common', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_unclassified', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_held_for_future_use', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='construction_work_in_progress', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='construction_work_in_progress', utility_type='common', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_acquisition_adjustment', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_acquisition_adjustment', utility_type='common', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_held_for_future_use', utility_type='common', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_acquisition_adjustment', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_held_for_future_use', utility_type='gas', plant_status=pd.NA, plant_function=pd.NA),
NodeId(table_name='utility_plant_summary_ferc1', xbrl_factoid='construction_work_in_progress', utility_type='electric', plant_status=pd.NA, plant_function=pd.NA)
],
[],
] Interestingly, the result remains consistent so long as I don't restart the notebook kernel -- I can re-run all the cells and get the same result over and over again, as long as I don't restart the kernel. My intuition is that it has something to do with the way we're iteratively removing nodes that have only stepchildren in the for node in self.stepparents(forest):
children = set(forest.successors(node))
stepchildren = set(self.stepchildren(forest)).intersection(children)
if (children == stepchildren) & (len(children) > 0):
forest.remove_node(node) The problem here is that we're iterating over something that's being mutated within the loop... which could yield some non-deterministic behavior if the ordering of what's returned is random. We could address this either by storing a constant list of stepparents before looping, or by accummulating a list of all the nodes we want to remove and then removing them outside the loop, but I think these are subtly different. Freezing the initial list of stepparents before iterating: stepparents = self.stepparents(forest)
for node in stepparents:
children = set(forest.successors(node))
stepchildren = set(self.stepchildren(forest)).intersection(children)
if (children == stepchildren) & (len(children) > 0):
forest.remove_node(node) Results in what seems to be identical behavior (but is still safer!) If we also wait to remove the set of nodes that meet our criteria until after the loop exits: stepparents = self.stepparents(forest)
nodes_to_remove = []
for node in stepparents:
children = set(forest.successors(node))
stepchildren = set(self.stepchildren(forest)).intersection(children)
if children and (children == stepchildren):
nodes_to_remove.append(node)
forest.remove_nodes_from(nodes_to_remove) Then we always get the same output and there are no stepchildren!!!! However... Going back to removing nodes that only have stepchildren ONLY if they also have stepparents = self.stepparents(forest)
nodes_to_remove = []
for node in stepparents:
children = set(forest.successors(node))
stepchildren = set(self.stepchildren(forest)).intersection(children)
if (
len(children) != 0
and (children == stepchildren)
and (node.utility_type == "total")
):
nodes_to_remove.append(node)
forest.remove_nodes_from(nodes_to_remove) We get an output that looks more like we expect, but is it actually correct? |
I think the former way to do this it removed one stepparent at a time. So every time during the I think a better way to get a deterministic outcome would be to sort the stepparents going into the loop so it happens in the same order every time. I'm gonna check doing something like that. |
Sorting the list of nodes to iterate over and allowing the list to mutate while we iterate may make the process deterministic, but will it necessarily be correct? What's our criteria for choosing which path to keep? This is why originally I only removed the stepparents that also had |
well turns out it didn't make the process deterministic! sooo there is that. but in terms of what is "right", i don't think it matters which path we choose if there are two paths to the same set of outcomes. your right that choosing the totals as the ones to remove will help narrow the choice down, but it looks like there are still a lot of leftover stepparents with that method. |
we employed sorted lists and it locked in the orderrrrr 🎉 |
src/pudl/output/ferc1.py
Outdated
# There are a few rare instances where a particular node is specified with more | ||
# than one weight (-1 vs. 1) and in those cases, we always want to keep the | ||
# weight of -1, since it affects the overall root->leaf calculation outcome. | ||
# Maybe this should actually happen in set_forest_attributes? | ||
multi_valued_weights = ( | ||
self.exploded_calcs.groupby(self.calc_cols, dropna=False)["weight"] | ||
.transform("nunique") | ||
.gt(1) | ||
) | ||
calcs_to_drop = multi_valued_weights & (self.exploded_calcs.weight == 1) | ||
deduplicated_calcs = self.exploded_calcs.drop(calcs_to_drop.index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this works fine and i think is a good solution for now, but i think the more holistic way to do this would be to get all of the nodes direct predecessor nodes in order to generate both the parent_cols and calc_cols in order to select the individual records to select on within exploded_calcs
during the merge in set_forest_attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should talk about this. I'm not sure I understand what you're saying. I agree this probably belongs over in set_forest_attributes()
or maybe it its own deduplicate_calcs()
method that's called by set_forest_attributes()
I think we need to check whether the nodes we're dropping when enforcing the tree structure:
If either of those statements are untrue, then by removing them we will be altering the outcome of the final root -> leaf calculations and what tags end up being applied to the leaf nodes. The tag check is a little weird because right now we're applying tags only on the basis of matching the table & fact |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## explode_ferc1 #2805 +/- ##
===============================================
+ Coverage 88.4% 88.6% +0.1%
===============================================
Files 89 90 +1
Lines 10711 10832 +121
===============================================
+ Hits 9478 9599 +121
Misses 1233 1233
☔ View full report in Codecov by Sentry. |
src/pudl/output/ferc1.py
Outdated
# There are a few rare instances where a particular node is specified with more | ||
# than one weight (-1 vs. 1) and in those cases, we always want to keep the | ||
# weight of -1, since it affects the overall root->leaf calculation outcome. | ||
# Maybe this should actually happen in set_forest_attributes? | ||
multi_valued_weights = ( | ||
self.exploded_calcs.groupby(self.calc_cols, dropna=False)["weight"] | ||
.transform("nunique") | ||
.gt(1) | ||
) | ||
calcs_to_drop = multi_valued_weights & (self.exploded_calcs.weight == 1) | ||
deduplicated_calcs = self.exploded_calcs.loc[~calcs_to_drop] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how many true multi_valued_weights
do we expect? it might be good to add a little assertion with some expected value to make sure we don't end up unexpectedly dropping a ton.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. It should be very small proportion compared to the number of overall calculations.
@e-belfer I fleshed out the Tasklist based on the comments @cmgosnell left before she went on vacation and notes from our handoff discussion. The tasks that sound significant include the following (which hopefully aren't just gobbledygook...) Re-evaluate conflicting weights check and passthrough calculationsPreviously we were excising passthrough calculations, that had a single parent and a single child (plus a correction node) to redirect calculations that involved duplicate values, but doing this we had to be careful never to remove a node with a weight that would alter the full root-to-leaf calculation weight (i.e. any weight other than 1.0). Since adding all of the other dimensions ( However, another kind of issue with "conflicting" weights has come up, in which there are multiple paths to a few nodes in the forest, because sometimes calculations are decomposed in two different ways in parallel -- one breaking things down by xbrl fact, and another breaking things down by the additional dimensions. We have to choose one or the other in order to avoid having duplicate value getting aggregated by the tree, but at some level it doesn't matter which one we choose (unless there's another metadata dimension like Right now we are using a heuristic and some random chance to decide which calculations to drop. Any time that all of a node's children have multiple parents, we know that we can safely remove that node without dropping any of the downstream (leafward) values. This gets rid of almost all of the duplicate paths, but there are a couple of cases where some of the children don't quite cover one of the decompositions completely, so we have to purposefully select the decomposition that does include all of the children and drop the node that doesn't. I think that Christina is suggesting that as long as removing a node doesn't result in the loss of any leaves, and we do end up with forest structure in the end, then the calculations should be valid. Debugging tag conflicts & tag propagationSome similar issues come up with the tagging -- which nodes get manually tagged is a bit random, and based on what facts are most salient to the tagger. We need to propagate those tags all the way down to the leaf nodes, so that when we do the root-to-leaf calculation, we have all of the tagging information available. But this creates the possibility of conflicts between separate tags. If a node and one of its descendants have different tags, then whichever one is propagated last is going to end up having precedence, which may or may not be correct. I think these conflicts are probably user input errors and should be flagged and resolved manually. We also need to check and make sure that none of the nodes we are pruning contain tags that are then lost from the tree. It also looks like there's some kind of issue with the way the tags are being propagated to the leaves, since there are at leaset a few cases in which the tags of parent nodes aren't showing up in their children. But it could also be that I'm generating the |
Some observations after adding all of the dimensions and merging in the intra-dimensional total fixes. Discussing with one or more of @cmgosnell @e-belfer or @jdangerx would probably be helpful. No more passthroughs
We have actual forests
Some calculations aren't matching exactly
Pruned and conflicting tags
Conflicting weights & multiple possible calculation paths
Debugging via calculation consistency
|
@e-belfer here's what I've been doing to generate the outputs. Imports & pull some tablesimport pandas as pd
import pudl
from pudl.workspace.setup import PudlPaths
from pudl.etl import defs
from pudl.output.ferc1 import (
Exploder,
MetadataExploder,
NodeId,
XbrlCalculationForestFerc1,
EXPLOSION_CALCULATION_TOLERANCES,
)
clean_xbrl_metadata_json = defs.load_asset_value(AssetKey("clean_xbrl_metadata_json"))
metadata_xbrl_ferc1 = defs.load_asset_value(AssetKey("metadata_xbrl_ferc1"))
calculation_components_xbrl_ferc1 = defs.load_asset_value(
AssetKey("calculation_components_xbrl_ferc1")
)
tags_df = defs.load_asset_value(AssetKey("_out_ferc1__explosion_tags")) Define the input arguments:explosion_args = {
"income_statement_ferc1": {
"root_table": "income_statement_ferc1",
"table_names_to_explode": [
"income_statement_ferc1",
"depreciation_amortization_summary_ferc1",
"electric_operating_expenses_ferc1",
"electric_operating_revenues_ferc1",
],
"calculation_tolerance": EXPLOSION_CALCULATION_TOLERANCES["income_statement_ferc1"],
"seeds": [
NodeId(
table_name="income_statement_ferc1",
xbrl_factoid="net_income_loss",
utility_type="total",
plant_status=pd.NA,
plant_function=pd.NA,
),
],
"tags": tags_df,
},
"balance_sheet_assets_ferc1": {
"root_table": "balance_sheet_assets_ferc1",
"table_names_to_explode": [
"balance_sheet_assets_ferc1",
"utility_plant_summary_ferc1",
"plant_in_service_ferc1",
"electric_plant_depreciation_functional_ferc1",
],
"calculation_tolerance": EXPLOSION_CALCULATION_TOLERANCES["balance_sheet_assets_ferc1"],
"seeds": [
NodeId(
table_name="balance_sheet_assets_ferc1",
xbrl_factoid="assets_and_other_debits",
utility_type=pd.NA,
plant_status=pd.NA,
plant_function=pd.NA,
)
],
"tags": tags_df,
},
"balance_sheet_liabilities_ferc1": {
"root_table": "balance_sheet_liabilities_ferc1",
"table_names_to_explode": [
"balance_sheet_liabilities_ferc1",
"retained_earnings_ferc1",
],
"calculation_tolerance": EXPLOSION_CALCULATION_TOLERANCES["balance_sheet_liabilities_ferc1"],
"seeds": [
NodeId(
table_name="balance_sheet_liabilities_ferc1",
xbrl_factoid="liabilities_and_other_credits",
utility_type=pd.NA,
plant_status=pd.NA,
plant_function=pd.NA,
)
],
"tags": tags_df,
},
} Function to create exploded assetsdef exploded_table(
root_table: str,
table_names_to_explode: list[str],
calculation_tolerance: float,
seeds: list[NodeId],
tags: pd.DataFrame,
):
metadata_xbrl_ferc1 = defs.load_asset_value(AssetKey("metadata_xbrl_ferc1"))
calculation_components_xbrl_ferc1 = defs.load_asset_value(
AssetKey("calculation_components_xbrl_ferc1")
)
dfs_to_explode = {
table: pd.read_sql(table, pudl_engine) for table in table_names_to_explode
}
exploder = Exploder(
root_table=root_table,
table_names=table_names_to_explode,
metadata_xbrl_ferc1=metadata_xbrl_ferc1,
calculation_components_xbrl_ferc1=calculation_components_xbrl_ferc1,
seed_nodes=seeds,
calculation_tolerance=calculation_tolerance,
tags=tags,
)
return {
"exploder": exploder,
"exploded_data": exploder.boom(tables_to_explode=dfs_to_explode),
}
test_explode = {}
for root_table in explosion_args:
print(f"Exploding: {root_table}")
test_explode[root_table] = exploded_table(**explosion_args[root_table]) Inspect one of the explosionsroot_table = "balance_sheet_assets_ferc1"
exploder = test_explode[root_table]["exploder"]
exploded_data = test_explode[root_table]["exploded_data"]
calc_forest = exploder.calculation_forest
exploded_meta = calc_forest.exploded_meta
forest_as_table = calc_forest.forest_as_table
leafy_meta = calc_forest.leafy_meta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple abstraction-level questions that we should probably discuss, though largely I think we can merge this. It would probably also be good to discuss my thoughts on how to clean up the graph more upstream and whether they're totally off base or not.
exploded_calcs=self.exploded_calcs, | ||
tags=self.tags, | ||
) | ||
# HACK alter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmgosnell made this note and I've also been confused by it.
Most of my comments were non-blocking and we addressed the blocking one on a synchronous call - see note above. There are things we might want to change, but those can happen as a different PR onto |
PR Overview
Some final fixes that are specific to the balance sheet asset calculation tree, allowing it to be used for annotating and filtering the exploded balance sheet asset & liability data tables.
Tasks
PR Checklist
dev
).