Make leafy balance sheet assets & liabilities data #2805

Merged
merged 60 commits into from
Sep 20, 2023
Changes from 4 commits
a959df6
fix twiddly overlapping almost stepparents
cmgosnell Aug 24, 2023
d649d4f
deal with deterministic shit omigoshhhh
cmgosnell Aug 25, 2023
2e8e3f7
remove one loose total w/ different dims in children
cmgosnell Aug 25, 2023
e78915a
Selectively remove calculations that specify conflicting weights.
zaneselvans Aug 25, 2023
9cb516c
select on the mask before dropping instead of dropping the full masks…
cmgosnell Aug 25, 2023
f3b1622
lol remove a very chatty log
cmgosnell Aug 25, 2023
cd74c24
Simplify dropping of duplicate weights
zaneselvans Aug 25, 2023
67e7caa
Check we aren't dropping nodes with weight = -1
zaneselvans Aug 26, 2023
43f8857
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Aug 26, 2023
f5b53ab
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Aug 28, 2023
46b0997
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Aug 29, 2023
7640af7
Check that multi-valued weights have expected frequency.
zaneselvans Aug 30, 2023
e11ef21
Add sanity checks for conflicting weights.
zaneselvans Aug 30, 2023
3e09106
add ability to add specific dimensions into tags
cmgosnell Aug 30, 2023
c2d2102
Merge branch 'explode_tree_fixes' into explode_tag_dimensions
cmgosnell Aug 30, 2023
47f323e
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Aug 30, 2023
0b1fe1d
Merge branch 'explode_tree_fixes' into explode_tag_dimensions
cmgosnell Aug 30, 2023
b78e69a
Defer reporting duplicated notes until after pruning
zaneselvans Aug 30, 2023
11533d9
Merge branch 'explode_tree_fixes' into explode_tag_dimensions
zaneselvans Aug 31, 2023
68cf7f4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 31, 2023
f2a06dc
Merge pull request #2817 from catalyst-cooperative/explode_tag_dimens…
zaneselvans Aug 31, 2023
e523d3a
draft version of making the forest as a table
cmgosnell Sep 1, 2023
1f1f8eb
Merge branch 'explode_tag_dimensions' into explode_forest_as_table
cmgosnell Sep 1, 2023
d829570
lil clean up reorg fun times
cmgosnell Sep 1, 2023
81dafb9
Set error tolerances such that CI can pass.
zaneselvans Sep 2, 2023
17b2022
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Sep 7, 2023
cb63e55
Merge branch 'explode_tree_fixes' into explode_forest_as_table
zaneselvans Sep 7, 2023
08b87b1
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Sep 7, 2023
a352046
Merge branch 'explode_tree_fixes' into explode_forest_as_table
zaneselvans Sep 7, 2023
3b6217b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 7, 2023
d9d1cb1
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Sep 7, 2023
8f81e9c
Merge branch 'explode_tree_fixes' into explode_forest_as_table
zaneselvans Sep 7, 2023
f176a84
Refactor XbrlCalculationForestFerc1.forest_as_table() to be recursive.
zaneselvans Sep 11, 2023
af6b6d0
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Sep 11, 2023
042d852
Merge branch 'explode_tree_fixes' into explode_forest_as_table
zaneselvans Sep 11, 2023
16307e0
Update to steup-micromamba and checkout@v4 actions
zaneselvans Sep 11, 2023
2ae4ae3
Update to steup-micromamba and checkout@v4 actions
zaneselvans Sep 11, 2023
1a2e444
Use pathlib.Path for ogr2ogr
zaneselvans Sep 11, 2023
4cab17f
Merge pull request #2832 from catalyst-cooperative/explode_forest_as_…
zaneselvans Sep 11, 2023
7b51b3e
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Sep 12, 2023
19d172e
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Sep 12, 2023
f728ff1
Add checks for nodes with conflicting tags or lost tagged nodes.
zaneselvans Sep 13, 2023
05d45f7
Flesh out docstrings for annotated_forest and node check functions.
zaneselvans Sep 13, 2023
08e0ff4
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Sep 14, 2023
e29492f
Use pd.isna() to identify NA explosion tags and replace with empty dict
zaneselvans Sep 14, 2023
2c8c1f5
Consolidate calculation tolerances into a data class.
zaneselvans Sep 14, 2023
26f3491
Add columns to exploded metadata for debugging and readability purposes
zaneselvans Sep 15, 2023
56e0c7f
Flesh out docstring for prune_unrooted()
zaneselvans Sep 15, 2023
e5157d4
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Sep 15, 2023
16a7204
Minor readability / typo fixes
zaneselvans Sep 16, 2023
56eabf5
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Sep 18, 2023
d018a5d
Merge branch 'dev' into explode_tree_fixes
zaneselvans Sep 18, 2023
a9ddaa0
Model calculation weights as edge attributes.
zaneselvans Sep 18, 2023
b153c2a
Remove 'passthrough' node infrastructure.
zaneselvans Sep 18, 2023
a88e620
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Sep 18, 2023
5751827
Drop DBF metadata columns when aligning row numbers since we don't ne…
zaneselvans Sep 18, 2023
62ef211
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Sep 18, 2023
884dfac
Merge branch 'relax-libabseil' into explode_tree_fixes
zaneselvans Sep 19, 2023
6df7967
Merge branch 'explode_ferc1' into explode_tree_fixes
zaneselvans Sep 19, 2023
421bada
Consolidate MetadataExploder properties into (Data)Exploder class.
zaneselvans Sep 20, 2023
79 changes: 62 additions & 17 deletions src/pudl/output/ferc1.py
@@ -1,6 +1,7 @@
"""A collection of denormalized FERC assets and helper functions."""
import importlib
import re
from io import StringIO
from typing import Literal, NamedTuple, Self

import networkx as nx
@@ -1807,9 +1808,11 @@ def prune_unrooted(self: Self, graph: nx.DiGraph) -> nx.DiGraph:
"""Prune any portions of the digraph that aren't reachable from the roots."""
seeded_nodes = set(self.seeds)
for seed in self.seeds:
# the seeds and all of their descendants from the graph
seeded_nodes = list(
seeded_nodes.union({seed}).union(nx.descendants(graph, seed))
)
# any seeded node that is also a parent
seeded_parents = [
node
for node, degree in dict(graph.out_degree(seeded_nodes)).items()
@@ -1820,6 +1823,14 @@ def prune_unrooted(self: Self, graph: nx.DiGraph) -> nx.DiGraph:
.loc[seeded_parents]
.reset_index()
)
seeded_child_nodes = list(
set(
seeded_calcs[self.calc_cols].itertuples(index=False, name="NodeId")
).intersection(graph.nodes)
)
seeded_calcs = (
seeded_calcs.set_index(self.calc_cols).loc[seeded_child_nodes].reset_index()
)
seeded_digraph: nx.DiGraph = self.exploded_calcs_to_digraph(
exploded_calcs=seeded_calcs
)
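
The pruning idea in this hunk (keep the seeds plus everything reachable from them, and discard the rest) can be sketched in isolation. This is a simplified illustration, not the PUDL implementation: it uses plain strings for nodes instead of `NodeId` tuples, takes the seeds as a hypothetical `seeds` argument, and returns a subgraph rather than rebuilding the graph from the calculations table.

```python
import networkx as nx


def prune_unrooted(graph: nx.DiGraph, seeds: list[str]) -> nx.DiGraph:
    """Return a copy of the subgraph reachable from the seeds (seeds included)."""
    reachable: set[str] = set()
    for seed in seeds:
        if seed in graph:
            # nx.descendants() returns every node reachable from `seed`,
            # excluding `seed` itself, so the seed is added explicitly.
            reachable |= {seed} | nx.descendants(graph, seed)
    return graph.subgraph(reachable).copy()


# "orphan" -> "c" is not reachable from the seed, so it is pruned.
g = nx.DiGraph([("root", "a"), ("a", "b"), ("orphan", "c")])
pruned = prune_unrooted(g, seeds=["root"])
print(sorted(pruned.nodes))  # ['a', 'b', 'root']
```

Accumulating into a single set avoids converting between `set` and `list` on every loop iteration.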
@@ -1881,27 +1892,24 @@ def forest(self: Self) -> nx.DiGraph:
forest.remove_nodes_from(correction + [node])
forest.add_edge(parent[0], child[0])

if not nx.is_forest(forest):
logger.error(
"Calculations in Exploded Metadata can not be represented as a forest!"
)
connected_components = list(nx.connected_components(forest.to_undirected()))
logger.debug(
f"Calculation forest contains {len(connected_components)} connected components."
)

# Remove any node that:
# - ONLY has stepchildren.
# - AND has utility_type total
for node in self.stepparents(forest):
nodes_to_remove = []
stepparents = sorted(self.stepparents(forest))
logger.info(f"Investigating {len(stepparents)=}")
for node in stepparents:
children = set(forest.successors(node))
stepchildren = set(self.stepchildren(forest)).intersection(children)
if (
(children == stepchildren)
& (len(children) > 0)
& (node.utility_type == "total")
):
if (children == stepchildren) & (len(children) > 0):
nodes_to_remove.append(node)
forest.remove_node(node)
logger.info(f"Removed {len(nodes_to_remove)} redundant/stepparent nodes.")
logger.debug(f"Removed redundant/stepparent nodes: {sorted(nodes_to_remove)}")

# Prune any newly disconnected nodes resulting from the above removal of
# pure stepparents. We expect the set of newly disconnected nodes to be empty.
@@ -1911,13 +1919,50 @@ def forest(self: Self) -> nx.DiGraph:

if pruned_nodes := set(nodes_before_pruning).difference(nodes_after_pruning):
raise AssertionError(f"Unexpectedly pruned stepchildren: {pruned_nodes=}")
# HACK alter.
Member

What does this mean?

Member

@cmgosnell made this note and I've also been confused by it.

# Two different parents with different sets of dimensions share some, but not
# all, of their children, so they weren't caught by the stepchildren-only node
# removal above. A generalization here would be good.
remove_almost_stepparents = pd.read_csv(
StringIO(
"""
table_name,xbrl_factoid,utility_type,plant_status,plant_function
utility_plant_summary_ferc1,depreciation_amortization_and_depletion_utility_plant_leased_to_others,total,,
utility_plant_summary_ferc1,depreciation_and_amortization_utility_plant_held_for_future_use,total,,
utility_plant_summary_ferc1,utility_plant_in_service_classified_and_unclassified,total,,
"""
)
).convert_dtypes()
forest.remove_nodes_from(
list(remove_almost_stepparents.itertuples(index=False, name="NodeId"))
)
forest = self.prune_unrooted(forest)
if not nx.is_forest(forest):
logger.error(
"Calculations in Exploded Metadata can not be represented as a forest!"
)
remaining_stepparents = set(self.stepparents(forest))
if remaining_stepparents:
logger.info(f"{remaining_stepparents=}")

# There are a few rare instances where a particular node is specified with more
# than one weight (-1 vs. 1) and in those cases, we always want to keep the
# weight of -1, since it affects the overall root->leaf calculation outcome.
# Maybe this should actually happen in set_forest_attributes?
multi_valued_weights = (
self.exploded_calcs.groupby(self.calc_cols, dropna=False)["weight"]
.transform("nunique")
.gt(1)
)
calcs_to_drop = multi_valued_weights & (self.exploded_calcs.weight == 1)
deduplicated_calcs = self.exploded_calcs.loc[~calcs_to_drop]
Member Author

This works fine and I think it's a good solution for now, but I think the more holistic way to do this would be to get each node's direct predecessor nodes, so that we can generate both the parent_cols and calc_cols and use them to select the individual records within exploded_calcs during the merge in set_forest_attributes.

Member

We should talk about this. I'm not sure I understand what you're saying. I agree this probably belongs over in set_forest_attributes(), or maybe in its own deduplicate_calcs() method that's called by set_forest_attributes().
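
For discussion, here is a minimal sketch of what a standalone `deduplicate_calcs()` helper might look like. It is an assumption for illustration, not code from this PR: the function signature, the toy column names, and the sample data are all hypothetical. It applies the rule described in the code comment above: when a node appears with both weights, keep the -1 record.

```python
import pandas as pd


def deduplicate_calcs(calcs: pd.DataFrame, calc_cols: list[str]) -> pd.DataFrame:
    """Drop weight == 1 records for nodes that appear with more than one weight."""
    # True for every row whose node (identified by calc_cols) has > 1 distinct weight.
    multi_valued = (
        calcs.groupby(calc_cols, dropna=False)["weight"].transform("nunique").gt(1)
    )
    to_drop = multi_valued & calcs["weight"].eq(1)
    # Select with the boolean mask itself. Passing the mask's full index to
    # DataFrame.drop() would remove every row, not just the flagged ones.
    return calcs.loc[~to_drop].reset_index(drop=True)


# Node ("t", "x") is reported with weights 1 and -1; node ("t", "y") only with 1.
calcs = pd.DataFrame(
    {
        "table_name": ["t", "t", "t"],
        "xbrl_factoid": ["x", "x", "y"],
        "weight": [1, -1, 1],
    }
)
deduped = deduplicate_calcs(calcs, calc_cols=["table_name", "xbrl_factoid"])
print(deduped)
```

Keeping this in its own method would make the weight-conflict rule testable separately from the merge in `set_forest_attributes()`.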


# forest = self.set_forest_attributes(
# forest,
# exploded_meta=self.exploded_meta,
# exploded_calcs=self.exploded_calcs,
# tags=self.tags,
# )
forest = self.set_forest_attributes(
forest,
exploded_meta=self.exploded_meta,
exploded_calcs=deduplicated_calcs,
tags=self.tags,
)
return forest

@staticmethod
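
The stepparent removal in `forest()` can be illustrated on a toy graph. The `stepchildren()` and `stepparents()` helpers are not shown in this diff, so the definitions below are assumptions: a stepchild is taken to be any node with more than one parent, and a stepparent is any parent of a stepchild. Node names are hypothetical strings rather than `NodeId` tuples.

```python
import networkx as nx


def stepchildren(graph: nx.DiGraph) -> list[str]:
    """Nodes with more than one parent (assumed definition)."""
    return sorted(node for node, degree in graph.in_degree() if degree > 1)


def stepparents(graph: nx.DiGraph) -> list[str]:
    """Parents of any stepchild (assumed definition)."""
    parents: set[str] = set()
    for child in stepchildren(graph):
        parents |= set(graph.predecessors(child))
    return sorted(parents)


def pure_stepparents(graph: nx.DiGraph) -> list[str]:
    """Stepparents that have at least one child and ONLY stepchildren."""
    steps = set(stepchildren(graph))
    return [
        node
        for node in stepparents(graph)
        if (children := set(graph.successors(node))) and children <= steps
    ]


# c1 has two parents (p1 and p2); p1 has no other children, p2 also has c2.
g = nx.DiGraph(
    [("root", "p1"), ("root", "p2"), ("p1", "c1"), ("p2", "c1"), ("p2", "c2")]
)
was_forest = nx.is_forest(g)  # the shared child creates an undirected cycle
to_remove = pure_stepparents(g)
g.remove_nodes_from(to_remove)
is_forest_now = nx.is_forest(g)
print(to_remove, was_forest, is_forest_now)  # ['p1'] False True
```

Collecting the nodes first and removing them afterwards, as the updated loop in this PR does with `nodes_to_remove`, also makes it easy to log exactly what was dropped.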