Force tables to have all columns that are defined in schema #147
In catalyst-cooperative/pudl#2897 I found that we were missing some columns because the `.unstack()` in `construct_dataframe` doesn't create columns for values that never show up in the data, even if they're defined in the metadata. Applying a reindex makes sure we get every column the metadata defines.

This was also causing some integration test failures. When running the ETL in-process, we would:
Lastly, I wonder if there's a way we could keep our extracted tables tidy: our transforms in PUDL promptly re-stack these wide tables in `wide_to_tidy`, so maybe we could skip that round trip entirely. But that's definitely out of scope for this PR.
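To illustrate the `.unstack()` behavior described above, here is a minimal pandas sketch. The long-format `facts` table, its `entity_id`/`concept`/`value` columns, the `schema_columns` list, and the `unstack_to_schema` helper are all hypothetical stand-ins, not the actual `construct_dataframe` code; the sketch only shows why a reindex against the schema's column list restores columns that never appear in the data.

```python
import pandas as pd


def unstack_to_schema(facts: pd.DataFrame, schema_columns: list[str]) -> pd.DataFrame:
    """Pivot long-format facts to a wide table with every schema column present.

    Hypothetical helper: `facts` is assumed to have entity_id, concept, and
    value columns. This is an illustration, not the real construct_dataframe.
    """
    wide = facts.set_index(["entity_id", "concept"])["value"].unstack()
    # unstack() only creates columns for concepts that occur in the data.
    # reindex adds any schema column that never showed up (filled with NaN)
    # and puts the columns in schema order.
    return wide.reindex(columns=schema_columns)


# Hypothetical sample data: "taxes" is defined in the schema but never reported.
facts = pd.DataFrame(
    {
        "entity_id": ["A", "A", "B"],
        "concept": ["revenue", "expenses", "revenue"],
        "value": [100.0, 40.0, 250.0],
    }
)
schema_columns = ["revenue", "expenses", "taxes"]

# Plain unstack() silently drops "taxes".
print(sorted(facts.set_index(["entity_id", "concept"])["value"].unstack().columns))
# ['expenses', 'revenue']

wide = unstack_to_schema(facts, schema_columns)
print(list(wide.columns))
# ['revenue', 'expenses', 'taxes']
```

Using `reindex` rather than adding missing columns one at a time also pins the column order to the schema, so the output shape stays the same no matter which values happen to appear in the input.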