-
-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 509 #2333
Open
knordback
wants to merge
40
commits into
main
Choose a base branch
from
bug-509
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Bug 509 #2333
Changes from 2 commits
Commits
Show all changes
40 commits
Select commit
Hold shift + click to select a range
c48584e
Remove plant_name_eia in pudl.transform.eia923.fuel_receipts_costs()
knordback 13a0dc1
Don't discard plant_name_eia from pudl.transform.eia923.generation_fu…
knordback b4a3c28
Don't drop operator_name/utility_name_eia in pudl.transform.eia923.ge…
knordback 35b6c60
Move newly added code to what seems like a more sensible location
knordback d6021ac
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] adc4503
Extract the core of AbstractTableTransformer.enforce_schema() into Re…
knordback 2bcd803
Merge branch 'bug-509' of https://github.com/catalyst-cooperative/pud…
knordback c44949b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 731baef
Use Resource.endorse_schema() to clean up non-matching fields in Data…
knordback eef6cbf
Merge branch 'bug-509' of https://github.com/catalyst-cooperative/pud…
knordback fbe7b21
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] e2768f9
Move DataFrame cleaning to the end of transform()
knordback 46d7e66
Merge branch 'bug-509' of https://github.com/catalyst-cooperative/pud…
knordback 508ba29
Call enforce_schema() on entities DataFrames as well
knordback 126102e
Merge branch 'dev' into bug-509
knordback e489ed4
Merge branch 'dev' into bug-509
zaneselvans dc5068e
Don't drop combined_heat_power in boiler_fuel()
knordback fdada61
One more incremental change in boiler_fuel()
knordback 40ef5f4
Several more fields not dropped
knordback dd9982a
Merge in substantial changes from dev.
zaneselvans bb60591
More un-dropping of fields
knordback fc099cc
Merge branch 'dev' into bug-509
knordback c91edf1
Merge branch 'dev' into bug-509
knordback 2efd152
Remove code made obsolete in merge from dev
knordback d464394
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 3675eba
Don't drop total_fuel_consumption_quantity in clean_boiler_fuel_eia923()
knordback cb74321
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 54dd4cc
More field non-dropping
knordback 761ba7b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 67105a1
Don't drop fields in clean_generation_eia923()
knordback ee0ae79
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] d962d4f
Bug 509 work in clean_fuel_receipts_costs_eia923()
knordback 1d466f1
Don't drop fields in clean_generation_fuel_eia923()
knordback 15cb0c6
Remove a useless function call
knordback 97ff719
Merge branch 'dev' into bug-509
knordback a4f6234
Merge branch 'dev' into bug-509
knordback 968267c
Merge branch 'dev' into bug-509
knordback a6f7f9b
Merge branch 'dev' into bug-509
zaneselvans 5d3a0e8
Merge branch 'dev' into bug-509
knordback eaa7615
Merge branch 'dev' into bug-509
knordback File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason not to wait to do this until the very end (after the balancing authority fix below)?
Is there any reason not to call
enforce_schema()
on the entity tables too?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I had already moved the code to the very end of the function.
And I added code to do the same thing for the entity tables. That doesn't seem to cause any distress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow that's great! I'm slightly surprised. Do you want to try running
tox -e nuke
overnight and see if stays happy?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems not successful:
nuke: exit -9 (641.56 seconds) /home/knordback/Kurt/pudl> bash -c 'coverage run --append src/pudl/cli.py --logfile tox-nuke.log --clobber src/pudl/package_data/settings/etl_full.yml' pid=793582
.pkg: _exit> python /home/knordback/.conda/envs/pudl-dev/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta
nuke: FAIL code -9 (7080.40=setup[98.70]+cmd[0.12,0.54,0.48,0.39,0.41,0.42,0.39,0.53,0.24,6.33,6.21,1.24,0.03,1.72,1.23,161.77,33.89,25.29,3980.21,0.01,2118.71,641.56] seconds)
evaluation failed :( (7081.04 seconds)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of those tests run successfully. So does
tox
with no arguments (which I thought was the commit criterion). So it seems to be something specific to the "nuke".I am indeed getting copious output, but it's not (to my eye) indicating what the problem is. As mentioned above, I get some warnings but nothing obviously connected to this.
I'll be away next week. I'll will look into this more when I get back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Zane. I'm back, for a little while, and have some time to work on this again. I can continue un-dropping fields, but I'm not sure how worried to be about not being able to run
tox -e nuke
successfully. As noted, the other tests run fine for me. Output is attached in case you want to take a look.tox-out.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried running the full ETL with the
pudl_etl
command line tool? The place that it's failing in your logs is the most memory intensive part of the ETL so my guess is it's running out of memory and crashing? I thought you'd been able to do the full ETL previously? How much memory + swap do you have? I think it may take up to 25GB as it is now.tox
with no arguments is what gets run in CI on GitHub. It only processes 1 year of data. For changes like the one you're making that will affect all years of data we try and run the full ETL locally before it gets merged intodev
.tox -e nuke
is a blunt instrument that not only runs the full ETL, but also all the unit + integration tests on all the data + data validation tests. You probably want to be trying to run the full ETL, but only for the EIA data, to check whether everything is working. Thedevtools/eia-etl-debug.ipynb
notebook is the easiest / most efficient way to do that right now, since you don't have to run the extract step over and over again. It'll also avoid this (FERC 1) related memory intensive step.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have 16GB ram + 15GB swap. I thought I could run the full ETL but apparently not that either. I was able to run
devtools/eia-etl-debug.ipynb
(first time using Jupyter -- it's kinda slick).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jupyter is great for working with data. Lots more human-friendly tools for seeing what's going on in there! Definitely worth getting familiar with.