Merge dev into main for 2023-11-09 #3031
Merged
zaneselvans (Member) commented on Nov 9, 2023
- First pass of integrating the monthly EIA923 data into the rest of the EIA data. This includes updating the package data to account for the 2023 year and updating the way data maturities are assigned to 923 data. It also updates some of the expected row counts for the data. It should still fail on the gen_eia923 table, because the row count went down, which doesn't seem right. There are also some failures related to check_date_freq, since there are now fewer than 12 months expected in a given round of updates. Will handle those errors in another commit.
- remove breakpoint
- Add function to drop ytd records for annual tables
- Adjust monthly row expectations for gf and frc tables after dropping ytd values for annual tables
- Tweak the way we add data maturity to the eia923 monthly files and remove double returns from the drop_ytd_for_annual_tables function
- Little updates: add a note about how the plants are getting dropped in the gen_eia923 output table, and link to the issue.
- For now, comment out the checks that make sure we have the same years of EIA923 and EIA860 data. These checks cause issues for the monthly EIA923 data, which gets integrated ahead of any available 860 data. Removing them might cause issues elsewhere, which is why I haven't committed to fully deleting them yet.
- Update min/max row expectations
- Add data_maturity field to harvested EIA tables so that we can drop ytd records from annual EIA tables
- Address PR comments: restructure the way the data_maturity field is dropped from certain tables when merging multiple tables that each have that field. Previously it was ad hoc; now it just gets dropped in the denorm_by_plant function.
- Fix release note trailing whitespace error
- Update test_eia923_dependency function to make sure some 860 and 923 years overlap but don't need to be the same
- Only generate alphanumeric entity IDs in test; non-printable characters seem to break groupby (#2993)
- Set up Cloud SQL Postgres database for dagster storage
- Copy dagster.yaml after DAGSTER_HOME is created
- Add proper quoting rules to DAGSTER_PG_PASSWORD secret
- Use max CPUs for nightly builds and spin the dagster-storage SQL instance up and down
- Create and delete Cloud SQL db during nightly builds
- Set PUDL_SETTINGS_YML to etl_full.yml and add git sha to Cloud SQL database name
- Add short GitHub ref to database name
- Update DAGSTER_PG_DB with short git sha
- Update date range for nightly build links to include 2022
- Update 923 settings files to accommodate 2023 data and update settings tests so that they aren't dependent on having the same years of EIA923 and EIA860 data
- Fix calculating the report_date in demand_hourly_pa_ferc714
- Require non-null report_date in FERC 714 hourly demand table.
- Update date validation function to only look at instances where data_maturity is not ytd_incremental
- Remove Cloud SQL lifecycle management from gcp_pudl_etl.sh script
- Update data contributors, add zenodo role and doi field, update US copyright link
- Update the ZenodoDoi class; switch to https
- Remove leftover string
- Switch regex strategy to sampling strategy to improve performance (#2998)
- Add alembic schema changes for the recently added non-null report_date constraint.
- only fix a reporting_frequency_code when the column exists
- Update responses requirement from <0.24,>=0.14 to >=0.14,<0.25
- Update pyarrow requirement from <14,>=13 to >=13,<15
- Update dagster-postgres requirement
- [pre-commit.ci] pre-commit autoupdate
- Update tox and EIA923 row counts
- Update expected rows for identified respondents without FIPS codes, but keep annualized demand
- add report year validation test
- add minmax rows into validation test for chonky table
- I don't know exactly why the "nan"s began appearing, but this fixes it
- Revert the replacement of "nan" by not introducing them in the first place, plus some light cleanup
- REALLY REALLY it's a nullable string
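As a rough sketch of the year-to-date handling described in the commits above (tagging records with a `data_maturity`, dropping year-to-date records from annual tables, and ignoring them when checking date frequency), the pandas code below shows the idea. It is not PUDL's implementation: the helper names, the `"final"` value, and the `"ytd_incremental"` label (spelled as in the commit message) are assumptions.

```python
import pandas as pd

# Assumed label for year-to-date (partial-year) records. The commit messages
# spell it "ytd_incremental"; the exact string PUDL uses may differ.
YTD_MATURITY = "ytd_incremental"


def drop_ytd_for_annual_table(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only complete-year records for an annual table."""
    return df.loc[df["data_maturity"] != YTD_MATURITY].reset_index(drop=True)


def months_per_year(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: count distinct report months per year, skipping YTD rows.

    A year that is still in progress legitimately has fewer than 12 months,
    so YTD records are excluded before the frequency check.
    """
    complete = df.loc[df["data_maturity"] != YTD_MATURITY]
    dates = pd.to_datetime(complete["report_date"])
    return dates.dt.month.groupby(dates.dt.year).nunique()


def check_monthly_frequency(df: pd.DataFrame) -> None:
    """Hypothetical check: raise if any complete year is missing months."""
    counts = months_per_year(df)
    incomplete = counts[counts != 12]
    if not incomplete.empty:
        raise ValueError(f"Years with missing months: {incomplete.to_dict()}")


# Example: 12 final months of 2022 plus 8 year-to-date months of 2023.
records = pd.DataFrame(
    {
        "report_date": pd.date_range("2022-01-01", periods=20, freq="MS"),
        "data_maturity": ["final"] * 12 + [YTD_MATURITY] * 8,
    }
)
annual = drop_ytd_for_annual_table(records)  # only the 12 rows from 2022 remain
check_monthly_frequency(records)  # passes: the partial 2023 year is ignored
```

Filtering on `data_maturity` rather than on the calendar keeps the rule independent of which month the latest monthly update happens to end in.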
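The relaxed year check in `test_eia923_dependency` (EIA-860 and EIA-923 years must overlap but need not be identical) can be illustrated with plain Python sets. The function name and the example year ranges are illustrative assumptions, not PUDL's settings API.

```python
from collections.abc import Iterable


def check_eia_year_overlap(
    eia860_years: Iterable[int], eia923_years: Iterable[int]
) -> list[int]:
    """Hypothetical helper: return the shared years, raising if there are none.

    The years no longer have to match exactly, because monthly EIA-923 data
    can be integrated before the matching EIA-860 year is available.
    """
    shared = sorted(set(eia860_years) & set(eia923_years))
    if not shared:
        raise ValueError("EIA-860 and EIA-923 settings share no years.")
    return shared


# Illustrative settings: EIA-923 already includes 2023, EIA-860 does not.
overlap = check_eia_year_overlap(range(2020, 2023), range(2020, 2024))
```

An equality check would reject these settings; the overlap check accepts them while still catching configurations where the two datasets have nothing in common.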
- Little updates (follow-up): update the way we tell whether an EIA923 filing is monthly or annual, based on feedback in the PR.
- Restructure how the data_maturity field is dropped when merging multiple tables that each have that field (it now gets dropped in the denorm_by_plant function instead of ad hoc). This also entails:
  - Passing the data_maturity field through to the agg tables: data_maturity is added to the agg function, which selects the 'first' data_maturity value per aggregation. This works because the fields are aggregated by date, which is how data_maturity is determined, and the annual aggregations drop the ytd rows before aggregating, so taking the first data_maturity value per year is safe.
  - Removing some comment fields.
  - Adding a new migration.
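The 'first' aggregation rule described above can be sketched with pandas named aggregation. The column names `plant_id_eia` and `net_generation_mwh`, the helper name, and the maturity labels are assumptions for illustration; this is not the actual PUDL agg function.

```python
import pandas as pd

# Assumed label for year-to-date records, spelled as in the commit message.
YTD_MATURITY = "ytd_incremental"


def aggregate_annual(monthly: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical annual aggregation that carries data_maturity through.

    YTD rows are dropped first, so all rows left in a (plant, year) group share
    one data_maturity value and taking the "first" one loses nothing.
    """
    complete = monthly.loc[monthly["data_maturity"] != YTD_MATURITY].copy()
    complete["report_year"] = pd.to_datetime(complete["report_date"]).dt.year
    return complete.groupby(["plant_id_eia", "report_year"], as_index=False).agg(
        net_generation_mwh=("net_generation_mwh", "sum"),
        data_maturity=("data_maturity", "first"),
    )


# One plant: 12 final months of 2022 and 3 year-to-date months of 2023.
monthly = pd.DataFrame(
    {
        "plant_id_eia": 1,
        "report_date": pd.date_range("2022-01-01", periods=15, freq="MS"),
        "net_generation_mwh": 10.0,
        "data_maturity": ["final"] * 12 + [YTD_MATURITY] * 3,
    }
)
annual = aggregate_annual(monthly)  # one row: plant 1, 2022, 120.0 MWh, "final"
```

If the YTD rows were not dropped first, a year could mix maturities and "first" would silently pick whichever row happened to sort earliest.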
- Update sources, DOI, and copyright link in PUDL
- Add alembic migration for the recently added non-null report_date constraint
- Dependabot requirement updates (permit the latest versions):
  - [responses](https://github.com/getsentry/responses): from <0.24,>=0.14 to >=0.14,<0.25 ([Release notes](https://github.com/getsentry/responses/releases), [Changelog](https://github.com/getsentry/responses/blob/master/CHANGES), commits getsentry/responses@0.14.0...0.24.0)
  - [pyarrow](https://github.com/apache/arrow): from <14,>=13 to >=13,<15 (commits apache/arrow@go/v13.0.0...go/v14.0.0)
  - [dagster-postgres](https://github.com/dagster-io/dagster): from <0.21.6,>=0.21.5 to >=0.21.5,<0.21.7 ([Release notes](https://github.com/dagster-io/dagster/releases), [Changelog](https://github.com/dagster-io/dagster/blob/master/CHANGES.md), [Commits](https://github.com/dagster-io/dagster/commits))
- [pre-commit.ci] pre-commit autoupdate: github.com/astral-sh/ruff-pre-commit v0.1.3 → v0.1.4 (astral-sh/ruff-pre-commit@v0.1.3...v0.1.4)
- Fix validation `test_fbp_ferc1_mismatched_fuels` error
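The stray "nan" strings fixed in these commits are a common pandas pitfall: converting values to text element-wise with Python's `str()` turns a missing value into the literal string `"nan"`, which is no longer treated as null, whereas the nullable `"string"` dtype keeps it as `pd.NA`. This generic sketch uses made-up fuel values and does not claim to reproduce the exact PUDL code path.

```python
import numpy as np
import pandas as pd

fuels = pd.Series(["coal", np.nan, "gas"])

# Stringifying element-wise turns the missing value into the literal text "nan",
# which isna() no longer recognizes as missing.
as_text = fuels.apply(str)

# The nullable "string" dtype keeps the missing value as pd.NA.
as_nullable = fuels.astype("string")
```

Casting to the nullable string dtype up front avoids having to replace `"nan"` after the fact, which matches the direction of the "stop introducing them" fix.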
Codecov Report
All modified and coverable lines are covered by tests ✅

Coverage diff (additional details and impacted files):

| | main | #3031 | +/- |
|---|---|---|---|
| Coverage | 88.6% | 88.7% | |
| Files | 91 | 91 | |
| Lines | 10991 | 11011 | +20 |
| Hits | 9749 | 9769 | +20 |
| Misses | 1242 | 1242 | |

☔ View full report in Codecov by Sentry.