MVP improvements to automated archive runs #357

Merged Jun 19, 2024. 35 commits; the diff shown below reflects changes from 32 of them.

Commits:
7ca2750  Add validation failures to slackbot (e-belfer, Jun 17, 2024)
13302d9  Fix skipping failed tests (e-belfer, Jun 17, 2024)
9425dd5  Add format_message and reduce duplicated code (e-belfer, Jun 17, 2024)
4cf7715  Merge branch 'main' into add-failures-to-slackbot (e-belfer, Jun 17, 2024)
9610964  Test by running on some busted and non-busted archives (e-belfer, Jun 17, 2024)
bf58f1c  Fix issue in test run-archiver.yml (e-belfer, Jun 17, 2024)
68c6edf  Shrink test and flatten validation test lists (e-belfer, Jun 18, 2024)
83961e3  Update issue template, add template creation to workflow (e-belfer, Jun 18, 2024)
60a76c5  Fix workflow format (e-belfer, Jun 18, 2024)
dd270b4  Fix link formatting (e-belfer, Jun 18, 2024)
a2cb688  Make slack validation failures more succinct (e-belfer, Jun 18, 2024)
f304b4e  Attempt to add dataset selection in manual run (e-belfer, Jun 18, 2024)
ce9158e  Try to fix inputs (e-belfer, Jun 18, 2024)
7cb879a  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jun 18, 2024)
3d31147  Try to fix matrix strategy (e-belfer, Jun 18, 2024)
53c12d4  Try to fix matrix strategy (e-belfer, Jun 18, 2024)
6242c47  Try to fix matrix strategy (e-belfer, Jun 18, 2024)
8bbdf9c  Fix syntax (e-belfer, Jun 18, 2024)
6094cb4  Test syntax and additional quotes (e-belfer, Jun 18, 2024)
a7622b0  Remove epacems from large, try to get filtering to work (e-belfer, Jun 18, 2024)
ec4eea9  Add back large runner (e-belfer, Jun 18, 2024)
ad7464c  Remove epacems from default small runner list (e-belfer, Jun 18, 2024)
5883670  Fix github issue creation (e-belfer, Jun 18, 2024)
71bd627  Deal with foolish boolean formats (e-belfer, Jun 18, 2024)
4bf7d5c  Appease the GHA formatting nightmare (e-belfer, Jun 18, 2024)
a1301a0  More playing around with github issue creation (e-belfer, Jun 18, 2024)
4829753  Even more tooling with github issue creation (e-belfer, Jun 18, 2024)
0d51e24  Just try everything (e-belfer, Jun 18, 2024)
0e015c6  Try different tack for boolean (e-belfer, Jun 18, 2024)
6fdd347  Try different tack for boolean (e-belfer, Jun 18, 2024)
1fe8ba0  Try false instead of false (e-belfer, Jun 18, 2024)
f179acc  Handle skips and irrational GHA format requirements (e-belfer, Jun 18, 2024)
02d0eb6  Make scheduled run workflow more explicit, remove redundant logs in i… (e-belfer, Jun 19, 2024)
844ecc2  Workflow dispatch doesn't like env variables as input (e-belfer, Jun 19, 2024)
140473c  Roll back env vars due to difficult GHA behavior (e-belfer, Jun 19, 2024)
11 changes: 7 additions & 4 deletions .github/ISSUE_TEMPLATE/monthly-archive-update.md
@@ -1,12 +1,15 @@
---
name: Monthly archive update
about: Template for publishing monthly archives.
-title: Publish archives for the month of MONTH
+title: Publish {{ date | date('MMMM Do YYYY') }} archives
labels: automation, zenodo
-assignees: ''
+assignees: e-belfer

---

+# Summary of results:
+See the job run results [here]({{ env.RUN_URL }}).
+
# Review and publish archives

For each of the following archives, find the run status in the Github archiver run. If validation tests pass, manually review the archive and publish. If no changes are detected, delete the draft. If changes are detected, manually review the archive following the guidelines in step 3 of `README.md`, then publish the new version. Then check the box here to confirm publication status, adding a note on the status (e.g., "v1 published", "no changes detected, draft deleted"):
@@ -50,8 +53,8 @@ If the validation failure is blocking (e.g., file format incorrect, whole datase
For each run that failed for another reason (e.g., underlying data changes, code failures), create an issue describing the failure and take the necessary steps to resolve it.

```[tasklist]
-- [ ]
+- [ ] dataset
```

# Relevant logs
-[Link to logs from GHA run]( PLEASE FIND THE ACTUAL LINK AND FILL IN HERE )
+[Link to logs from GHA run]({{ env.RUN_URL }})
Review comment (Member): non-blocking: we could maybe strip this whole section if there's that summary of results section above.

57 changes: 34 additions & 23 deletions .github/workflows/run-archiver.yml
@@ -3,6 +3,22 @@ name: run-archiver

on:
  workflow_dispatch:
+    inputs:
+      small_runner:
+        description: 'Small runner: Comma-separated list of datasets to archive (e.g., "ferc2","ferc6").'
+        default: '"eia176","eia191","eia757a","eia860","eia860m","eia861","eia923","eia930","eiaaeo","eiawater","eia_bulk_elec","epacamd_eia","ferc1","ferc2","ferc6","ferc60","ferc714","mshamines","nrelatb","phmsagas"'
+        required: true
+        type: string
+      large_runner:
+        description: "Kick off large runners (for epacems)?"
+        required: true
+        default: false
+        type: boolean
+      create_github_issue:
+        description: "Create a Github issue from this run?"
+        default: false
+        required: true
+        type: boolean
  schedule:
    - cron: "21 8 1 * *" # 8:21 AM UTC, first of every month

@@ -13,28 +29,7 @@ jobs:
        shell: bash -l {0}
    strategy:
      matrix:
-        dataset:
-          - eia176
-          - eia191
-          - eia757a
-          - eia860
-          - eia861
-          - eia860m
-          - eia923
-          - eia930
-          - eiaaeo
-          - eiawater
-          - eia_bulk_elec
-          - epacamd_eia
-          - ferc1
-          - ferc2
-          - ferc6
-          - ferc60
-          - ferc714
-          - mshamines
-          - nrelatb
-          - phmsagas
-
+        dataset: ${{ fromJSON(format('[{0}]', inputs.small_runner || '"eia176","eia191","eia757a","eia860","eia860m","eia861","eia923","eia930","eiaaeo","eiawater","eia_bulk_elec","epacamd_eia","ferc1","ferc2","ferc6","ferc60","ferc714","mshamines","nrelatb","phmsagas"')) }}
Review comment (Member Author, e-belfer, Jun 18, 2024): If scheduled, should default to the full list.

Review comment (Member): blocking: What do you think of defining one list of "all the damn datasets" in env so we can access that everywhere we need to?

Review comment (Member Author, e-belfer): It's a bit tricky here because there should in fact be two variables - "small runner datasets" and "large runner datasets". The alternative is one variable with some kind of filtering, but I couldn't figure out how to do that neatly. But I can make a "small" and "large" dataset list.

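To make the new matrix expression concrete, here is a rough Python sketch (an illustration, not GHA's actual expression engine) of how `fromJSON(format('[{0}]', inputs.small_runner || '<default>'))` builds the dataset list; `DEFAULT_SMALL` mirrors the `small_runner` input's default string from the workflow above.

```python
import json

# Mirrors the `small_runner` input default defined in the workflow above.
DEFAULT_SMALL = (
    '"eia176","eia191","eia757a","eia860","eia860m","eia861","eia923",'
    '"eia930","eiaaeo","eiawater","eia_bulk_elec","epacamd_eia","ferc1",'
    '"ferc2","ferc6","ferc60","ferc714","mshamines","nrelatb","phmsagas"'
)


def matrix_datasets(small_runner_input: str | None) -> list[str]:
    """Approximate fromJSON(format('[{0}]', inputs.small_runner || default)).

    On a scheduled run there is no workflow_dispatch input, so the `||`
    fallback selects the full default list.
    """
    raw = small_runner_input or DEFAULT_SMALL
    return json.loads(f"[{raw}]")  # format('[{0}]', raw), then fromJSON


print(matrix_datasets('"ferc2","ferc6"'))  # manual run: ['ferc2', 'ferc6']
print(len(matrix_datasets(None)))  # scheduled run: all 20 default datasets
```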
      fail-fast: false
    runs-on: ubuntu-latest
    steps:
@@ -78,6 +73,7 @@ jobs:
          path: ${{ matrix.dataset }}_run_summary.json

  archive-run-large:
+    if: inputs.large_runner
Review comment (Member Author, e-belfer): If set as true in workflow dispatch, or triggered by a scheduled run, this should run.

Review comment (Member): Hm, if triggered by a scheduled run, I would expect inputs.large_runner to be empty and thus archive-run-large to get skipped - am I missing something here?

Review comment (Member Author, e-belfer): My assumption was that an empty string would actually get evaluated as true, but I could be totally off-base here. I think your suggestion re: incorporating the type of run below is great and I'll incorporate it here.

Review comment (Member): I think an unset variable here would get treated as a "falsey" value: https://docs.github.com/en/actions/learn-github-actions/expressions#literals

"Note that in conditionals, falsy values (false, 0, -0, "", '', null) are coerced to false and truthy (true and other non-falsy values) are coerced to true."

And actually I bet an unset variable is actually null instead of '', now that I look at those docs.

Review comment (Member Author, e-belfer): Either way, making this more explicit seems wise.

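As a concrete rendering of the falsy-coercion point from the docs quoted above, here is a tiny Python approximation (not GHA's real evaluator) of how a bare `if:` condition treats the values in play for `inputs.large_runner`:

```python
# GHA-style truthiness, per the expressions documentation quoted above:
# false, 0, -0, "", '' and null are falsy; everything else is truthy.
def gha_truthy(value) -> bool:
    return value not in (False, 0, -0, "", None)


# workflow_dispatch with the box checked: job runs.
assert gha_truthy(True)
# workflow_dispatch with the box unchecked: job skipped.
assert not gha_truthy(False)
# Scheduled run: inputs.large_runner is unset (null), so the job is
# skipped, which is why the reviewers want a more explicit condition.
assert not gha_truthy(None)
```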
    defaults:
      run:
        shell: bash -l {0}
@@ -91,7 +87,6 @@
      labels: ubuntu-22.04-4core
    steps:
      - uses: actions/checkout@v4
-
      - name: Install Conda environment using mamba
        uses: mamba-org/setup-micromamba@v1
        with:
@@ -160,3 +155,19 @@ jobs:
          payload: ${{ steps.all_summaries.outputs.SLACK_PAYLOAD }}
        env:
          SLACK_BOT_TOKEN: ${{ secrets.PUDL_DEPLOY_SLACK_TOKEN }}

+  make-github-issue:
+    if: always() && inputs.create_github_issue != false
Review comment (Member Author, e-belfer, Jun 18, 2024): See discussion here for use of always(): actions/runner#491. Completely unhinged GHA behavior, but what can we do.

Review comment (Member): Whee! hey, if it works it works.

Is the idea behind inputs.create_github_issue != false that "" != false and then we will get the github issue in scheduled runs as well as the specified manual runs?

If so, what do you think of using github.event_name to differentiate between workflow_dispatch and scheduled runs? That is more explicitly "do X step if it's scheduled or if there's some specific workflow_dispatch input."

Review comment (Member Author, e-belfer, Jun 19, 2024): I think that's a great idea, can incorporate it here. But yes, that was my original idea.

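A hedged sketch of why this condition fires on scheduled runs: `always()` keeps the job eligible even when the jobs it needs have failed (the actions/runner#491 workaround discussed above), and the `!= false` comparison passes when the input is unset.

```python
def should_create_issue(create_github_issue) -> bool:
    """Approximate the GHA expression `inputs.create_github_issue != false`."""
    # On a scheduled run the workflow_dispatch input is unset (null/None),
    # and null != false evaluates to true, so the issue is still created.
    return create_github_issue != False  # mirrors the GHA comparison


assert should_create_issue(None)  # scheduled run: issue created
assert should_create_issue(True)  # manual run, box checked: created
assert not should_create_issue(False)  # manual run, box unchecked: skipped
```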
+    runs-on: ubuntu-latest
+    needs:
+      - archive-run-small
+      - archive-run-large
+    steps:
+      - uses: actions/checkout@v3
+      - name: Create an issue
+        uses: JasonEtco/[email protected]
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+        with:
+          filename: .github/ISSUE_TEMPLATE/monthly-archive-update.md
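As a rough illustration of the data flow (the `render` helper below is hypothetical; the real action does its own {{ }}-style templating with filters such as `date`), the RUN_URL env var assembled above is what the issue template's `{{ env.RUN_URL }}` placeholders resolve to:

```python
# Hypothetical stand-in for the {{ env.* }} substitution that
# JasonEtco/create-an-issue performs on the issue template.
def render(template: str, env: dict[str, str]) -> str:
    rendered = template
    for key, value in env.items():
        rendered = rendered.replace(f"{{{{ env.{key} }}}}", value)
    return rendered


# RUN_URL mirrors the expression assembled in the workflow step above;
# OWNER/REPO and the run id are invented placeholders.
env = {"RUN_URL": "https://github.com/OWNER/REPO/actions/runs/123456789"}
body = "See the job run results [here]({{ env.RUN_URL }})."
print(render(body, env))
# -> See the job run results [here](https://github.com/OWNER/REPO/actions/runs/123456789).
```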
85 changes: 64 additions & 21 deletions scripts/make_slack_notification_message.py
@@ -29,41 +29,81 @@ def _parse_args():
    return parser.parse_args()


+def _format_message(
+    url: str, name: str, content: str, max_len: int = 3000
+) -> list[dict]:
+    text = f"<{url}|*{name}*>\n{content}"[:max_len]
+    return [
+        {
+            "type": "section",
+            "text": {"type": "mrkdwn", "text": text},
+        },
+    ]
+
+
+def _format_failures(summary: dict) -> list[dict]:
+    name = summary["dataset_name"]
+    url = summary["record_url"]
+
+    test_failures = defaultdict(list)
+    for validation_test in summary["validation_tests"]:
+        if (not validation_test["success"]) and (
+            validation_test["required_for_run_success"]
+        ):
+            test_failures = ". ".join(
+                [validation_test["name"], ". ".join(validation_test["notes"])]
+            )  # Flatten list of lists
+
+    if test_failures:
+        failures = f"```\n{json.dumps(test_failures, indent=2)}\n```"
+    else:
+        return None
+
+    return _format_message(url=url, name=name, content=failures)


+def _format_summary(summary: dict) -> list[dict]:
+    name = summary["dataset_name"]
+    url = summary["record_url"]
+    if any(not test["success"] for test in summary["validation_tests"]):
+        return None  # Don't report on file changes if any test failed.
+
+    if file_changes := summary["file_changes"]:
+        abridged_changes = defaultdict(list)
+        for change in file_changes:
+            abridged_changes[change["diff_type"]].append(change["name"])
+        changes = f"```\n{json.dumps(abridged_changes, indent=2)}\n```"
+    else:
+        changes = "No changes."
+
+    return _format_message(url=url, name=name, content=changes)


def main(summary_files: list[Path]) -> None:
    """Format summary files for Slack perusal."""
    summaries = []
    for summary_file in summary_files:
        with summary_file.open() as f:
            summaries.extend(json.loads(f.read()))

-    def format_summary(summary: dict) -> list[dict]:
-        name = summary["dataset_name"]
-        url = summary["record_url"]
-        if file_changes := summary["file_changes"]:
-            abridged_changes = defaultdict(list)
-            for change in file_changes:
-                abridged_changes[change["diff_type"]].append(change["name"])
-            changes = f"```\n{json.dumps(abridged_changes, indent=2)}\n```"
-        else:
-            changes = "No changes."
-
-        max_len = 3000
-        text = f"<{url}|*{name}*>\n{changes}"[:max_len]
-        return [
-            {
-                "type": "section",
-                "text": {"type": "mrkdwn", "text": text},
-            },
-        ]
+    failed_blocks = list(
+        itertools.chain.from_iterable(
+            _format_failures(s) for s in summaries if _format_failures(s) is not None
+        )
+    )

    unchanged_blocks = list(
        itertools.chain.from_iterable(
-            format_summary(s) for s in summaries if not s["file_changes"]
+            _format_summary(s)
+            for s in summaries
+            if (not s["file_changes"]) and (_format_summary(s) is not None)
        )
    )
    changed_blocks = list(
        itertools.chain.from_iterable(
-            format_summary(s) for s in summaries if s["file_changes"]
+            _format_summary(s)
+            for s in summaries
+            if (s["file_changes"]) and (_format_summary(s) is not None)
        )
    )

@@ -73,6 +113,8 @@ def header_block(text: str) -> dict:
    def section_block(text: str) -> dict:
        return {"type": "section", "text": {"type": "mrkdwn", "text": text}}

+    if failed_blocks:
+        failed_blocks = [section_block("*Validation Failures*")] + failed_blocks
    if changed_blocks:
        changed_blocks = [section_block("*Changed*")] + changed_blocks
    if unchanged_blocks:
@@ -84,6 +126,7 @@ def section_block(text: str) -> dict:
"attachments": [
{
"blocks": [header_block("Archiver Run Outcomes")]
+ failed_blocks
+ changed_blocks
+ unchanged_blocks,
}
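To tie the script together, here is a hedged sketch of the run-summary shape it consumes and the Slack payload it emits. The field names come from the code above; the concrete values, the test name, and the `diff_type` strings are invented for illustration, and since the `header_block` body is not shown in this diff, a plain_text header is assumed.

```python
import json

# Invented example matching the fields the script reads:
# dataset_name, record_url, validation_tests, file_changes.
example_summary = {
    "dataset_name": "eia860",
    "record_url": "https://zenodo.org/records/123456",  # placeholder URL
    "validation_tests": [
        {
            "name": "file_format_check",  # hypothetical test name
            "success": False,
            "required_for_run_success": True,
            "notes": ["unexpected file extension: .txt"],
        }
    ],
    "file_changes": [
        {"name": "eia860-2023.zip", "diff_type": "UPDATED"}  # invented values
    ],
}

# _format_failures(example_summary) yields one mrkdwn section block;
# _format_message truncates its text to 3000 characters, Slack's limit
# for a section block. The final payload nests all blocks like so:
payload = {
    "attachments": [
        {
            "blocks": [
                # assumed header_block shape; its body is not in this diff
                {
                    "type": "header",
                    "text": {"type": "plain_text", "text": "Archiver Run Outcomes"},
                },
                # ...then failed_blocks + changed_blocks + unchanged_blocks
            ]
        }
    ]
}
print(json.dumps(payload, indent=2))
```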