Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

984 ecocounter pull recent outages #1014

Merged
merged 15 commits into from
Jul 31, 2024

Conversation

gabrielwol
Copy link
Collaborator

@gabrielwol gabrielwol commented Jul 17, 2024

What this pull request accomplishes:

  • Add a new Ecocounter view to identify outages within the last 30 days
    • adds a date_decommissioned column to ecocounter.flows to ensure we don't keep trying to pull data for decommissioned sites. Also populated column for anything that hadn't reported this year.
  • add a new pull task to Ecocounter DAG to attempt to pull data for outages in the last 30 days
  • minor refactor of Ecocounter functions with get_connections and truncate_and_insert to reduce repetition

Issue(s) this solves:

What, in particular, needs to reviewed:

  • I decided against the mapped task method with one task per flow + daily retry because that would have kept some DAG runs running for weeks at a time (if they were decommissioned or had a long outage). With this approach we try to pull data for recent outages each day in a separate task. Thoughts? (here is that other approach if you're curious)

What needs to be done by a sysadmin after this PR is merged

E.g.: these tables need to be migrated/created in the production schema.

@gabrielwol gabrielwol requested a review from chmnata July 17, 2024 14:48
@gabrielwol gabrielwol self-assigned this Jul 17, 2024
@gabrielwol gabrielwol linked an issue Jul 17, 2024 that may be closed by this pull request
@@ -0,0 +1,73 @@
CREATE OR REPLACE VIEW ecocounter.recent_outages AS (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we rename to sth like last_week_outages or sth similar?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

month**

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even more generic: identify_outages

@chmnata
Copy link
Collaborator

chmnata commented Jul 17, 2024

I agree with deciding against the mapped task + daily retry, limiting the number of active dag runs is would be nice. I guess my only concern is outages that happened a month ago (since the recent outages only includes sites had outages in the last month). Do we see that happening often? Would we need to pull site data that were a month ago?

Thought about instead of having recent_outage we just keep a list of all outages but that seems like too much, and what if they were legit outages where we won't be able to retrieve any data (unless we add them to AR and filters out sites that way, but still kinda a lot) 🤔. We could do mapped tasks with 1 day retry and after that have alerts for us to manually retry when we want? Or simply increase recent outages to more than a month. Just wanted to make sure we are not missing any repulling possibilities with automatic repulling, since we don't have alerts on daily outages.

@gabrielwol
Copy link
Collaborator Author

I changed the view to a function so we can alter the number of days easily.
When we merge the PR I'll also do a run with num_days = 1 year so we can be sure we haven't missed any recent data.

As for keeping track of all the outages, I think it will need to be a separate issue: I realized for this purpose we should only be re-pulling when SUM(volume) IS NULL.
SUM(volume) = 0 also happens but wouldn't be solved by repulling.

@gabrielwol
Copy link
Collaborator Author

This change is working nicely. You can see in todays logs we captured 5 days (5 days x 15 minute bins = 480 records) for two detectors which were late reporting. Confirmed by looking at Ecocounter dashboard:
image

(the most recent day is pulled by the regular pull, while the 5 day backlog is pulled by pull_recent_outages)

Copy link
Collaborator

@chmnata chmnata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! Making the view into a function does increase flexibility. As we discussed, since most outages can be repulled within 2 weeks, the outage repull limit of 60 days should be sufficient for meow. We can revisit if and when there are cases that exceed the 60 days limit and if they are frequent enough that manual repulling is annoying.

One last thing, can we update the readme to include the new pull_recent_outages task?

@gabrielwol
Copy link
Collaborator Author

Readme updated!

- [`check_partitions` TaskGroup](#check_partitions-taskgroup)
- [`data_checks` TaskGroup](#data_checks-taskgroup)
- [`ecocounter_check` DAG](#ecocounter_check-dag)
- [Discontinuities](#discontinuities)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somehow formatted as code block

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed! And added back a missing section of readme 👀

Copy link
Collaborator

@chmnata chmnata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:gabe-approved:

@gabrielwol gabrielwol merged commit b3ca04c into master Jul 31, 2024
5 of 6 checks passed
@gabrielwol gabrielwol deleted the 984-ecocounter-pull-recent-outages branch July 31, 2024 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ecocounter capture backlogged data
2 participants