984 ecocounter pull recent outages #1014
@@ -0,0 +1,73 @@ | |||
CREATE OR REPLACE VIEW ecocounter.recent_outages AS ( |
can we rename to something like `last_week_outages` or something similar?
month**
even more generic: identify_outages
I agree with deciding against the mapped task + daily retry; limiting the number of active DAG runs would be nice. My only concern is outages that happened more than a month ago, since `recent_outages` only includes sites that had outages in the last month. Do we see that happening often? Would we need to pull site data from more than a month ago? I thought about keeping a list of all outages instead of having `recent_outages`, but that seems like too much, and some of them may be legitimate outages where we won't be able to retrieve any data (unless we add them to AR and filter out sites that way, but that is still kind of a lot) 🤔. We could do mapped tasks with a 1-day retry and, after that, have alerts so we can manually retry when we want? Or simply extend recent outages to more than a month. I just want to make sure we are not missing any re-pull opportunities with automatic re-pulling, since we don't have alerts on daily outages.
I changed the view to a function so we can alter the number of days easily. As for keeping track of all the outages, I think it will need to be a separate issue: I realized for this purpose we should only be re-pulling when |
…om/CityofToronto/bdit_data-sources into 984-ecocounter-pull-recent-outages
This change is working nicely. You can see in today's logs that we captured 5 days (5 days x 96 fifteen-minute bins per day = 480 records) for two detectors which were late reporting. Confirmed by looking at the Ecocounter dashboard: (the most recent day is pulled by the regular pull, while the 5 day backlog is pulled by
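The record count quoted above can be checked with a quick calculation. This is a minimal sketch, assuming each detector reports one record per 15-minute bin; the helper name `expected_records` is hypothetical and not part of the PR.

```python
from datetime import timedelta


def expected_records(days: int, bin_minutes: int = 15) -> int:
    """Number of fixed-width bins in `days` full days (hypothetical helper)."""
    # timedelta // timedelta yields an int: 1 day // 15 minutes = 96 bins
    bins_per_day = timedelta(days=1) // timedelta(minutes=bin_minutes)
    return days * bins_per_day


# 5 days of backlog at 96 bins per day
backlog = expected_records(5)
print(backlog)
```

A count that differs from this figure for a re-pulled detector would suggest that some bins are still missing.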
Thanks for the changes! Making the view into a function does increase flexibility. As we discussed, since most outages can be repulled within 2 weeks, the outage repull limit of 60 days should be sufficient for now. We can revisit if and when there are cases that exceed the 60-day limit and if they are frequent enough that manual repulling is annoying.
One last thing, can we update the readme to include the new `pull_recent_outages` task?
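The idea of a configurable lookback window discussed in this thread (60 days here) can be illustrated outside the database. The sketch below is hypothetical and is not the PR's implementation, which is an SQL function; the name `outage_ranges`, its parameters, and the sample dates are all assumptions made for illustration.

```python
from datetime import date, timedelta


def outage_ranges(dates_with_data, today, lookback_days=60):
    """Return inclusive (start, end) date ranges with no data in the window.

    Hypothetical illustration only; the window covers the `lookback_days`
    days before `today` and excludes `today` itself.
    """
    ranges = []
    run_start = None
    day = today - timedelta(days=lookback_days)
    while day < today:
        if day not in dates_with_data:
            if run_start is None:
                run_start = day  # first missing day of a new outage
        elif run_start is not None:
            ranges.append((run_start, day - timedelta(days=1)))
            run_start = None
        day += timedelta(days=1)
    if run_start is not None:
        # outage still ongoing at the end of the window
        ranges.append((run_start, today - timedelta(days=1)))
    return ranges


# Made-up example: data for every day in the window except 1-5 July
today = date(2024, 7, 10)
window = [today - timedelta(days=n) for n in range(1, 61)]
have = {d for d in window if not date(2024, 7, 1) <= d <= date(2024, 7, 5)}
gaps = outage_ranges(have, today)
print(gaps)
```

Passing a smaller `lookback_days` narrows the window, which mirrors the reason given above for turning the view into a function: the number of days can be changed without redefining anything.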
Readme updated! |
volumes/ecocounter/readme.md
- [`check_partitions` TaskGroup](#check_partitions-taskgroup)
- [`data_checks` TaskGroup](#data_checks-taskgroup)
- [`ecocounter_check` DAG](#ecocounter_check-dag)
- [Discontinuities](#discontinuities)
somehow formatted as code block
Fixed! And added back a missing section of readme 👀
:gabe-approved:
What this pull request accomplishes:
- `date_decommissioned` column to `ecocounter.flows` to ensure we don't keep trying to pull data for decommissioned sites. Also populated column for anything that hadn't reported this year.
- `get_connections` and `truncate_and_insert` to reduce repetition

Issue(s) this solves:

What, in particular, needs to be reviewed:
What needs to be done by a sysadmin after this PR is merged
E.g.: these tables need to be migrated/created in the production schema.