984 ecocounter pull recent outages #1014

gabrielwol · 2024-07-17T14:48:49Z

What this pull request accomplishes:

- Add a new Ecocounter view to identify outages within the last 30 days.
  - Adds a date_decommissioned column to ecocounter.flows to ensure we don't keep trying to pull data for decommissioned sites. Also populated the column for anything that hadn't reported this year.
- Add a new pull task to the Ecocounter DAG to attempt to pull data for outages in the last 30 days.
- Minor refactor of Ecocounter functions with get_connections and truncate_and_insert to reduce repetition.

Issue(s) this solves:

Closes Ecocounter capture backlogged data #984

What, in particular, needs to be reviewed:

I decided against the mapped task method with one task per flow + daily retry because that would have kept some DAG runs running for weeks at a time (if they were decommissioned or had a long outage). With this approach we try to pull data for recent outages each day in a separate task. Thoughts? (here is that other approach if you're curious)
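A minimal sketch of the "retry recent outages once per day in a separate task" idea, written in plain Python rather than Airflow so it runs on its own. The names `Flow`, `pull_recent_outages_once`, and `fetch` are hypothetical and are not taken from this PR's code. The only assumption carried over from the description is that decommissioned flows are skipped and that anything still missing is left for the next day's run instead of keeping a DAG run open.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Flow:
    flow_id: int
    # Stands in for the date_decommissioned column; None means still active.
    date_decommissioned: Optional[date] = None


def pull_recent_outages_once(
    flows: Dict[int, Flow],
    outages: List[Tuple[int, date]],
    fetch: Callable[[int, date], Optional[list]],
) -> List[Tuple[int, date]]:
    """Make a single attempt per (flow_id, day) outage.

    Returns the outages that are still missing; nothing is retried here,
    the next daily run simply tries them again.
    """
    still_missing = []
    for flow_id, day in outages:
        flow = flows[flow_id]
        if flow.date_decommissioned is not None and day >= flow.date_decommissioned:
            continue  # never keep retrying a decommissioned flow
        records = fetch(flow_id, day)  # hypothetical API call
        if not records:
            still_missing.append((flow_id, day))
    return still_missing


# Toy data: flow 2 was decommissioned at the start of the year.
flows = {1: Flow(1), 2: Flow(2, date_decommissioned=date(2024, 1, 1))}
outages = [
    (1, date(2024, 7, 10)),
    (1, date(2024, 7, 11)),
    (2, date(2024, 7, 10)),
]


def fake_fetch(flow_id: int, day: date) -> Optional[list]:
    # Pretend only 2024-07-10 has become available since the outage.
    return [0] * 96 if day == date(2024, 7, 10) else None


remaining = pull_recent_outages_once(flows, outages, fake_fetch)
```

In this toy run only flow 1 on 2024-07-11 remains outstanding, and it would be attempted again by the next day's task.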

What needs to be done by a sysadmin after this PR is merged

E.g.: these tables need to be migrated/created in the production schema.

volumes/ecocounter/views/create-view-recent-outages.sql

chmnata · 2024-07-17T20:15:45Z

volumes/ecocounter/views/create-view-recent-outages.sql

@@ -0,0 +1,73 @@
+CREATE OR REPLACE VIEW ecocounter.recent_outages AS (


Can we rename it to something like last_week_outages or similar?

Even more generic: identify_outages

chmnata · 2024-07-17T20:49:41Z

I agree with deciding against the mapped task + daily retry; limiting the number of active DAG runs would be nice. I guess my only concern is outages that happened more than a month ago (since the recent outages view only includes sites that had outages in the last month). Do we see that happening often? Would we need to pull site data from more than a month ago?

I thought about keeping a list of all outages instead of having recent_outages, but that seems like too much, and what if they were legit outages where we won't be able to retrieve any data (unless we add them to AR and filter out sites that way, but that's still kind of a lot) 🤔. We could do mapped tasks with a 1-day retry and after that have alerts for us to manually retry when we want? Or simply increase recent outages to more than a month. I just wanted to make sure we are not missing any repulling possibilities with automatic repulling, since we don't have alerts on daily outages.

gabrielwol · 2024-07-18T18:22:32Z

I changed the view to a function so we can alter the number of days easily.
When we merge the PR I'll also do a run with num_days = 1 year so we can be sure we haven't missed any recent data.

As for keeping track of all the outages, I think it will need to be a separate issue: I realized for this purpose we should only be re-pulling when SUM(volume) IS NULL.
SUM(volume) = 0 also happens but wouldn't be solved by repulling.
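To illustrate the distinction, here is a small Python sketch of the selection rule, not the actual SQL function from this PR. It assumes a mapping with one entry per expected day, where `None` stands in for a NULL `SUM(volume)`; the name `dates_to_repull` and its parameters are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def dates_to_repull(
    daily_volume: Dict[date, Optional[int]],
    today: date,
    num_days: int,
) -> List[date]:
    """Return days within the lookback window whose daily sum is missing.

    A sum of 0 means data arrived and counted nothing, so re-pulling
    would not change it; only None (NULL) days are returned.
    """
    start = today - timedelta(days=num_days)
    return sorted(
        day
        for day, total in daily_volume.items()
        if start <= day < today and total is None
    )


today = date(2024, 7, 18)
daily = {
    date(2024, 5, 1): None,   # missing, but older than a 60 day lookback
    date(2024, 7, 15): 120,   # normal day
    date(2024, 7, 16): 0,     # reported, zero volume: not re-pulled
    date(2024, 7, 17): None,  # missing: re-pull candidate
}

recent = dates_to_repull(daily, today, num_days=60)
full_year = dates_to_repull(daily, today, num_days=365)
```

Widening `num_days` to a year, as in the one-off run mentioned above, also picks up the older missing day.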

gabrielwol · 2024-07-26T18:52:39Z

This change is working nicely. You can see in today's logs we captured 5 days (5 days x 96 fifteen-minute bins per day = 480 records) for two detectors which were late reporting. Confirmed by looking at the Ecocounter dashboard:

(the most recent day is pulled by the regular pull, while the 5 day backlog is pulled by pull_recent_outages)
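As a quick check of the record count quoted above, this sketch computes how many records a backlog of whole days should produce, assuming one record per 15-minute bin. The helper name `expected_records` is hypothetical.

```python
from datetime import timedelta


def expected_records(num_days: int, bin_minutes: int = 15) -> int:
    """Number of bins in num_days whole days, assuming one record per bin."""
    bins_per_day = timedelta(days=1) // timedelta(minutes=bin_minutes)
    return num_days * bins_per_day


backlog = expected_records(5)
```

Comparing this figure against the number of rows logged by the task is a cheap way to confirm a backlog was captured in full.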

chmnata

Thanks for the changes! Making the view into a function does increase flexibility. As we discussed, since most outages can be repulled within 2 weeks, the outage repull limit of 60 days should be sufficient for now. We can revisit if and when there are cases that exceed the 60-day limit and if they are frequent enough that manual repulling is annoying.

One last thing, can we update the readme to include the new pull_recent_outages task?

gabrielwol · 2024-07-30T21:15:50Z

Readme updated!

chmnata · 2024-07-30T21:19:36Z

volumes/ecocounter/readme.md

-    - [`check_partitions` TaskGroup](#check_partitions-taskgroup)
-    - [`data_checks` TaskGroup](#data_checks-taskgroup)
-  - [`ecocounter_check` DAG](#ecocounter_check-dag)
+    - [Discontinuities](#discontinuities)


somehow formatted as code block

Fixed! And added back a missing section of readme 👀

chmnata

:gabe-approved:

gabrielwol added 4 commits July 16, 2024 21:32

#984 create view recent-outages

f7f5aae

#984 add task to poll all the recent outages

34146c1

#984 add date_decommissioned col

fd73d73

#984 add truncate_and_insert to simplify dag

baaadce

gabrielwol added enhancement Ecocounter labels Jul 17, 2024

gabrielwol requested a review from chmnata July 17, 2024 14:48

gabrielwol self-assigned this Jul 17, 2024

gabrielwol linked an issue Jul 17, 2024 that may be closed by this pull request

Ecocounter capture backlogged data #984

Closed

#984 change to 30 day lookback + fluff

bc707ea

chmnata reviewed Jul 17, 2024

chmnata reviewed Jul 17, 2024

gabrielwol added 2 commits July 18, 2024 18:16

#984 change view to function and up limit to 60 day lookback

b7669ce

#984 cast to date

c355ce3

gabrielwol added 6 commits July 19, 2024 13:50

#984 add date_decommissioned to sites table

b610141

#984 add aliases to fix collision with return table columns

2912a13

#984 remove timespec

cd08822

#984 change dates to times in outage function

de22cd8

Merge branch '984-ecocounter-pull-recent-outages' of https://github.c…

eb19ab2

…om/CityofToronto/bdit_data-sources into 984-ecocounter-pull-recent-outages

#984 undo timespec change

2f4c419

chmnata requested changes Jul 30, 2024

#984 update ecocounter readme

b93778d

chmnata reviewed Jul 30, 2024

#984 replace missing readme sections

311c130

chmnata approved these changes Jul 31, 2024

gabrielwol merged commit b3ca04c into master Jul 31, 2024
5 of 6 checks passed

gabrielwol deleted the 984-ecocounter-pull-recent-outages branch July 31, 2024 19:52
