perf(Predictions.PubSub): Read stops + routes from global cache instead of API #206
Conversation
Load testing against this, I very quickly started to get errors joining the channel. I think we will need to either populate the cache on startup or have a fallback mechanism.
Added a call to recalculate as part of init, though I'm seeing a Splunk error. Potentially we should move checking for the presence of the global data into the health check endpoint, so that traffic only shifts to an instance once it has global data.
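For reference, a minimal sketch of that health-check idea, assuming a Plug-based /_health endpoint and that GlobalDataCache.default_key/0 is public; the module name and response details are placeholders, not the actual implementation:

```elixir
# Sketch only: report the instance unhealthy until the global data exists, so the
# load balancer only shifts traffic once the cache is warm. Names are assumed.
defmodule MobileAppBackendWeb.HealthPlug do
  import Plug.Conn

  def init(opts), do: opts

  def call(conn, _opts) do
    # Read the persistent_term directly so the health check never triggers a
    # synchronous fetch of its own.
    case :persistent_term.get(GlobalDataCache.default_key(), nil) do
      nil -> conn |> send_resp(503, "global data not loaded") |> halt()
      _data -> conn |> send_resp(200, "ok") |> halt()
    end
  end
end
```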
I think being able to use different keys and having a Mox mock are two different ways to solve the problem of providing test data from the GlobalDataCache to consumers, and I'm not quite sure if we need both of them, but I haven't finished thinking through which one would be enough.
If we report instances as healthy only once they have global data, that'll cause the Docker container smoke test in CI to fail again, unless we give that smoke test an API key, which might actually be the correct fix anyway; I think that would let us just fetch the data eagerly.
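As a point of comparison, here is roughly what the Mox route could look like, assuming the cache functions are defined as behaviour callbacks with a get_data/1 callback; the mock name, config keys, and data shape below are placeholders rather than the project's actual setup:

```elixir
# Sketch of the Mox approach; names and data shape are assumptions.

# test/test_helper.exs
Mox.defmock(GlobalDataCacheMock, for: GlobalDataCache)
Application.put_env(:mobile_app_backend, :global_data_cache, GlobalDataCacheMock)

# in a test case
test "channel join uses cached stops and routes" do
  Mox.expect(GlobalDataCacheMock, :get_data, fn _key ->
    # Return whatever test data the consumer under test needs.
    %{stops: %{}, routes: %{}}
  end)

  # ...exercise the code under test against the mock...
end
```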
update_data(state.key)

Process.send_after(self(), :recalculate, state.update_ms)

if :persistent_term.get(state.key, nil) do
update_data will only not call :persistent_term.put/2 if it crashes, so I don't think I understand when this check could fail.
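In other words, if update_data/1 always ends by writing the term unless it crashes, the recalculate handler could presumably drop the check entirely. A minimal sketch of that shape, under that assumption:

```elixir
# Sketch only: assumes update_data/1 always writes to :persistent_term unless it crashes.
@impl true
def handle_info(:recalculate, state) do
  update_data(state.key)
  Process.send_after(self(), :recalculate, state.update_ms)
  {:noreply, state}
end
```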
state = %State{
  key: opts[:key],
  update_ms: opts[:update_ms] || :timer.minutes(5)
}

Process.send_after(self(), :recalculate, :timer.seconds(1))
It's weird that this wasn't here before. This is really tough to test locally, since you can only know it's working if GTFS actually changes, but that might mean this never actually worked and was only ever calculating the global data once. Oops!
Added some light tests for this by checking that the message is sent.
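A rough sketch of that kind of test, assuming the State struct fields shown above and that update_data/1 can run (or is stubbed) in the test environment; because handle_info/2 is called directly in the test process, Process.send_after/3 targets the test pid and assert_receive can see the scheduled message:

```elixir
# Sketch only: struct and module names are assumed.
test "recalculating schedules the next recalculation" do
  state = %GlobalDataCache.State{key: :test_key, update_ms: 10}

  GlobalDataCache.handle_info(:recalculate, state)

  # The next :recalculate should be scheduled against the calling process.
  assert_receive :recalculate, 100
end
```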
end

@spec get_data(key()) :: data()
@impl true
def get_data(key \\ default_key()) do
  :persistent_term.get(key, nil) || update_data(key)
Maybe this update_data should be moved into a GenServer.call or something, so that if there are a dozen simultaneous calls to get_data/1 before data is loaded, they don't each call update_data/1.
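One possible shape for that, as a sketch (assuming the GenServer is registered under the module name): the first caller that misses the cache asks the server to load it, and since the server handles calls one at a time, concurrent callers wait on that single load instead of each fetching independently.

```elixir
# Sketch of serializing the initial load through the GenServer; names assumed.
def get_data(key \\ default_key()) do
  :persistent_term.get(key, nil) || GenServer.call(__MODULE__, {:load_data, key})
end

@impl true
def handle_call({:load_data, key}, _from, state) do
  # Re-check in case another caller already populated the term while we waited.
  data = :persistent_term.get(key, nil) || update_data(key)
  {:reply, data, state}
end
```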
I'd be somewhat worried about putting it into GenServer.call, since if it is slow for the first user, subsequent user requests will all fail too. I think the best bet is making sure that the data is populated first & removing the call to update_data from get_data.
Doing that asynchronously via the scheduled checks & preventing user traffic via a healthcheck seems like the safest approach to me - if the global data can't load immediately for some reason, it seems cleaner to keep trying to re-fetch without crashing. Maybe I'm overly wary of crashes, though. In any case, I think resolving that mechanism could be part of a separate PR so that the polling is in place.
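For illustration, the shape being argued for here, assuming startup population plus the health check guarantee the data exists before any user traffic arrives:

```elixir
# Sketch only: with the health check gating traffic, get_data/1 no longer needs a
# fetch fallback; :persistent_term.get/1 will raise if the data is somehow absent.
def get_data(key \\ default_key()) do
  :persistent_term.get(key)
end
```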
@boringcactus I'm going to break this PR up into separate ones so that we can clear out the immediate problem of setting up the timed refresh of this data.
I'm in favor of the health check approach over fetching the data in init and failing. In speaking to Paul about it (since he might be pitching in for that change anyway), it has the advantage over the init approach of faster deploys in the case that the request fails. Skate and API take that approach as well.
I don't think Glides has any particular application-logic-specific health checks in its /_health, which is probably why I hadn't thought of it, but I guess if that's common we may as well do it here. Good catch on fixing the refresh first - I had completely lost track of that in the context of the TestFlight public beta.
@boringcactus I like having the Mox available here so that it is possible to mock the higher-level function rather than having to insert all the required data in persistent term. It was especially helpful in the StreamSubscriber tests to be able to mock.

I don't think giving the Docker container an API key would solve the issue - I thought it couldn't make any network requests in CI (these V3 API requests should succeed without an API key anyway).
That makes sense. If we want to be able to unit test the

I'm not sure I'm aware of any issues with the Docker container not being able to make outgoing network requests in CI - we run load tests in CI against a real API instance, and it works fine. The API requests would succeed with no API key, but they can't succeed with no API URL.
Summary
Ticket: Predictions Scalability: new channel that publishes predictions updates in chunks
What is this PR for?
This PR incorporates the global cache data added in #200 since we found those API calls to be a main source of latency in load testing (notes).
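A very rough illustration of the intent (the helper names and data shape below are placeholders, not the actual diff): instead of making V3 API requests for stops and routes on every channel join, the predictions PubSub reads them out of the global cache that #200 keeps warm.

```elixir
# Placeholder names throughout; this shows the shape of the change, not the real code.

# before: API round trips on each channel join
stops = fetch_stops_from_api(stop_ids)
routes = fetch_routes_from_api(route_ids)

# after: read from the cache maintained by GlobalDataCache
%{stops: all_stops, routes: all_routes} = GlobalDataCache.get_data()
stops = Map.take(all_stops, stop_ids)
routes = Map.take(all_routes, route_ids)
```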