Trigger PA Messages from RTS #764

PaulJKim · 2024-06-11T02:20:14Z

Summary of changes

Asana Ticket: RTS: Trigger PA Message Readouts

This PR adds a new engine that reads in active PA messages from Screenplay and schedules them to play on sign zones. It also adds a map to sign states to track when the most recent play of a given pa message was. This is to add some resilience against engine restarts which would result in triggers of PA messages that are too close together.

-When a message goes inactive, it's current timer will be cancelled and it's record delete from the ets table.
-When an interval is changed, we will update the scheduling so that the next time the message plays will be in line with it's new interval.

github-actions · 2024-06-11T02:27:31Z

Coverage of commit `47c2b62`

Summary coverage rate:
  lines......: 73.4% (2085 of 2840 lines)
  functions..: 76.0% (585 of 770 functions)
  branches...: no data found

Files changed coverage rate:
                                                             |Lines       |Functions  |Branches    
  Filename                                                   |Rate     Num|Rate    Num|Rate     Num
  =================================================================================================
  lib/engine/pa_messages.ex                                  |85.2%     54| 100%    12|    -      0
  lib/fake/httpoison.ex                                      |74.5%     47|57.1%     7|    -      0
  lib/pa_messages/pa_message.ex                              | 100%      3| 100%     5|    -      0
  lib/realtime_signs.ex                                      | 100%     15| 100%     3|    -      0
  lib/signs/bus.ex                                           |84.1%    246|95.2%    42|    -      0
  lib/signs/realtime.ex                                      |92.5%     67|91.7%    12|    -      0
  lib/signs/utilities/audio.ex                               |87.7%    130| 100%    16|    -      0

Download coverage report

github-actions · 2024-06-11T02:55:04Z

Coverage of commit `9fe3e57`

Summary coverage rate:
  lines......: 73.3% (2083 of 2840 lines)
  functions..: 76.0% (585 of 770 functions)
  branches...: no data found

Files changed coverage rate:
                                                             |Lines       |Functions  |Branches    
  Filename                                                   |Rate     Num|Rate    Num|Rate     Num
  =================================================================================================
  lib/engine/pa_messages.ex                                  |85.2%     54| 100%    12|    -      0
  lib/fake/httpoison.ex                                      |74.5%     47|57.1%     7|    -      0
  lib/pa_messages/pa_message.ex                              | 100%      3| 100%     5|    -      0
  lib/realtime_signs.ex                                      | 100%     15| 100%     3|    -      0
  lib/signs/bus.ex                                           |83.3%    246|95.2%    42|    -      0
  lib/signs/realtime.ex                                      |92.5%     67|91.7%    12|    -      0
  lib/signs/utilities/audio.ex                               |87.7%    130| 100%    16|    -      0

Download coverage report

panentheos · 2024-06-13T14:42:34Z

config/runtime.exs

+    active_headend_path: System.get_env("ACTIVE_HEADEND_S3_PATH"),
+    screenplay_url: System.get_env("SCREENPLAY_URL"),
+    screenplay_api_key: System.get_env("SCREENPLAY_API_KEY"),
+    active_pa_messages_path: System.get_env("ACTIVE_PA_MESSAGES_PATH")


What's the rationale for this being a runtime environment variable? If we just need a testing hook, then just using the application environment should be sufficient.

As a side note, it would be nice at some point to convert the Fake http module to use Mox instead, which would make testing more straightforward, but that will probably take a bit of effort.

Hmm yeah that's a good point. I think my rationale was just that the path could change since the screenplay api is relatively new and still being developed, but I guess it would be just as much effort to make the devops change rather than just make a code change in RTS or probably even less. I'll make a change to remove this one

panentheos · 2024-06-13T14:57:42Z

lib/engine/pa_messages.ex

+  end
+
+  def create_table(state) do
+    :ets.new(state.pa_message_timers_table, [:named_table, read_concurrency: true])


Do we gain anything from using ETS here? It looks like the only access is inside this module, so there shouldn't be any contention to worry about.

Oh that is true. Originally I had planned that the sign processes would be making requests to access the table so started with this and left it, but we can simplify things and go without the table

panentheos · 2024-06-13T15:51:27Z

lib/engine/pa_messages.ex

+              ms_elapsed = existing_pa_message.interval_in_ms - remaining_ms
+              temp_interval = (active_pa_message.interval_in_ms - ms_elapsed) |> max(0)
+              cancel_pa_timer(timer_ref, pa_id)
+              schedule_pa_message(active_pa_message, temp_interval, table)


The timer-based approach will certainly work, but now that I'm looking at the code, I wonder if it's a little more complex than we need. I know the original motivation was to avoid excessive polling, but now that all the logic is centralized in this engine (which I think is good), we could also just keep a map of id-to-last-playtime in state, and check all the deltas against the current time once per second (especially since we're already doing an :update loop). My hunch is that that would end up being cleaner than wrangling the timers. Thoughts?

Hmm yeah I think that approach works nicely and will make the "edited interval" piece of the logic a good deal simpler. And it removes the need to cancel timers and such as you hinted at. I think we also would get the "handling" of inactive messages for free as well since if they're not included in the response, we won't be checking for the last playtime anyway.

panentheos · 2024-06-13T16:03:57Z

lib/signs/realtime.ex

@@ -47,7 +47,8 @@ defmodule Signs.Realtime do
                announced_alert: false,
                prev_prediction_keys: nil,
                prev_predictions: [],
-                uses_shuttles: true
+                uses_shuttles: true,
+                pa_message_plays: %{}


I'm not sure I understand the benefit of tracking this here. Is this so in case the engine crashes, we won't spam messages? If that's the concern, and we're willing to accept a slight behavior change, we could schedule new messages with their full duration instead of immediately, which would result in fewer messages in that failure mode, rather than excess ones.

Yeah that was the intention here, just to prevent consecutive plays due to an engine crashes/restarts. Ah so you mean that a PA Message wouldn't play immediately upon scheduling and instead wait one interval before it's first play?

I had actually sort of brought that up in the product channel last week but it was only discussed briefly. Kevin said a PA message should play as soon as it can right after it's created, but I'm still not sure if that's also what the ARINC software does currently. Seems like its still certainly up for discussion though. I'll raise the question again to the product folks and see if we would be okay with that behavior change.

Discussed a bit further in the thread and I think I lean towards avoiding scheduling a new message with the full duration. I don't think it's been fully confirmed or is explicitly a standard practice, but it seems that OIOs tend to schedule a message and look immediately for a "proof of play" and I'd rather not disrupt their workflow if we can avoid it. I'm open to other ideas for preventing spammed message plays though if you have any thoughts?

PaulJKim · 2024-06-14T22:34:55Z

config/runtime.exs

@@ -36,7 +36,9 @@ if config_env() != :test do
    message_log_s3_folder: System.get_env("MESSAGE_LOG_S3_FOLDER"),
    message_log_report_s3_folder: System.get_env("MESSAGE_LOG_REPORT_S3_FOLDER"),
    monitoring_api_key: System.get_env("MONITORING_API_KEY"),
-    active_headend_path: System.get_env("ACTIVE_HEADEND_S3_PATH")
+    active_headend_path: System.get_env("ACTIVE_HEADEND_S3_PATH"),
+    active_pa_messages_url: System.get_env("ACTIVE_PA_MESSAGES_URL"),


On second thought @panentheos , I'm not sure if I interpreted your comment about the runtime env here correctly. Did you mean that it's not necessary to split the screenplay base url and the route for active PA messages or that we don't need the screenplay url as an env variable at all? I assume we still want it as an env variable for local testing and such, right?

My thinking was: The screenplay base URL should be a system env variable, like we do for other systems. The path for the endpoint should not be: Ideally, it wouldn't be abstracted at all, since the actual API path on Screenplay is always the same. However, because of how the Fake module works, we need some sort of hook for testing different responses. The least intrusive thing that gets us that would be to put the path in the application environment, with the real value configured statically in config.exs, and then have the tests modify the application env like they're doing.

… into pk/pa-messages

github-actions · 2024-06-18T22:11:23Z

Coverage of commit `7c505b6`

Summary coverage rate:
  lines......: 73.1% (2064 of 2824 lines)
  functions..: 75.8% (581 of 766 functions)
  branches...: no data found

Files changed coverage rate:
                                                             |Lines       |Functions  |Branches    
  Filename                                                   |Rate     Num|Rate    Num|Rate     Num
  =================================================================================================
  lib/engine/pa_messages.ex                                  |63.2%     38| 100%     8|    -      0
  lib/fake/httpoison.ex                                      |74.5%     47|57.1%     7|    -      0
  lib/pa_messages/pa_message.ex                              | 100%      3| 100%     5|    -      0
  lib/realtime_signs.ex                                      | 100%     15| 100%     3|    -      0
  lib/signs/bus.ex                                           |83.3%    246|95.2%    42|    -      0
  lib/signs/realtime.ex                                      |89.4%     66|91.7%    12|    -      0
  lib/signs/utilities/audio.ex                               |87.7%    130| 100%    16|    -      0

Download coverage report

panentheos

See a few comments, then looks good. Before we merge/deploy this, we should make sure that the environment variables are set up, and that screenplay is ready to serve responses in prod.

lib/signs/bus.ex

panentheos · 2024-06-24T21:19:41Z

lib/signs/utilities/audio.ex

+        end
+    end
+    |> then(fn
+      :ok -> Map.put(pa_message_plays, pa_message.id, DateTime.utc_now())


PaEss.Updater.play_message doesn't currently return :ok for the new SCU code path, so it should be changed to do that. Alternatively, we could just remove the branching here, since I don't think we expect a recoverable failure case anyway.

panentheos · 2024-06-24T21:32:41Z

lib/engine/pa_messages.ex

+    end)
+  end
+
+  defp handle_inactive_pa_messages(active_pa_messages, pa_messages_last_sent) do


Do we want to do this? If we proactively wipe this state, then deactivating and reactivating an alert would cause it to play again immediately, which could be undesirable. If we leave the sign state intact, it would block it instead.

Yeah that makes sense. What I had in mind was to keep things in line with the immediate feedback that OIOs tend to expect when they set up a PA message, but maybe that's just for initially creating the message or reactivating it after a longer length of time has passed. Probably better to not allow the chance of a bunch of consecutive plays.

I could see the logging here being a useful audit, but we could also just do that from the Screenplay side

panentheos · 2024-06-24T21:44:58Z

lib/signs/bus.ex

@@ -121,6 +124,23 @@ defmodule Signs.Bus do
    {:ok, state}
  end

+  @impl true
+  def handle_info({:play_pa_message, pa_message}, sign) do
+    Logger.info("pa_message: action=send id=#{pa_message.id} destination=#{sign.id}")


The logging looks like it could be cleaned up a little. This one should probably be moved into send_pa_message and only done if we actually play the message. There's another log with the same action value in the engine, which we could remove.

github-actions · 2024-06-25T20:54:34Z

Coverage of commit `e5bc8be`

Summary coverage rate:
  lines......: 73.4% (2057 of 2802 lines)
  functions..: 75.7% (578 of 764 functions)
  branches...: no data found

Files changed coverage rate:
                                                             |Lines       |Functions  |Branches    
  Filename                                                   |Rate     Num|Rate    Num|Rate     Num
  =================================================================================================
  lib/engine/pa_messages.ex                                  |72.4%     29| 100%     7|    -      0
  lib/fake/httpoison.ex                                      |74.5%     47|57.1%     7|    -      0
  lib/pa_messages/pa_message.ex                              | 100%      3| 100%     5|    -      0
  lib/realtime_signs.ex                                      | 100%     15| 100%     3|    -      0
  lib/signs/bus.ex                                           |84.7%    242|95.2%    42|    -      0
  lib/signs/realtime.ex                                      |92.1%     63|91.7%    12|    -      0
  lib/signs/utilities/audio.ex                               |88.5%    131| 100%    17|    -      0

Download coverage report

Trigger PA Messages from RTS

47c2b62

Make logs errors and remove something used for testing

9fe3e57

panentheos reviewed Jun 13, 2024

View reviewed changes

Address PR feedback

839dac3

PaulJKim requested a review from a team as a code owner June 14, 2024 22:33

PaulJKim commented Jun 14, 2024

View reviewed changes

PaulJKim added 4 commits June 18, 2024 10:40

Put active messages path only in application env

f066eca

fix config

e86ecc4

Remove extraneous config

288bb2e

fix tests

453d042

sloanelybutsurely force-pushed the pk/pa-messages branch 2 times, most recently from 7d20c57 to b58619c Compare June 18, 2024 21:01

PaulJKim added 7 commits June 18, 2024 17:17

Trigger PA Messages from RTS

fe1511f

Make logs errors and remove something used for testing

4a6719c

Address PR feedback

5b0e2da

Put active messages path only in application env

63f72ec

fix config

585de73

Remove extraneous config

bdd9232

fix tests

54fb458

sloanelybutsurely force-pushed the pk/pa-messages branch from f999bf0 to 54fb458 Compare June 18, 2024 21:17

PaulJKim added 3 commits June 18, 2024 17:39

try adding stubs

e161092

Merge branch 'pk/pa-messages' of https://github.com/mbta/realtime_signs…

71f2846

… into pk/pa-messages

add TTS expects

7c505b6

panentheos approved these changes Jun 24, 2024

View reviewed changes

Address PR feedback

e5bc8be

PaulJKim temporarily deployed to dev-green-linux June 26, 2024 20:25 — with GitHub Actions Inactive

PaulJKim merged commit 92ab430 into main Jul 1, 2024
3 checks passed

PaulJKim deleted the pk/pa-messages branch July 1, 2024 14:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Trigger PA Messages from RTS #764

Trigger PA Messages from RTS #764

PaulJKim commented Jun 11, 2024

github-actions bot commented Jun 11, 2024

github-actions bot commented Jun 11, 2024

panentheos Jun 13, 2024

PaulJKim Jun 13, 2024

panentheos Jun 13, 2024

PaulJKim Jun 13, 2024

panentheos Jun 13, 2024 •

edited

Loading

PaulJKim Jun 13, 2024 •

edited

Loading

panentheos Jun 13, 2024

PaulJKim Jun 13, 2024

PaulJKim Jun 14, 2024

PaulJKim Jun 14, 2024

panentheos Jun 17, 2024 •

edited

Loading

github-actions bot commented Jun 18, 2024

panentheos left a comment

panentheos Jun 24, 2024

panentheos Jun 24, 2024

PaulJKim Jun 25, 2024

PaulJKim Jun 25, 2024

panentheos Jun 24, 2024

github-actions bot commented Jun 25, 2024

Trigger PA Messages from RTS #764

Trigger PA Messages from RTS #764

Conversation

PaulJKim commented Jun 11, 2024

Summary of changes

github-actions bot commented Jun 11, 2024

Coverage of commit 47c2b62

github-actions bot commented Jun 11, 2024

Coverage of commit 9fe3e57

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

panentheos Jun 13, 2024 • edited Loading

Choose a reason for hiding this comment

PaulJKim Jun 13, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

panentheos Jun 17, 2024 • edited Loading

Choose a reason for hiding this comment

github-actions bot commented Jun 18, 2024

Coverage of commit 7c505b6

panentheos left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Jun 25, 2024

Coverage of commit e5bc8be

Coverage of commit `47c2b62`

Coverage of commit `9fe3e57`

panentheos Jun 13, 2024 •

edited

Loading

PaulJKim Jun 13, 2024 •

edited

Loading

panentheos Jun 17, 2024 •

edited

Loading

Coverage of commit `7c505b6`

Coverage of commit `e5bc8be`