Channel Broker

Julien Portalier edited this page Mar 14, 2024 · 4 revisions

Purpose

The channel-broker acts as a dam between the survey-broker, which polls active surveys and schedules contacts (IVR calls or SMS messages) for new and failed respondents, and the external channel services (Verboice, Nuntium). The channel-broker keeps an up-to-date record of the active contacts on the external channel services and prevents having more contacts than the channel capacity.
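The capacity limit described above can be sketched as follows. This is a minimal illustrative model, not Surveda's actual implementation: the class `ChannelBrokerSketch`, its methods `ask` and `on_finished`, and the `sent` list (standing in for calls to Verboice or Nuntium) are all hypothetical names.

```python
from collections import deque


class ChannelBrokerSketch:
    """Hypothetical model: holds back contacts beyond the channel capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.active = set()     # contacts currently on the external service
        self.pending = deque()  # contacts waiting for a free slot
        self.sent = []          # stands in for calls to Verboice/Nuntium

    def ask(self, contact_id):
        """Called by the survey-broker to schedule a contact."""
        self.pending.append(contact_id)
        self._activate()

    def on_finished(self, contact_id):
        """Called when the external service reports a contact as done."""
        self.active.discard(contact_id)
        self._activate()

    def _activate(self):
        # Send pending contacts only while there is spare capacity.
        while self.pending and len(self.active) < self.capacity:
            contact_id = self.pending.popleft()
            self.active.add(contact_id)
            self.sent.append(contact_id)


broker = ChannelBrokerSketch(capacity=2)
for contact in ["r1", "r2", "r3"]:
    broker.ask(contact)
print(broker.sent)  # ['r1', 'r2']

broker.on_finished("r1")
print(broker.sent)  # ['r1', 'r2', 'r3']
```

With a capacity of 2, the third contact is held back until one of the first two finishes.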

Exactly one channel-broker process is spawned for each distinct channel. Surveys running in parallel inside a single Surveda instance all go through the same channel-broker and thus share the channel equally. The sharing isn't randomized, though: the survey-broker polls surveys one by one, so each survey advances one after the other.

Limitations

The channel-broker doesn't monitor the external services. It assumes that the external service is dedicated to it. It only memorizes which contacts have been sent to the external services from this Surveda instance, then monitors these contacts only. The channel-broker doesn't ask the external services for their actual queues.

That being said, if we know that multiple Surveda instances will run surveys in parallel on the same external service, the channel capacity of each Surveda instance can be tweaked so that the instances share the channel queue (equally or not).
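As a rough aid for choosing those per-instance capacities, the split can be computed from relative weights. This is only a sketch: the function `split_capacity`, the instance names, and the idea of a known total capacity for the external service are assumptions made for illustration. The resulting values would still be entered by hand as each instance's channel capacity.

```python
def split_capacity(total, weights):
    """Split `total` slots proportionally to positive integer `weights`."""
    weight_sum = sum(weights.values())
    shares = {name: total * w // weight_sum for name, w in weights.items()}
    # Hand out the slots lost to integer division, largest weight first.
    leftover = total - sum(shares.values())
    for name in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        shares[name] += 1
    return shares


# Two instances sharing equally, then unequally (hypothetical names).
print(split_capacity(100, {"instance_a": 1, "instance_b": 1}))
# {'instance_a': 50, 'instance_b': 50}
print(split_capacity(100, {"instance_a": 2, "instance_b": 1}))
# {'instance_a': 67, 'instance_b': 33}
```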

Crash Recovery

The state of the channel-broker is saved in memory using the channel-broker-agent. If a channel-broker process crashes, its state is recovered from the agent when it's restarted, and channel monitoring resumes as if nothing happened. If the crash happened while a callback was being processed, the active calls GC will eventually detect that the contact is no longer active and clean it up, so that the channel-broker won't stay desynchronized for too long.
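The clean-up performed by the active calls GC can be pictured roughly like this. It is a sketch under assumptions: the function `collect_idle_contacts`, the `is_active_remotely` callback (standing in for a status check against the external service), and the shape of the state (a mapping from contact id to last activity time) are hypothetical, not Surveda's actual data structures.

```python
from datetime import datetime, timedelta


def collect_idle_contacts(active, now, idle_minutes, is_active_remotely):
    """Return the contacts to keep: drop idle ones the remote no longer knows."""
    threshold = now - timedelta(minutes=idle_minutes)
    kept = {}
    for contact_id, last_seen in active.items():
        idle = last_seen <= threshold
        # Only idle contacts are checked against the remote service.
        if idle and not is_active_remotely(contact_id):
            continue
        kept[contact_id] = last_seen
    return kept


now = datetime(2024, 3, 14, 12, 0)
active = {
    "r1": now - timedelta(minutes=5),   # idle, gone from the remote service
    "r2": now - timedelta(minutes=1),   # recently updated, not checked
    "r3": now - timedelta(minutes=10),  # idle, but still active remotely
}
still_active_remotely = {"r3"}

kept = collect_idle_contacts(
    active, now, idle_minutes=2,
    is_active_remotely=lambda cid: cid in still_active_remotely,
)
print(sorted(kept))  # ['r2', 'r3']
```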

If the Surveda instance itself crashes, the state is lost and a new channel-broker process is started with an empty state. It will push up to channel capacity contacts to the external services, regardless of how many are actually in queue there. It will then stop sending until those new contacts start to be processed, which only happens once all the previously sent contacts are completed.

Now, if a Surveda instance crashes multiple times, then up to channel capacity contacts will be pushed each time, possibly filling the queue of the external service...
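The worst case for the in-memory state described above is simple arithmetic. The sketch below assumes that no contact completes between restarts and that every restart begins from an empty state; the function name `worst_case_external_queue` is hypothetical.

```python
def worst_case_external_queue(capacity, crashes):
    """Upper bound on contacts queued on the external service.

    One initial push of `capacity` contacts, plus one more push of up to
    `capacity` contacts after each crash of the Surveda instance.
    """
    return capacity * (crashes + 1)


# With the default capacity of 100 and three crashes in a row:
print(worst_case_external_queue(100, 3))  # 400
```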

With the latest release, the channel-broker queue is now kept in the database. If a channel-broker process crashes, or if Surveda itself crashes, the actors are restarted and operation continues from the queue.

TODO: verify the actual impact of a crash between the dequeue and the actual call to the external channel service. It only impacts one contact, but what's the impact?

Design

See Start Survey for how the channel-broker is integrated into the system.

Configuration

UI Settings

The channel-capacity can be customized for each channel.

Environment Variables

  • DEFAULT_CHANNEL_CAPACITY: default capacity of a channel (defaults to 100);
  • CHNL_BKR_GC_IDLE_MINUTES: how long until a contact is considered idle (defaults to 2);
  • CHNL_BKR_GC_INTERVAL_MINUTES: how often to run the idle-contacts state check against the remote service (defaults to 2);
  • SHUT_DOWN_MINUTES: inactivity timeout before a channel-broker process shall exit (defaults to 30). Warning: currently not working.
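For illustration, the variables above and their documented defaults could be read as shown below. This is a sketch only: Surveda reads them through its own configuration, and the function `load_channel_broker_config` and the dictionary keys are hypothetical.

```python
import os


def load_channel_broker_config(env=None):
    """Read the channel-broker settings, falling back to documented defaults."""
    env = os.environ if env is None else env

    def integer(name, default):
        return int(env.get(name, default))

    return {
        "default_channel_capacity": integer("DEFAULT_CHANNEL_CAPACITY", 100),
        "gc_idle_minutes": integer("CHNL_BKR_GC_IDLE_MINUTES", 2),
        "gc_interval_minutes": integer("CHNL_BKR_GC_INTERVAL_MINUTES", 2),
        "shut_down_minutes": integer("SHUT_DOWN_MINUTES", 30),
    }


# Passing an explicit mapping instead of the real environment:
config = load_channel_broker_config({"CHNL_BKR_GC_IDLE_MINUTES": "5"})
print(config["gc_idle_minutes"])           # 5
print(config["default_channel_capacity"])  # 100
```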