Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add backoff and reconnect support #26

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

BrendanBall
Copy link
Contributor

@BrendanBall BrendanBall commented Jun 22, 2022

Currently Stargate prevents the application from starting up
or causes it to crash if pulsar is or becomes unavailable.
This is a result of how WebSockex is implemented.
This builds on a refactor of WebSockex at
Azolo/websockex#112
which implements connection management async to process startup.
This makes Stargate more production ready since it will allow
an application to degrade gracefully when pulsar isn't available
temporarily.

Main changes as a result of this:

  1. Reconnect backoff feature is added to be customized by the user
  2. Since Stargate will continue running even when not connected to
    Pulsar, the user needs to know that messages aren't being produced
    successfully.
    A breaking change was made to the async ACK MFA where the
    first argument is now the result of the produce, which may either be
    :ok or {:error, reason}.

This should only be merged if Azolo/websockex#112 gets merged since it currently relies on a fork of WebSockex

@BrendanBall BrendanBall marked this pull request as draft June 22, 2022 10:16
@BrendanBall BrendanBall force-pushed the backoff_reconnect branch 2 times, most recently from 192ca4d to efbaf93 Compare June 24, 2022 06:23
Currently Stargate prevents the application from starting up
or causes it to crash if pulsar is or becomes unavailable.
This is a result of how WebSockex is implemented.
This builds on a refactor of WebSockex at
Azolo/websockex#112
which implements connection management async to process startup.
This makes Stargate more production ready since it will allow
an application to degrade gracefully when pulsar isn't available
temporarily.

Main changes as a result of this:
1. Reconnect backoff feature is added to be customized by the user
2. Since Stargate will continue running even when not connected to
   Pulsar, the user needs to know that messages aren't being produced
successfully.
A breaking change was made to the async ACK MFA where the
first argument is now the result of the produce, which may either be
`:ok` or `{:error, reason}`.
@BrendanBall
Copy link
Contributor Author

@jeffgrunewald would you accept this PR if I manage to get the changes to WebSockex merged?
I was just going to implement this feature by adding https://hex.pm/packages/connection to Stargate, but I didn't like how it added complexity due to not wanting to proxy all messages through the extra process.

I did also consider gun since there's an open issue (#13) to replace WebSockex with gun, but it also has its own issues which would need to be fixed.
E.g. When I tried it, it didn't return an econnrefused error, instead the process just silently exits.

@jeffgrunewald
Copy link
Owner

I’m cool with all this. Happy to accept the PR and cut a release once Azolo/websockex#112 is approved and merged.

@brunoripa
Copy link

It looks to me that Azolo/websockex#112 is not getting any love. How are you fixing this temporarly, if you are ?

@jeffgrunewald
Copy link
Owner

Afraid I don't have any plans to mitigate this at present as I'm no longer working with Pulsar and don't currently the bandwidth to explore fixes. Outside of an upstream change to Websockex I wonder if Mint.Websocket might be adaptable as it didn't exist when I first wrote stargate. Alternatively it should be possible to wrap the stargate client in its own supervisor within an application that talks to Pulsar to take responsibility for handling backoff and reconnect logic when Pulsar is or becomes unavailable.

Pulsar/Stargate is a bit of a special case in that it expects a persistent streaming connection to be available to Pulsar but generally speaking as a corollary do a database connection pool you don't typically see the driver library taking responsibility for backoff and reconnect attempts; I think there's an argument to be made that this is better handled by your application logic (and some custom supervisor you write wrapping the stargate interactions) since your application is best able to determine how it should handle a degraded state of the Pulsar connection.

@brunoripa
Copy link

Afraid I don't have any plans to mitigate this at present as I'm no longer working with Pulsar and don't currently the bandwidth to explore fixes. Outside of an upstream change to Websockex I wonder if Mint.Websocket might be adaptable as it didn't exist when I first wrote stargate. Alternatively it should be possible to wrap the stargate client in its own supervisor within an application that talks to Pulsar to take responsibility for handling backoff and reconnect logic when Pulsar is or becomes unavailable.

Pulsar/Stargate is a bit of a special case in that it expects a persistent streaming connection to be available to Pulsar but generally speaking as a corollary do a database connection pool you don't typically see the driver library taking responsibility for backoff and reconnect attempts; I think there's an argument to be made that this is better handled by your application logic (and some custom supervisor you write wrapping the stargate interactions) since your application is best able to determine how it should handle a degraded state of the Pulsar connection.

Thanks for your reply

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants