Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: solver exponential backoff #450

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

PBillingsby
Copy link
Contributor

Summary

This pull request makes the following changes:

  • Add exponential backoff mechanism for connection retries
  • Improve detection of specific WebSocket closure errors
  • Add error propagation to ensure issues during WebSocket setup are logged
  • Add mutexes (sync.Mutex) to synchronize access to the WebSocket connection during reconnection or ping operations
  • Change the signature of ConnectWebSocket to return an error alongside the response channel
  • Move ctx to start of params for SolverClient

Task/Issue reference

Closes: #429

Test plan

  1. Modify pkg/http/websocket_server.go to force an error in HandleFunc:
r.HandleFunc(path, func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "Internal Server Error", http.StatusInternalServerError)
		return
  1. Start the stack
  2. Observe the job-creator terminal for exponential backoff logs
ERR pkg/http/websocket_client.go:99 > WebSocket connection failed: websocket: bad handshake
Reconnecting in 2.200 seconds...
2024-11-25T11:11:28-05:00 DBG pkg/http/websocket_client.go:99 > WebSocket connection connecting: ws://localhost:8080/api/v1/ws?&Type=JobCreator&ID=0x3C44CdDdB6a900fa2b585dd299e03d12FA4293BC
2024-11-25T11:11:28-05:00 ERR pkg/http/websocket_client.go:99 > WebSocket connection failed: websocket: bad handshake
Reconnecting in 3.640 seconds...
  1. Cancel the solver service context to see the context cancellation handling:
2024-11-25T11:11:54-05:00 INF pkg/http/websocket_client.go:99 > Solver service appears to be down
2024-11-25T11:11:54-05:00 ERR pkg/solver/client.go:34 > Error in WebSocket connection. error="solver service unavailable: dial tcp [::1]:8080: connect: connection refused"
Error: solver service unavailable: dial tcp [::1]:8080: connect: connection refused

@PBillingsby PBillingsby requested a review from bgins November 25, 2024 16:14
@PBillingsby PBillingsby requested a review from a team as a code owner November 25, 2024 16:14
@PBillingsby PBillingsby changed the title Feat solver exponential backoff feat: solver exponential backoff Nov 25, 2024
const (
initialDelay = 1.0
maxAttempts = 10
exponential = 1.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could we rename this to exponent?

errCh <- err
select {
case <-ctx.Done():
log.Info().Msg("Exiting readMessage loop due to context cancellation.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.Info().Msg("Exiting readMessage loop due to context cancellation.")
log.Debug().Msg("Exiting readMessage loop due to context cancellation.")

This detail is a bit internal, would suggest dropping the log level to debug.

continue
switch {
case websocket.IsCloseError(err, websocket.CloseGoingAway, websocket.CloseAbnormalClosure):
log.Info().Msg("Solver service closed connection")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, we only use this helper function to connect to the solver. But we might re-use it in other contexts, so we may want to reword these errors to something like "Websocket server closed connection".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we may want to log errors we where we will attempt again at the warn level.

switch {
case websocket.IsCloseError(err, websocket.CloseGoingAway, websocket.CloseAbnormalClosure):
log.Info().Msg("Solver service closed connection")
return nil, fmt.Errorf("solver connection closed: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will returning here cause us to give up on our connection attempts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add exponential backoff to websocket client
2 participants