Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: prevent tiny buffer preventing subscriber acking message quickly #2563

Merged
merged 1 commit into from
Aug 30, 2024

Conversation

matt2e
Copy link
Collaborator

@matt2e matt2e commented Aug 30, 2024

fixes #2554

Current theory is this:

  • cluster has a low number of modules active (let's say 0)
  • gRPC call comes into controller to PullSchema
  • this causes us to subscribe to schema changes, with a chan with the length of the count of the current active modules (in our example, 0)
    • The lib then creates an extra buffer to queue up messages while the subscriber processes messages, with the buffer being the same size as the provided chan (again, 0)
  • As a change schema change message is received by the subscriber, we stream the message back over gRPC. This may take time.
  • While this is happening, another schema update may occur. the lib will not be able to ack the message because the buffer size is too small (0) so it will wait for the message to be received.
  • the lib will timeout if the networking is not done fast enough.

Repro'd by doing the following:

  • Added a sleep step before sending the schema update over the network.
  • Start up FTL without any modules
  • Ran ftl schema get --watch
  • Ran ftl deploy with a bunch of module
  • Hit the ack timout panic
    Could not repro after making the chan length always be a decent size.

@matt2e matt2e requested review from a team and gak and removed request for a team August 30, 2024 07:20
@ftl-robot ftl-robot mentioned this pull request Aug 30, 2024
@alecthomas
Copy link
Collaborator

Nice catch!

@matt2e matt2e added this pull request to the merge queue Aug 30, 2024
Merged via the queue into main with commit a957549 Aug 30, 2024
25 checks passed
@matt2e matt2e deleted the matt2e/pubsub-panic branch August 30, 2024 07:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

panic in pubsub lib Topic
2 participants