GoAway Frames Leading to App Errors #544
Hey @sam0jones0, thanks for raising this. Quick question just to clarify where the error might lie here: I guess this error does not happen when you use either Ember + HTTP PubSub, or gRPC PubSub + any other client, right?
Sorry, just saw the
Thanks for getting back so quickly. Yeah, it's odd. We've tried EmberClient and http4s-netty; same thing. Here are a few more logs leading up to the issue occurring:
I'll let you know if I figure anything out. My guess is HTTP/2 GoAway frames are cancelling the stream somehow. As I mentioned above, I added logging to the top-level Stream:

```scala
stream.handleErrorWith { error =>
  Stream.eval(logger.error(error)(s"Stream error with reason: [$error]"))
}
```

And saw
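To make the restart behaviour concrete, here is a minimal, dependency-free sketch of the log-and-restart pattern described above. This is my own simplified model, not fs2's API: `runWithRestarts` and the attempt-indexed `work` function are hypothetical.

```scala
import scala.annotation.tailrec

// Hypothetical sketch: restarting a failing unit of work, loosely modeling
// the `stream.handleErrorWith(log) ++ restart` pattern without the library.
def runWithRestarts[A](maxRestarts: Int)(work: Int => Either[Throwable, A]): Either[Throwable, A] = {
  @tailrec
  def loop(attempt: Int): Either[Throwable, A] =
    work(attempt) match {
      case Right(a) => Right(a)
      case Left(err) if attempt < maxRestarts =>
        // This line plays the role of the logger.error call above.
        println(s"Stream error with reason: [$err]; restarting (attempt ${attempt + 1})")
        loop(attempt + 1)
      case Left(err) => Left(err) // restarts exhausted: surface the error
    }
  loop(0)
}
```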
I would have thought EmberClient manages connection lifecycles for you?
Yeah, me too. The odd thing is we also have several subscribers using Ember on high-demand topics and haven't observed this issue... What's the size of the subscription it is connecting to? It also seems you are refreshing the GCP token several times very close together; is that expected?
It's quite a small subscription, low traffic, maybe 1-2 TPS. The double refreshing is interesting and I'm looking into that now. Perhaps I've somehow doubled up the stream, and the 'second' stream is writing to the closed connection, triggering the crash. That would potentially explain logs like this
Not sure if it's related, but this issue mentioned using
The bug appears when Pub/Sub sends a GoAway(max_age) after 1 hour.

Successful restarting of the Stream on the 1st failure: this consistently occurs at T+1h.

Note the double Ping.Ack leading to GoAway(EnhanceYourCalm / too_many_pings). Failure to restart the Stream after this occurrence: this consistently occurs at T+2h.
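For context on where strings like max_age and too_many_pings come from: an HTTP/2 GOAWAY frame payload (RFC 7540 §6.8) carries a 31-bit last-stream-id, a 32-bit error code, and opaque debug data. Here is a small sketch of decoding such a payload; `parseGoAwayPayload` and `GoAway` are hypothetical helpers, not part of fs2-pubsub or http4s.

```scala
// Decode an HTTP/2 GOAWAY frame payload per RFC 7540 §6.8:
// 31-bit last-stream-id, 32-bit error code, then opaque debug data
// (which is where "max_age" and "too_many_pings" come from).
final case class GoAway(lastStreamId: Int, errorCode: Long, debugData: String)

def parseGoAwayPayload(p: Array[Byte]): GoAway = {
  require(p.length >= 8, "GOAWAY payload is at least 8 bytes")
  val lastStreamId = // top bit is reserved, hence & 0x7f
    ((p(0) & 0x7f) << 24) | ((p(1) & 0xff) << 16) | ((p(2) & 0xff) << 8) | (p(3) & 0xff)
  val errorCode = // e.g. 0xb = ENHANCE_YOUR_CALM
    ((p(4) & 0xffL) << 24) | ((p(5) & 0xffL) << 16) | ((p(6) & 0xffL) << 8) | (p(7) & 0xffL)
  val debug = new String(p.drop(8), java.nio.charset.StandardCharsets.US_ASCII)
  GoAway(lastStreamId, errorCode, debug)
}
```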
Unsure if the way I'm creating the client has any impact?

```scala
for {
  client <- EmberClientBuilder
    .default[F]
    .withHttp2
    .build
    .mproduct(client => TokenProvider.serviceAccount(client).pure[F].toResource)
    .map { case (client, tokenProvider) => tokenProvider.clientMiddleware(client) }
} yield {
  given Client[F] = client
  PubSubSubscriber(config, subscription, deserializer, logger)
}
```

I am struggling to explain the double and then triple Ping.Ack. It suggests there is a duplication of clients/connections that is compounding with each Stream restart. I don't suppose you have any example code of projects using this package I could compare against? Thanks for all your help so far, by the way, really appreciated.
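To illustrate the suspected compounding duplication with a toy model (this is nothing to do with Ember's actual pool; `ConnPool` and both restart functions are hypothetical): if a fresh client is opened on every restart without the old one being released, live connections accumulate, whereas the bracketed acquire/release that cats-effect's `Resource` provides keeps the count flat.

```scala
// Toy connection pool to illustrate leak-on-restart vs bracketed restarts.
final class ConnPool {
  private var nextId = 0
  var live = 0
  def open(): Int = { nextId += 1; live += 1; nextId }
  def close(id: Int): Unit = live -= 1
}

// Leaky pattern: each restart opens a fresh client and never closes the old one,
// so the ping rate compounds with every restart.
def leakyRestarts(pool: ConnPool, restarts: Int): Int = {
  (1 to restarts).foreach(_ => pool.open())
  pool.live
}

// Bracketed pattern (the shape Resource.use gives you): release before restarting.
def bracketedRestarts(pool: ConnPool, restarts: Int): Int = {
  (1 to restarts).foreach { _ =>
    val id = pool.open()
    try { /* work happens here */ }
    finally pool.close(id)
  }
  pool.live
}
```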
The only difference I see is that you're using HTTP/2. Does it still happen if you don't use it? The double or triple pings could be explained by your concurrency settings. How are you creating the subscriber?
Read concurrency is set to 1. The subscriber is created like this:

```scala
PubSubSubscriber
  .http[F]
  .projectId(config.subscriber.projectId)
  .subscription(Subscription(subscription))
  .uri(config.subscriber.uri)
  .httpClient(client)
  .noRetry
  .errorHandler {
    case (_, t) => t.raiseError
  }
  .batchSize(config.subscriber.batchSize)             // 100
  .maxLatency(config.subscriber.maxLatency)           // 10 seconds
  .readMaxMessages(config.subscriber.readMaxMessages) // 1000
  .readConcurrency(config.subscriber.readConcurrency) // 1
  .raw
```

I'm trying to see if I can put together a simple reproducer as well. Over the weekend I left a build running with an HTTP subscriber and an HTTP/2 client. Out of 4 pods, 3 had the error occur after 12 hours (all 3 at T+12h); 1 pod ran fine all weekend. I've just deployed a build using the HTTP subscriber and an HTTP/1 client. Will let you know how it goes.
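As a sanity check on how I read those settings (my interpretation, not fs2-pubsub's internals): a batch should be flushed when either batchSize messages accumulate or maxLatency elapses, whichever comes first. A dependency-free sketch, with hypothetical names `Msg` and `flushPoints`:

```scala
final case class Msg(id: Int, arrivalMs: Long)

// Group messages into batches, flushing when the buffer reaches `batchSize`
// or when `maxLatencyMs` has elapsed since the buffer's first message.
def flushPoints(msgs: List[Msg], batchSize: Int, maxLatencyMs: Long): List[List[Msg]] = {
  val out = List.newBuilder[List[Msg]]
  var buf = List.empty[Msg]
  var start = 0L
  msgs.foreach { m =>
    if (buf.isEmpty) start = m.arrivalMs // clock starts at first buffered message
    buf = buf :+ m
    if (buf.size >= batchSize || m.arrivalMs - start >= maxLatencyMs) {
      out += buf
      buf = Nil
    }
  }
  if (buf.nonEmpty) out += buf // trailing partial batch
  out.result()
}
```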
Switching to HTTP/1 seems to have resolved the issue, by the way 🎉 When I get some time I'll try to code up a reproducer for the HTTP/2 bug
Nice! Yeah, that would be super useful in order to open an issue on http4s
We're a bit stuck on some weird behaviour in our application.
We're using fs2-pubsub with EmberClient (with HTTP/2 support) to make gRPC requests to the Google Pub/Sub API, i.e. a setup something like this:
We're seeing a periodic buildup of un-acked messages. A newly started pod behaves properly for a while, but after 1-2 hours we see these logs:
`GoAway` HTTP/2 frames indicate the server intends to close the connection. The first `GoAway`'s `additionalDebugData` decodes to `max_age`. The second `GoAway` message's `additionalDebugData` decodes to `too_many_pings`. Notice the two back-to-back `Ping.Ack` frames, which trigger the `GoAway: EnhanceYourCalm`. Following these logs, that pod will no longer process any new Pub/Sub messages.
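A toy model of why duplicated connections would trip this: gRPC-style servers bound the number of unsolicited pings they will accept between DATA frames and answer excess pings with GOAWAY(too_many_pings). The exact policy below is an assumption for illustration, not gRPC's actual algorithm, and `serverReaction` is a hypothetical function.

```scala
// Toy model (assumed policy, not gRPC's real keepalive enforcement): a server
// tolerating at most `maxPings` PINGs between DATA frames sends
// GOAWAY(too_many_pings) once duplicated clients multiply the ping rate.
def serverReaction(liveClients: Int, pingsPerClientBetweenData: Int, maxPings: Int): String =
  if (liveClients * pingsPerClientBetweenData > maxPings) "GOAWAY: too_many_pings"
  else "ok"
```

On this model a single well-behaved client stays under the limit, but a leaked second client with the same keep-alive schedule doubles the observed ping rate and crosses it.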
We have (unsuccessfully) tried many things to fix this:
We're a bit confused, but it appears the issue may lie in the interaction between fs2-pubsub and http4s' EmberClient, and how it handles HTTP/2 connection lifecycle management.