Disable autoPing in WS tests #3944

kciesielski · 2024-07-19T07:59:42Z

Sometimes WebSocket tests fail because the default auto-ping is sent to the client, which then tries to send back a pong after the backend has already closed the connection.
This PR ensures that autoPing is enabled only in the test which actually tests pings.

adamw · 2024-07-19T08:56:08Z

Does this mean that ping frames are sent after the connection has been properly closed by exchanging close frames? I'm wondering whether this is a problem in our implementation, or whether it's just scoped to tests.

kciesielski · 2024-07-19T09:11:58Z

I assumed they are sent right before the Close response, and that the issue is on the client side, because the client should ignore the ping after sending its own Close request. But you are right: it may be the backend that has a bug and sends a ping when it shouldn't, so I'll verify this. It may be both as well ;)

kciesielski · 2024-07-19T12:08:26Z

I investigated the subject, and:

Backends won't send a ping after sending Close.
They can still send a ping right before sending Close, as pings flow in parallel. Tests often wait for the Close response using:

ws.eitherClose(ws.receiveText())

and receiveText() is in fact receiveText(pongOnPing = true), so the client tries to send a Pong, which fails internally:

[info]   Cause: java.io.IOException: Output closed
[info]   at java.net.http/jdk.internal.net.http.websocket.MessageEncoder.encodePong(MessageEncoder.java:301)
[info]   at java.net.http/jdk.internal.net.http.websocket.TransportImpl$SendTask$1.onPong(TransportImpl.java:404)
[info]   at java.net.http/jdk.internal.net.http.websocket.TransportImpl$SendTask$1.onPong(TransportImpl.java:367)

I think this should be improved in sttp: sending a Ping/Pong should be silently skipped when webSocket.isClosed(), just like sending another Close already is. From the protocol perspective, it's not really a violation, but rather a gray area that's not well specified.
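To make the race and the proposed client-side fix concrete, here is a minimal, self-contained model. It is a sketch under stated assumptions, not sttp's code: `Frame`, `ToyClient`, `receiveTextOrClose` and `skipPongWhenClosed` are all hypothetical names, and `receiveTextOrClose` only loosely imitates `eitherClose(receiveText(pongOnPing))`. The test has already sent its own Close, the server pings just before its Close response, and the auto-pong either throws (the behaviour in the stack trace) or is silently dropped (the proposal).

```scala
import java.io.IOException
import scala.collection.mutable

// Simplified, hypothetical frame model; not sttp's actual WebSocketFrame type.
sealed trait Frame
final case class Text(payload: String) extends Frame
final case class Ping(payload: String) extends Frame
final case class Pong(payload: String) extends Frame
case object Close extends Frame

// Toy client: `incoming` is what the server sends; `sent` records what the client wrote.
// `skipPongWhenClosed` switches between the failing behaviour and the proposed one.
final class ToyClient(incoming: List[Frame], skipPongWhenClosed: Boolean) {
  private var queue = incoming
  private var outputClosed = false
  val sent = mutable.ListBuffer.empty[Frame]

  def close(): Unit =
    if (!outputClosed) { sent += Close; outputClosed = true } // a repeated Close is already a no-op

  private def sendPong(p: Pong): Unit =
    if (!outputClosed) sent += p
    else if (!skipPongWhenClosed) throw new IOException("Output closed") // mirrors the stack trace
    // otherwise: drop the Pong silently, like a repeated Close

  // Hypothetical stand-in for eitherClose(receiveText(pongOnPing)):
  // Right(text) for a text frame, Left(()) once the server's Close arrives.
  def receiveTextOrClose(pongOnPing: Boolean = true): Either[Unit, String] = {
    var result: Option[Either[Unit, String]] = None
    while (result.isEmpty) {
      queue match {
        case Nil => result = Some(Left(()))
        case frame :: rest =>
          queue = rest
          frame match {
            case Text(t) => result = Some(Right(t))
            case Close   => result = Some(Left(()))
            case Ping(p) => if (pongOnPing) sendPong(Pong(p)) // auto-pong, then keep reading
            case Pong(_) => ()                                // unsolicited pongs are ignored
          }
      }
    }
    result.get
  }
}

// The server pings just before its Close response; the test has already closed its side.
val race = List(Ping("p"), Close)

val current = new ToyClient(race, skipPongWhenClosed = false)
current.close()
val failed = try { current.receiveTextOrClose(); false } catch { case _: IOException => true }
println(failed) // true

val proposed = new ToyClient(race, skipPongWhenClosed = true)
proposed.close()
println(proposed.receiveTextOrClose()) // Left(())
println(proposed.sent.toList)          // List(Close)
```

Skipping the Pong only when output is already closed keeps auto-pong behaviour unchanged for open connections, which is why it can be applied on the client without disabling autoPing in tests.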

adamw · 2024-07-21T17:56:19Z

So I think that's definitely a good catch, but it would also be good to fix this properly, as by disabling auto-pings we won't catch such problems in the future.

But I'm wondering what the proper fix is here ;) I think that, first of all, after a channel is closed we shouldn't send any more pings. Does this happen only in "our" Netty servers, or in others as well? In Netty, the problem might be fixed by removing the WebSocketAutoPingHandler from the pipeline once the channel is determined to be closed in wrapSubscriberWithNettyCallback, .onComplete or .onError. (Btw, if there's an IOException, what's the point of sending a close frame if we know that the channel is closed?)

As for sttp, take a look at: softwaremill/sttp#2236 - is that what you meant?

kciesielski · 2024-07-22T06:24:14Z

after a channel is closed, we shouldn't send any more pings
As far as I checked, we don't do that. As soon as a backend processes a Close and sends a Close response, no more pings will follow this response.
That's why I believe it only needs to be fixed in sttp; I'll follow up in the linked issue.

adamw · 2024-07-22T16:27:25Z

@kciesielski ah, we don't send any more pings because the "reactive stream" is complete, and it's being closed in Netty? Or is there another reason? I'm not quite able to pinpoint this in code.

kciesielski · 2024-07-23T06:57:05Z

I did some extra digging, and there may indeed be a problem with the current Netty implementation: it doesn't guarantee that no pings are sent after Close. All other backends seem OK, because they send pings in the same stream. Even when that stream is merged from two parallel streams, it ends with something like .append(WebSocketFrame.Close), so this should guarantee that Close is indeed the last frame sent.
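The ordering argument for the stream-based backends can be modelled in a few lines. This is only an illustration of the invariant, not any backend's code: `mergeRandomly` and `outgoing` are hypothetical, the random interleaving stands in for a parallel merge, and the ping source is finite here for simplicity (real backends use periodic ping streams from their own streaming libraries). Because Close is appended only after the merged stream has finished, no Ping can follow it, whatever the interleaving.

```scala
import scala.util.Random

// Simplified, hypothetical frame model for illustration only.
sealed trait Frame
final case class Text(payload: String) extends Frame
case object Ping extends Frame
case object Close extends Frame

// Randomly interleave two ordered sources, standing in for merging the
// application's frames with a parallel ping source.
def mergeRandomly(a: List[Frame], b: List[Frame], rnd: Random): List[Frame] = {
  val out = List.newBuilder[Frame]
  var xs = a
  var ys = b
  while (xs.nonEmpty || ys.nonEmpty) {
    val takeA = ys.isEmpty || (xs.nonEmpty && rnd.nextBoolean())
    if (takeA) { out += xs.head; xs = xs.tail }
    else { out += ys.head; ys = ys.tail }
  }
  out.result()
}

// Close is appended only after the merged stream is exhausted, so nothing can follow it.
def outgoing(data: List[Frame], pings: List[Frame], rnd: Random): List[Frame] =
  mergeRandomly(data, pings, rnd) :+ Close

val data: List[Frame] = List(Text("a"), Text("b"), Text("c"))
val pings: List[Frame] = List(Ping, Ping)
val runs = (1 to 100).map(seed => outgoing(data, pings, new Random(seed)))
println(runs.forall(_.last == Close)) // true
```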
For Netty, however, we schedule Pings completely separately, like this:

tapir/server/netty-server/src/main/scala/sttp/tapir/server/netty/internal/ws/WebSocketAutoPingHandler.scala

Line 35 in bc9b89f


pingTask = ctx.channel().eventLoop().scheduleAtFixedRate(sendPing, 
  pingInterval.toMillis, pingInterval.toMillis, TimeUnit.MILLISECONDS)

and stop them in a handler:

  override def channelInactive(ctx: ChannelHandlerContext): Unit = {
    super.channelInactive(ctx)
    logger.debug(s"STOPPING WebSocket Ping scheduler for channel ${ctx.channel}")
    if (pingTask != null) {
      val _ = pingTask.cancel(false)
    }
  }

I think it's technically possible that, after the reactive stream sends Close and before channelInactive is triggered, the scheduled task calls sendPing. However, tests have never failed on CI on Netty backends for that particular reason, so I would register a low-priority issue to deal with this specific case.
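One possible way to close that window, sketched under assumptions: besides cancelling the task in channelInactive, the ping task could check a flag that is set at the moment Close is written. This is not tapir's code. `GuardedPinger`, `onCloseSent` and the `write` callback are hypothetical, plain `java.util.concurrent` scheduling replaces Netty's event loop, and a lock emulates the fact that in Netty both the scheduled task and the Close write run on the channel's event loop.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ScheduledFuture, TimeUnit}
import scala.collection.mutable

// Hypothetical stand-in for WebSocketAutoPingHandler; `write` plays the role of
// writing a frame to the channel. The closeSent guard is an assumption, not tapir's code.
final class GuardedPinger(write: String => Unit) {
  // Emulates the event loop serializing the ping task and the Close write.
  private val lock = new Object
  private var closeSent = false
  @volatile private var pingTask: ScheduledFuture[_] = null

  // Scheduled body: a tick that fires after Close (but before the task is
  // cancelled) is suppressed instead of writing a Ping.
  val sendPing: Runnable = () => lock.synchronized { if (!closeSent) write("Ping") }

  def start(scheduler: ScheduledExecutorService, intervalMillis: Long): Unit =
    pingTask = scheduler.scheduleAtFixedRate(sendPing, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS)

  // Hypothetical hook: call this wherever the Close frame is written.
  def onCloseSent(): Unit = {
    lock.synchronized { closeSent = true; write("Close") }
    if (pingTask != null) { val _ = pingTask.cancel(false) }
  }
}

// Deterministic replay of the window: Close has been sent, channelInactive has not fired.
val log = mutable.ListBuffer.empty[String]
val pinger = new GuardedPinger(frame => { log += frame; () })
pinger.sendPing.run() // tick before Close: Ping is written
pinger.onCloseSent()  // Close goes out
pinger.sendPing.run() // tick inside the window: suppressed
println(log.toList)   // List(Ping, Close)

// The same guard with a real scheduler: whatever the timing, Close is the last frame.
val liveLog = mutable.ListBuffer.empty[String]
val scheduler = Executors.newSingleThreadScheduledExecutor()
val live = new GuardedPinger(frame => { liveLog += frame; () })
live.start(scheduler, intervalMillis = 5)
Thread.sleep(30)
live.onCloseSent()
Thread.sleep(30)
scheduler.shutdown()
scheduler.awaitTermination(1, TimeUnit.SECONDS)
println(liveLog.last) // Close
```

Checking the flag inside the task, rather than relying only on cancellation, means the guarantee no longer depends on when channelInactive happens to fire.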

kciesielski · 2024-08-16T09:09:28Z

Closing, because https://github.com/softwaremill/tapir/pull/3968/files pulls in softwaremill/sttp-shared#425, which should resolve the issue at the client level.

Disable autoPing in WS tests

10e72b9

kciesielski marked this pull request as ready for review July 19, 2024 08:32

kciesielski requested a review from adamw July 19, 2024 08:32

kciesielski mentioned this pull request Jul 23, 2024

[WIP] Fix flaky tests #3827

Draft

kciesielski closed this Aug 16, 2024

kciesielski deleted the fix-ws-http4s-test branch August 16, 2024 09:09
