Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection lost but client returns that it is still connected #12

Open
sebirdman opened this issue Dec 21, 2017 · 7 comments · May be fixed by #13
Open

Connection lost but client returns that it is still connected #12

sebirdman opened this issue Dec 21, 2017 · 7 comments · May be fixed by #13

Comments

@sebirdman
Copy link

sebirdman commented Dec 21, 2017

After putting my application in the background for a minute (on android) and resuming i notice that the Client.isConnected() call returns true, even when the trace ping logs are not showing responses. (and my other clients do not receive anything sent from the disconnected client)

This problem seems to be tied to this: 8185b64

  1. _socketSend (the only place our pinger seems to be reset) is never called (as far as i can see)
  2. our pinger will never "fail" which i think breaks the "keep alive" function of mqtt outlined here: https://www.hivemq.com/blog/mqtt-essentials-part-10-alive-client-take-over
  3. if i set the keep alive time to a shorter time (default for this library is 60 seconds - amazon iot says the shortest they allow is 5 seconds) then this client will be disconnected after that time without properly reporting that it is disconnected.
@sebirdman
Copy link
Author

I've created a PR that starts to fix this problem. After a ping request fails to receive a response, it'll disconnect. The only issue is that this is n keepAlive seconds after the application has been resumed, ideally we would send some kind of ping request out on resume (or just consider ourselves disconnected if we know we've missed a ping request, since this is what the mqtt spec seems to say will happen)

@robhogan
Copy link
Owner

Thanks for looking into this. IIRC, I removed this functionality originally because it was a bit off-spec (using a second timer to ping only when we hadn't heard from the server) and conflated keepalive with response timeout (as you point out). Normally you'd expect a ping response within a second or two, whereas the keepalive interval is often several minutes, so using the same value for both doesn't sit right.

Technically the current implementation is spec compliant, since the spec only requires the server to issue a disconnection if it hasn't heard from the client - it doesn't place any firm obligation on the client (it only makes a fairly vague recommendation):

If the Keep Alive value is non-zero and the Server does not receive a Control Packet from the Client within one and a half times the Keep Alive time period, it MUST disconnect the Network Connection to the Client as if the network had failed [MQTT-3.1.2-24].

If a Client does not receive a PINGRESP Packet within a reasonable amount of time after it has sent a PINGREQ, it SHOULD close the Network Connection to the Server.`

_socketSend (the only place our pinger seems to be reset) is never called (as far as i can see)

Good catch - that function is redundant and should be removed. The pinger is reset elsewhere though - in process_queue.

if i set the keep alive time to a shorter time (default for this library is 60 seconds - amazon iot says the shortest they allow is 5 seconds) then this client will be disconnected after that time without properly reporting that it is disconnected.

I'm not sure what you mean by this. The keepalive interval is specified by the client in the connect request, and the server should respect that, so there shouldn't be a mismatch of expectations.

@sebirdman
Copy link
Author

It does seem off spec to have the second timer pinging only when we haven't heard from the server. Not sure what the proper solution to this really should be then (as you point out in the merge request, there might be cases where the pinger being reset causes a disconnect from the server)

The pinger is reset elsewhere though - in process_queue.

ahhh there it is!

I'm not sure what you mean by this. The keepalive interval is specified by the client in the connect request, and the server should respect that, so there shouldn't be a mismatch of expectations.

What I mean't by this is that it seems that the server is disconnecting us after the keepalive time, as it should, but that the pinger wasn't alerting us to the disconnect after the keepalive time is up.

Maybe a good solution would be to hold onto the last time the pinger ran, check to see before we ping if it's > keepalive time and then assume that the server has issued a disconnect?

eg:

lastPingTime = 1513972970
currentTime = 1513977970

if (currentTime - lastAliveTime > keepAliveTime) disconnect()

@robhogan
Copy link
Owner

Incidentally, are you actually seeing unnoticed disconnections in practice? I would've thought the underlying websocket layer would pick up disconnections regardless of what we're doing, and that should trigger a call of _disconnected (assuming the socket isn't being held open by a reverse proxy after the MQTT server has died, I guess)

@sebirdman
Copy link
Author

I'm seeing unnoticed disconnections when the application goes into the background for any time > keepAlive on Android 8.0. (Might be other android versions as well as iOS but i haven't checked yet)

Looks like you're right, QoS 0 requires no ACK. With QoS > 0 an ACK is expected. If we don't receive this ACK what are we supposed to do?

@robhogan
Copy link
Owner

Well, MQTT says we do nothing, the message is just silently lost. QoS=0 means that's not a problem to you.

We don't want that to be happening when we could be connected though, and we don't want to have our subscriptions interrupted without us knowing about it. My first thought was to inspect socket.readyState if socket.onclose isn't being called reliably, but I want to be careful about conflating connections at the socket layer with the those at the MQTT layer. Perhaps pinging the server is the way to go... I'll give it some thought.

@MattyK14
Copy link

The disconnections on Android 8.0 likely have something to do with this? But then what causes the client to not see a disconnection?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants