Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconnect on lost websocket connection #39

Merged
merged 6 commits into from
May 24, 2020
Merged

Reconnect on lost websocket connection #39

merged 6 commits into from
May 24, 2020

Conversation

farao
Copy link
Member

@farao farao commented May 21, 2020

When the websocket connection failed after having been established before,
the complete rtc session is tried to reinitialize.

This needs an update on the palava client first: palavatv/palava-client#16

@farao
Copy link
Member Author

farao commented May 21, 2020

Two things to discuss:

  1. is it possible to get rid of the race condition that between the if(navigator.online) and the add event listener the status could go from offline to online without the reconnect happening? maybe with a promise or so?
  2. the check for navigator.online is bad for trying out locally I guess - but there the connection should not get lost in the first place, right?

@@ -48,6 +48,7 @@ export default {
peers: [],
localPeer: null,
infoPage: null,
signalingConnectedBefore = false,
Copy link
Member

@janlelis janlelis May 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Syntax error :/

Suggested change
signalingConnectedBefore = false,
signalingConnectedBefore: false,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not that familiar with javascript. But could we apply some sanitizer/compiler to our javascript files. Such that these kind of syntax errors could be detected in CI? I could setup github actions for this repository.

Copy link
Member

@janlelis janlelis May 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erikzenker It already happens and it would be super duper to have such a github action. The command is $ yarn lint

@@ -202,6 +203,18 @@ export default {
},
})
},
reconnectRtc() {
if (this.signalingConnectedBefore) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag is never flipped

reconnectRtc() {
if (this.signalingConnectedBefore) {
// TODO: show "Palava server not reachable" or "Network not reachable" overlay
if (navigator.online) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't known about navigator.online before - good idea!

if (this.signalingConnectedBefore) {
// TODO: show "Palava server not reachable" or "Network not reachable" overlay
if (navigator.online) {
rtc.reconnect()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although we are guarding against being offline, the network can be bad (or the server could be down), so there should probably be a timeout functionality here. Maybe also a retry count (or an "endless" interval with larger timeout)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what happens after the retry count? The person gets kicked out? I think as long as somebody wants to participate in the discussion, it should be continously tried to reconnect?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. One way would be to then show a network error page with a "reconnect" button. But the endless-interval with reasonable timeout in between tries sounds fine too me too.

@janlelis
Copy link
Member

janlelis commented May 22, 2020

1. is it possible to get rid of the race condition that between the if(navigator.online) and the add event listener the status could go from offline to online without the reconnect happening? maybe with a promise or so?

Can you please provide an example of the race condition, it's not yet clear to me

@janlelis
Copy link
Member

2\. the check for navigator.online is bad for trying out locally I guess - but there the connection should not get lost in the first place, right?

The Chrome devtools have an "offline" flag in the network tab - this should trigger this event (maybe not, but I think it does)

if (navigator.online) {
rtc.reconnect()
} else {
window.addEventListener('online', function() {rtc.reconnect()});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The event listener needs to be unregistered, when fired, or they will "stack up" with every new error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole online/offline state thing could also be considered an extra feature (and treated separately from connection errors)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On some point we could think of introducing some kind of state machine. Otherwise it will get hard to keep the overview.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as long as the websocket connection stands, everything is ok - and when it fails, this is only a safeguard not to try it again until it could actually work :-) I don't see a need to treat this separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of having a state-machine (probably even the client library) - which brings together signaling, peer connection, and network/online status

@farao
Copy link
Member Author

farao commented May 22, 2020

1. is it possible to get rid of the race condition that between the if(navigator.online) and the add event listener the status could go from offline to online without the reconnect happening? maybe with a promise or so?

Can you please provide an example of the race condition, it's not yet clear to me

Not quite sure, if javascript prevents this to happen, but what I imagine could happen:

  1. "if (navigator.online) {" => false (offline)
  2. navigator.online becomes true (online)
  3. online event listener is registered but will not be called since the status changed already to online between the if and registering the listener

How about this as a solution?:

  1. register the online listener
  2. if navigator.online, send the online event manually (does this work?)

@janlelis
Copy link
Member

Rebased + pushed to staging!

@janlelis
Copy link
Member

Not quite sure, if javascript prevents this to happen, but what I imagine could happen:

I think we are safe here

@farao
Copy link
Member Author

farao commented May 22, 2020

Not quite sure, if javascript prevents this to happen, but what I imagine could happen:

I think we are safe here

Unfortunately no:

<html>
<script>
    function sleep(ms) {
        return new Promise(resolve => setTimeout(resolve, ms));
    }
    async function test() {
        if(navigator.online) console.log("online")
        await sleep(20000)
        console.log("register listener");
        window.addEventListener("online", function() { console.log("went online"); });
    }
    console.log("start");
    test()
</script>
</html>
  1. Disconnect from the internet
  2. After seeing start, enable your internet again (quick! :D)
  3. Note that "online" has not been printed
  4. "went online" also never gets printed even though you went online again in the meantime

@farao
Copy link
Member Author

farao commented May 22, 2020

Is it possible to emit the online event on window manually?

@farao
Copy link
Member Author

farao commented May 22, 2020

Fixed it :-)

@farao
Copy link
Member Author

farao commented May 22, 2020

Ok to merge or shall we do something about my original concern nr. 2 (local testing)?
And what about the TODO line (showing some information that the network (or more concretely the palava server) is not reachable?

@erikzenker
Copy link
Member

erikzenker commented May 23, 2020

Ok to merge or shall we do something about my original concern nr. 2 (local testing)?

I think going forward we should invest some energy into automatic testing. Javascript should have
decent mocking capabilities so I guess it will not get to complex.

And what about the TODO line (showing some information that the network (or more concretely the palava server) is not reachable?

What about creating a new issue and doing it in a followup? By that you keep the focus in this PR on the reconnect feature.

@janlelis
Copy link
Member

@farao I would say we always put the latest version of this PR on staging for simpler real-worold testing.

  • Original concern nr.2: The chrome devtools let you switch your browser to offline under the network tab. They seem to have removed the handy checkbox, but you can still achieve this by creating a "throttle" configuration which sends 0 kbits. I have verified that this flips the onLine state (on Chrome)
  • I have created an extra issue for displaying network problems to the user in Display "reconnecting" indicator in UX when network not reachable #44
  • I still think relying soley on the onLine state is not enough, since (a) browser implementation varies https://developer.mozilla.org/en-US/docs/Web/API/NavigatorOnLine/onLine and (b) the network is generally unreliable so that it is not a super rare edge case that the browser is connected to the internet, but the palava server cannot be reached and (c) the palva server can be down. Having said this, I love this idea of using the online state and it should remain the main mechanism.
  • The secondary mechanism should be an interval based one: If browser state is onLine and connection did not work, retry every X seconds (maybe even with increasing intervals). The current version would continuously attempt to connect

@janlelis
Copy link
Member

janlelis commented May 23, 2020

I think going forward we should invest some energy into automatic testing. Javascript should have
decent mocking capabilities so I guess it will not get to complex.

Absolutely, especially end-to-end (browser) tests (opened an issue for that #45). Also, there is already some boilerplate test setup in the project here: https://github.com/palavatv/palava-web/blob/master/tests/e2e/specs/test.js

@farao
Copy link
Member Author

farao commented May 23, 2020

* Original concern nr.2: The chrome devtools let you switch your browser to offline under the network tab. They seem to have removed the handy checkbox, but you can still achieve this by creating a "throttle" configuration which sends 0 kbits. I have verified that this flips the `onLine` state (on Chrome)

Ok, so we just leave it this way I guess!

* I have created an extra issue for displaying network problems to the user in #44

👌

* I still think relying soley on the `onLine` state is not enough, since (a) browser implementation varies https://developer.mozilla.org/en-US/docs/Web/API/NavigatorOnLine/onLine and (b) the network is generally unreliable so that it is not a super rare edge case that the browser is connected to the internet, but the palava server cannot be reached and (c) the palva server can be down. Having said this, I love this idea of using the online state and it should remain the main mechanism.

Super strange that the browsers implement this so differently, but apparently they at least have in common that offline always means offline. So I would think this is good enough, in chrome we will then simply keep on trying to reestablish the websocket connection.

* The secondary mechanism should be an interval based one: If browser state is `onLine` and connection did not work, retry every X seconds (maybe even with increasing intervals). The current version would continuously attempt to connect

Yes, this makes sense!

@janlelis
Copy link
Member

janlelis commented May 24, 2020

Looks good! Some smallish things before we can finally merge:

  • Please rebase to latest master
  • Typo repeatet -> repeated
  • Could you move the actual timeout value to config.js?

@janlelis janlelis merged commit f9a64e6 into master May 24, 2020
@janlelis
Copy link
Member

🎉 --> staging

@janlelis janlelis deleted the reconnect branch June 1, 2020 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants