Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for stale session issue #37

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

colinmcintosh
Copy link

Added logic to reset the client SessionID if the client disconnects and does not reconnect before the session timeout time elapses. make test passes for me with these changes.

Fixes #36.

@codecov
Copy link

codecov bot commented Jul 29, 2020

Codecov Report

Merging #37 (127b9eb) into master (50daf81) will decrease coverage by 0.85%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #37      +/-   ##
==========================================
- Coverage   77.58%   76.73%   -0.86%     
==========================================
  Files           7        7              
  Lines        1316     1199     -117     
==========================================
- Hits         1021      920     -101     
+ Misses        205      190      -15     
+ Partials       90       89       -1     
Impacted Files Coverage Δ
conn.go 74.30% <83.33%> (-0.31%) ⬇️
constants.go 75.00% <0.00%> (-6.25%) ⬇️
lock.go 63.38% <0.00%> (-4.12%) ⬇️
dnshostprovider.go 72.72% <0.00%> (-1.64%) ⬇️
structs.go 80.75% <0.00%> (-1.28%) ⬇️
flw.go 82.78% <0.00%> (-0.97%) ⬇️
util.go 93.61% <0.00%> (+0.75%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 50daf81...127b9eb. Read the comment docs.

Copy link

@yarikk yarikk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @colinmcintosh, thanks for your submission. Do you think you could provide a test which exercises those code paths?

@pmazzini
Copy link

I am trying to understand what is this change fixing.

Doesn't the session get reset in here for you https://github.com/go-zookeeper/zk/blob/master/conn.go#L673?

@zmstone
Copy link

zmstone commented Aug 31, 2020

@pmazzini
judging by the EOF error (full line: authentication failed: EOF)
it's likely the code is returned from somewhere before L673
maybe here: https://github.com/go-zookeeper/zk/blob/master/conn.go#L663-L666

@nemith
Copy link

nemith commented Mar 26, 2021

Any way we can get a unit test on this?

@killerwhile
Copy link

If this can help someone, I got this authentication failed: EOF in a cluster running v3.5.8 and apparently in a state like described in https://issues.apache.org/jira/browse/ZOOKEEPER-3829 and https://issues.apache.org/jira/browse/ZOOKEEPER-3830. Restarting the nodes did help a lot.

@colinmcintosh
Copy link
Author

Hi! I pushed a couple updates to restrict this session reset further and moved the disconnect time to fix an issue with tight loops not being caught by the timer.

I did attempt to write a test for this that started, stopped, and started the test ZK server to reset the session but couldn't figure out how to make it work with the test server. I'm able to easily reproduce this by running Zookeeper in a Docker container and restarting the container (docker kill zookeeper && docker run --rm --name=zookeeper -d -p 2181:2181 zookeeper). I'd appreciate any help writing tests to copy that behavior; it would be great to get this PR merged in since it's been hanging around for a while.

…nd does not reconnect before the session timeout time elapses.
@jpfourny
Copy link

I will take a stab at a unit test to produce the issue when I find some magic 🦄 time. 🤣 No promises.

@@ -482,6 +490,7 @@ func (c *Conn) loop(ctx context.Context) {
}

c.setState(StateDisconnected)
disconnectTime = time.Now()
Copy link

@joeedt joeedt Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this should place after line 489;

when zookeeper cluster re-elect, reconnect always success, but auth failed:(EOF). line 493 will rfresh disconnectTime every loop; line 442 always be False

@LittleCuteBug
Copy link

Hi @colinmcintosh, are you still working on this pull request. I'm facing similar issue and hoping this pull request will be done.

@covy-blue
Copy link

Hi, I am still experiencing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zookeeper client session is not reset when cluster loses session state (authentication failed: EOF)
10 participants