Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random test failure: stress: infinite wait for preflist to converge #86

Open
eric opened this issue Jul 18, 2014 · 4 comments
Open

Random test failure: stress: infinite wait for preflist to converge #86

eric opened this issue Jul 18, 2014 · 4 comments

Comments

@eric
Copy link
Contributor

eric commented Jul 18, 2014

I've seen this hang on travis-ci and just witnessed it locally where I was able to send a SIGQUIT to the process and get the stack trace.

The tests are hung at stress_test.clj:100:

"main" prio=5 tid=0x00007fadf1000800 nid=0x1b03 waiting on condition [0x0000000101998000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at skuld.stress_test$fn__9147.invoke(stress_test.clj:100)
    at clojure.test$test_var$fn__7145.invoke(test.clj:701)
    at clojure.test$test_var.invoke(test.clj:701)
    at clojure.test$test_all_vars$fn__7149$fn__7156.invoke(test.clj:717)
    at skuld.node_test$each.invoke(node_test.clj:131)
    at clojure.test$compose_fixtures$fn__7139$fn__7140.invoke(test.clj:678)
    at clojure.test$default_fixture.invoke(test.clj:671)
    at clojure.test$compose_fixtures$fn__7139.invoke(test.clj:678)
    at clojure.test$test_all_vars$fn__7149.invoke(test.clj:717)
    at skuld.node_test$once.invoke(node_test.clj:118)
    at clojure.test$compose_fixtures$fn__7139$fn__7140.invoke(test.clj:678)
    at clojure.test$default_fixture.invoke(test.clj:671)
    at clojure.test$compose_fixtures$fn__7139.invoke(test.clj:678)
    at clojure.test$test_all_vars.invoke(test.clj:713)
    at clojure.test$test_ns.invoke(test.clj:736)
    at clojure.core$map$fn__4207.invoke(core.clj:2487)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    - locked <0x00000007e86b8058> (a clojure.lang.LazySeq)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    - locked <0x00000007e86b8058> (a clojure.lang.LazySeq)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.next(RT.java:598)
    at clojure.core$next.invoke(core.clj:64)
    at clojure.core$reduce1.invoke(core.clj:896)
    at clojure.core$reduce1.invoke(core.clj:887)
    at clojure.core$merge_with.doInvoke(core.clj:2702)
    at clojure.lang.RestFn.applyTo(RestFn.java:139)
    at clojure.core$apply.invoke(core.clj:619)
    at clojure.test$run_tests.doInvoke(test.clj:751)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at clojure.core$apply.invoke(core.clj:617)
    at user$eval87$fn__154$fn__189.invoke(form-init2815011225954042819.clj:1)
    at user$eval87$fn__154$fn__155.invoke(form-init2815011225954042819.clj:1)
    at user$eval87$fn__154.invoke(form-init2815011225954042819.clj:1)
    at user$eval87.invoke(form-init2815011225954042819.clj:1)
    at clojure.lang.Compiler.eval(Compiler.java:6619)
    at clojure.lang.Compiler.eval(Compiler.java:6609)
    at clojure.lang.Compiler.load(Compiler.java:7064)
    at clojure.lang.Compiler.loadFile(Compiler.java:7020)
    at clojure.main$load_script.invoke(main.clj:294)
    at clojure.main$init_opt.invoke(main.clj:299)
    at clojure.main$initialize.invoke(main.clj:327)
    at clojure.main$null_opt.invoke(main.clj:362)
    at clojure.main$main.doInvoke(main.clj:440)
    at clojure.lang.RestFn.invoke(RestFn.java:421)
    at clojure.lang.Var.invoke(Var.java:419)
    at clojure.lang.AFn.applyToHelper(AFn.java:163)
    at clojure.lang.Var.applyTo(Var.java:532)
    at clojure.main.main(main.java:37)

Here's the log of the test run:

lein test skuld.stress-test
2014-07-17T17:02:08.213 INFO  skuld.node: 13003 skuld_0 coming online
2014-07-17T17:02:08.227 INFO  skuld.node: 13001 skuld_1 coming online
2014-07-17T17:02:08.240 INFO  skuld.node: 13004 skuld_1 coming online
2014-07-17T17:02:08.351 INFO  skuld.http: Starting HTTP server on 127.0.0.1:13100
2014-07-17T17:02:08.351 INFO  org.eclipse.jetty.server.Server: jetty-7.6.8.v20121106
2014-07-17T17:02:08.355 INFO  org.eclipse.jetty.server.AbstractConnector: Started [email protected]:13100
2014-07-17T17:02:08.362 INFO  skuld.node: 13000 spooling up zombie vnode skuld_0
2014-07-17T17:02:08.362 INFO  skuld.vnode: starting vnode {:host 127.0.0.1, :port 13000} skuld_0
2014-07-17T17:02:08.369 INFO  skuld.http: Starting HTTP server on 127.0.0.1:13101
2014-07-17T17:02:08.369 INFO  org.eclipse.jetty.server.Server: jetty-7.6.8.v20121106
2014-07-17T17:02:08.369 INFO  skuld.vnode: starting vnode {:host 127.0.0.1, :port 13001} skuld_1
2014-07-17T17:02:08.376 INFO  skuld.node: 13001 skuld_0 coming online
2014-07-17T17:02:08.377 INFO  skuld.vnode: starting vnode {:host 127.0.0.1, :port 13001} skuld_0
2014-07-17T17:02:08.379 INFO  skuld.vnode: {:host 127.0.0.1, :port 13000} skuld_0 now zombie
2014-07-17T17:02:08.380 INFO  skuld.node: 13000 spooling up zombie vnode skuld_1
2014-07-17T17:02:08.380 INFO  skuld.vnode: starting vnode {:host 127.0.0.1, :port 13000} skuld_1
2014-07-17T17:02:08.380 INFO  skuld.vnode: starting vnode {:host 127.0.0.1, :port 13003} skuld_0
2014-07-17T17:02:08.381 INFO  skuld.http: Starting HTTP server on 127.0.0.1:13103
2014-07-17T17:02:08.382 INFO  org.eclipse.jetty.server.Server: jetty-7.6.8.v20121106
2014-07-17T17:02:08.384 INFO  org.eclipse.jetty.server.AbstractConnector: Started [email protected]:13101
2014-07-17T17:02:08.388 INFO  skuld.node: 13003 skuld_1 coming online
2014-07-17T17:02:08.389 INFO  skuld.vnode: starting vnode {:host 127.0.0.1, :port 13003} skuld_1
2014-07-17T17:02:08.389 INFO  org.eclipse.jetty.server.AbstractConnector: Started [email protected]:13103
2014-07-17T17:02:08.393 INFO  skuld.vnode: {:host 127.0.0.1, :port 13000} skuld_1 now zombie
2014-07-17T17:02:08.430 INFO  skuld.vnode: starting vnode {:host 127.0.0.1, :port 13004} skuld_1
2014-07-17T17:02:08.431 INFO  skuld.http: Starting HTTP server on 127.0.0.1:13104
2014-07-17T17:02:08.431 INFO  org.eclipse.jetty.server.Server: jetty-7.6.8.v20121106
2014-07-17T17:02:08.443 INFO  skuld.node: 13004 skuld_0 coming online
2014-07-17T17:02:08.445 INFO  skuld.vnode: starting vnode {:host 127.0.0.1, :port 13004} skuld_0
2014-07-17T17:02:08.451 INFO  org.eclipse.jetty.server.AbstractConnector: Started [email protected]:13104
2014-07-17T17:02:08.552 INFO  skuld.node: 13003 skuld_0 going offline
2014-07-17T17:02:08.553 INFO  skuld.vnode: {:host 127.0.0.1, :port 13003} skuld_0 now zombie
2014-07-17T17:02:08.636 INFO  skuld.node: 13003 skuld_0 dropped
2014-07-17T17:02:08.645 INFO  skuld.node: 13004 skuld_0 going offline
2014-07-17T17:02:08.645 INFO  skuld.vnode: {:host 127.0.0.1, :port 13004} skuld_0 now zombie
2014-07-17T17:02:08.669 INFO  skuld.node: 13000 skuld_0 coming online
2014-07-17T17:02:08.672 INFO  skuld.vnode: {:host 127.0.0.1, :port 13000} skuld_0 revived!
2014-07-17T17:02:08.719 INFO  skuld.node: 13004 skuld_0 dropped
2014-07-17T17:02:09.551 INFO  skuld.node: 13002 skuld_0 coming online
2014-07-17T17:02:09.631 INFO  skuld.vnode: starting vnode {:host 127.0.0.1, :port 13002} skuld_0
2014-07-17T17:02:09.633 INFO  skuld.http: Starting HTTP server on 127.0.0.1:13102
2014-07-17T17:02:09.634 INFO  org.eclipse.jetty.server.Server: jetty-7.6.8.v20121106
2014-07-17T17:02:09.640 INFO  org.eclipse.jetty.server.AbstractConnector: Started [email protected]:13102
2014-07-17T17:02:09.649 INFO  skuld.node: 13002 spooling up zombie vnode skuld_1
2014-07-17T17:02:09.650 INFO  skuld.vnode: starting vnode {:host 127.0.0.1, :port 13002} skuld_1
2014-07-17T17:02:09.672 INFO  skuld.vnode: {:host 127.0.0.1, :port 13002} skuld_1 now zombie
2014-07-17T17:02:13.749 INFO  skuld.vnode: {:host 127.0.0.1, :port 13002} skuld_0 initiating election
2014-07-17T17:02:13.749 INFO  skuld.vnode: {:host 127.0.0.1, :port 13002} node is active
2014-07-17T17:02:13.750 INFO  skuld.vnode: {:host 127.0.0.1, :port 13002} leader is outdated
2014-07-17T17:02:13.754 INFO  skuld.vnode: {:host 127.0.0.1, :port 13002} :old #{} :new #{{:port 13001, :host 127.0.0.1} {:port 13000, :host 127.0.0.1} {:port 13002, :host 127.0.0.1}}
2014-07-17T17:02:13.758 INFO  skuld.vnode: {:host 127.0.0.1, :port 13001} skuld_0 assuming epoch 1
2014-07-17T17:02:13.760 INFO  skuld.vnode: {:host 127.0.0.1, :port 13000} skuld_0 assuming epoch 1
2014-07-17T17:02:13.763 INFO  skuld.vnode: {:host 127.0.0.1, :port 13002} Received enough votes: ({:request-id #<byte[] [B@148b2aba>, :vote {:host 127.0.0.1, :port 13000}, :type :follower, :epoch 1, :cohort #<HashSet [{:port 13000, :host 127.0.0.1}, {:port 13001, :host 127.0.0.1}, {:port 13002, :host 127.0.0.1}]>, :leader {:host 127.0.0.1, :port 13002}, :updated true})
2014-07-17T17:02:13.764 INFO  skuld.vnode: {:host 127.0.0.1, :port 13002} Received enough votes: ({:request-id #<byte[] [B@3ca223df>, :vote {:host 127.0.0.1, :port 13001}, :type :follower, :epoch 1, :cohort #<HashSet [{:port 13000, :host 127.0.0.1}, {:port 13001, :host 127.0.0.1}, {:port 13002, :host 127.0.0.1}]>, :leader {:host 127.0.0.1, :port 13002}, :updated true} {:request-id #<byte[] [B@148b2aba>, :vote {:host 127.0.0.1, :port 13000}, :type :follower, :epoch 1, :cohort #<HashSet [{:port 13000, :host 127.0.0.1}, {:port 13001, :host 127.0.0.1}, {:port 13002, :host 127.0.0.1}]>, :leader {:host 127.0.0.1, :port 13002}, :updated true})
2014-07-17T17:02:13.773 INFO  skuld.vnode: {:host 127.0.0.1, :port 13002} skuld_0 election successful: cohort now 1 #{{:port 13001, :host 127.0.0.1} {:port 13000, :host 127.0.0.1} {:port 13002, :host 127.0.0.1}}
2014-07-17T17:02:13.774 INFO  skuld.vnode: {:host 127.0.0.1, :port 13003} skuld_1 initiating election
2014-07-17T17:02:13.774 INFO  skuld.vnode: {:host 127.0.0.1, :port 13003} node is active
2014-07-17T17:02:13.774 INFO  skuld.vnode: {:host 127.0.0.1, :port 13003} leader is outdated
2014-07-17T17:02:13.777 INFO  skuld.vnode: {:host 127.0.0.1, :port 13003} :old #{} :new #{{:port 13001, :host 127.0.0.1} {:port 13003, :host 127.0.0.1} {:port 13004, :host 127.0.0.1}}
2014-07-17T17:02:13.786 INFO  skuld.vnode: {:host 127.0.0.1, :port 13001} skuld_1 assuming epoch 1
2014-07-17T17:02:13.789 INFO  skuld.vnode: {:host 127.0.0.1, :port 13003} Received enough votes: ({:request-id #<byte[] [B@5d5df07c>, :vote {:host 127.0.0.1, :port 13001}, :type :follower, :epoch 1, :cohort #<HashSet [{:port 13003, :host 127.0.0.1}, {:port 13001, :host 127.0.0.1}, {:port 13004, :host 127.0.0.1}]>, :leader {:host 127.0.0.1, :port 13003}, :updated true})
2014-07-17T17:02:13.790 INFO  skuld.vnode: {:host 127.0.0.1, :port 13004} skuld_1 assuming epoch 1
2014-07-17T17:02:13.793 INFO  skuld.vnode: {:host 127.0.0.1, :port 13003} Received enough votes: ({:request-id #<byte[] [B@42578cc7>, :vote {:host 127.0.0.1, :port 13004}, :type :follower, :epoch 1, :cohort #<HashSet [{:port 13003, :host 127.0.0.1}, {:port 13001, :host 127.0.0.1}, {:port 13004, :host 127.0.0.1}]>, :leader {:host 127.0.0.1, :port 13003}, :updated true} {:request-id #<byte[] [B@5d5df07c>, :vote {:host 127.0.0.1, :port 13001}, :type :follower, :epoch 1, :cohort #<HashSet [{:port 13003, :host 127.0.0.1}, {:port 13001, :host 127.0.0.1}, {:port 13004, :host 127.0.0.1}]>, :leader {:host 127.0.0.1, :port 13003}, :updated true})
2014-07-17T17:02:13.800 INFO  skuld.vnode: {:host 127.0.0.1, :port 13003} skuld_1 election successful: cohort now 1 #{{:port 13001, :host 127.0.0.1} {:port 13003, :host 127.0.0.1} {:port 13004, :host 127.0.0.1}}
@eric
Copy link
Contributor Author

eric commented Jul 18, 2014

Here is the entire stack trace if anyone is interested in looking further: https://gist.github.com/eric/10d1f2b8b863fc8cfa60

@eric
Copy link
Contributor Author

eric commented Jul 20, 2014

I've added a timeout for the waiting in #92 so it is no longer infinite.

@eric
Copy link
Contributor Author

eric commented Jul 20, 2014

...that doesn't actually address why it didn't ever converge, but it will make the tests fail in a reasonable amount of time.

@eric
Copy link
Contributor Author

eric commented Jul 25, 2014

Since this now times out, we've still had an instance of hitting the timeout: https://travis-ci.org/Factual/skuld/jobs/30803536

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant