Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IT's error out when calling mvn verify #971

Open
cjmctague opened this issue Nov 20, 2017 · 36 comments
Open

IT's error out when calling mvn verify #971

cjmctague opened this issue Nov 20, 2017 · 36 comments

Comments

@cjmctague
Copy link
Contributor

I'm trying to vote in the parent-pom release and I can't get mvn verify to run properly even before checking it against the new parent-pom

It appears to be a bug with accessing zookeeper as all the IT's error out with

java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/mini-test2

I've tried to rebuild the project but it did not help. I'm sure it's something wrong with my project as @keith-turner and @ctubbsii have been able to successfully test it recently but I can't figure out the problem.

Note this error came before trying to test the new Parent-pom.

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.fluo.integration.impl.ZKSecretIT
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.198 sec - in org.apache.fluo.integration.impl.ZKSecretIT
Running org.apache.fluo.integration.impl.SelfNotificationIT
Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 1.161 sec <<< FAILURE! - in org.apache.fluo.integration.impl.SelfNotificationIT
test1(org.apache.fluo.integration.impl.SelfNotificationIT)  Time elapsed: 1.142 sec  <<< ERROR!
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/mini-test2
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/mini-test2

test1(org.apache.fluo.integration.impl.SelfNotificationIT)  Time elapsed: 1.142 sec  <<< ERROR!
java.lang.RuntimeException: java.lang.IllegalStateException: tracker already closed
Caused by: java.lang.IllegalStateException: tracker already closed

Running org.apache.fluo.integration.impl.ParallelScannerIT
Tests run: 8, Failures: 0, Errors: 8, Skipped: 0, Time elapsed: 1.158 sec <<< FAILURE! - in org.apache.fluo.integration.impl.ParallelScannerIT
testParallelScanRecovery1(org.apache.fluo.integration.impl.ParallelScannerIT)  Time elapsed: 0.009 sec  <<< ERROR!
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test3
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test3

testParallelScanRecovery1(org.apache.fluo.integration.impl.ParallelScannerIT)  Time elapsed: 0.009 sec  <<< ERROR!
java.lang.NullPointerException

testParallelScanRecovery2(org.apache.fluo.integration.impl.ParallelScannerIT)  Time elapsed: 1.125 sec  <<< ERROR!
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test4
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test4

....
Results :

Tests in error: 
org.apache.fluo.integration.client.FluoAdminImplIT.testInitializeConfig(org.apache.fluo.integration.client.FluoAdminImplIT)
  Run 1: FluoAdminImplIT>ITBaseImpl.setUpFluo:85 » Runtime org.apache.zookeeper.KeeperE...
  Run 2: FluoAdminImplIT>ITBaseImpl.tearDownFluo:106 » NullPointer

org.apache.fluo.integration.client.FluoAdminImplIT.testInitializeLongChroot(org.apache.fluo.integration.client.FluoAdminImplIT)
  Run 1: FluoAdminImplIT>ITBaseImpl.setUpFluo:85 » Runtime org.apache.zookeeper.KeeperE...
  Run 2: FluoAdminImplIT>ITBaseImpl.tearDownFluo:106 » NullPointer

org.apache.fluo.integration.client.FluoAdminImplIT.testInitializeTwiceFails(org.apache.fluo.integration.client.FluoAdminImplIT)
  Run 1: FluoAdminImplIT>ITBaseImpl.setUpFluo:85 » Runtime org.apache.zookeeper.KeeperE...
  Run 2: FluoAdminImplIT>ITBaseImpl.tearDownFluo:106 » NullPointer

org.apache.fluo.integration.client.FluoAdminImplIT.testInitializeWithNoChroot(org.apache.fluo.integration.client.FluoAdminImplIT)
  Run 1: FluoAdminImplIT>ITBaseImpl.setUpFluo:85 » Runtime org.apache.zookeeper.KeeperE...
  Run 2: FluoAdminImplIT>ITBaseImpl.tearDownFluo:106 » NullPointer

.....
@ctubbsii
Copy link
Member

Did you try doing a mvn clean first? I always run mvn clean verify in case any stuff in the target directories isn't properly reset for the next build. Also check that you don't have any old maven artifacts for Fluo in your local maven repository which might be getting on the class path (rm -rf ~/.m2/repository/org/apache/fluo; path may differ if your local repository is located elsewhere).

Another thing that can cause problems is if Eclipse or another IDE is open and building your project while trying to do a build from the command-line.

@ctubbsii
Copy link
Member

Also, the test in error seems to be FluoAdminImplIT. There were a few recent changes with that to fix some things. Make sure your workspace is up to date. You can check with git remote update && git branch -vv.

@cjmctague
Copy link
Contributor Author

@ctubbsii
Its getting the error in all the IT's. I just didn't copy them all as it's a long list. Those are the first ones and all the rest display the same error after.

I tried mvn clean verify to no prevail.
I cleaned out ~/.m2/repository/org/apache/fluo than ran mvn package and mvn clean verify and still got the same errors.
Eclipse is closed and I'm strictly using the command line.

@ctubbsii
Copy link
Member

Ah, right, I should have paid more attention to the initial post. Have you tried checking to see if there's an old ZK running that won't die (ps aux | grep jav[a] or jps -ml)? Perhaps doing a reboot just to be absolutely sure?

@cjmctague
Copy link
Contributor Author

ps aux | grep jav[a] returns nothing

jps -ml
1023 sun.tools.jps.Jps -ml

Rebooted and still getting the same list of errors.

@ctubbsii
Copy link
Member

Hmm. This is a tricky one and I'm running low on ideas. Just to throw some more ideas out there to try:

You could try clearing out all of your local maven repository, in case one of the dependencies are corrupt. You could ensure that it is configured to use https for Maven Central and a strict checksum policy to avoid resolving corrupt dependencies when they are re-downloaded.

You could also check to ensure that you don't have any strange items on your class path either in your environment (env | egrep '(CLASSPATH|JAVA|MAVEN)', or any unusual properties or profiles in your ~/.m2/settings.xml file, or any weird items in your ~/.mavenrc file (if present).

You could try cloning the git repo from scratch to a different location, and running there, to ensure there's not anything wrong with your local workspace.

You could also check any logs in the target directories, for a possible hint, but I'm not really sure what to look for.

You could run sudo netstat -tlnp to see what other processes might be listening on your TCP ports, which could prevent ZooKeeper from starting properly.

@keith-turner
Copy link
Contributor

Are there any stack trace lines after the messages like Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/mini-test2?

I ask because I am curious if this has anything to do with the changes in #899. Each node in ZK can have an auth set. Accumulo sets auths in ZK and Fluo does optionally with the changes in #899. The paths in the error messages /fluo/impl-test4 lead me to believe its executing Fluo code when this happens. It would be nice to know what specific code is executing.

If you are not seeing the stack traces, then its possible that adding the -X option will show the stack traces. However this can generate a huge amount out output, so you may also want to use the -l <logfile> option. Could run something like the following.

mvn -X -l /tmp/fluo-build.log clean verify
# search for NoAuth
less /tmp/fluo-build.log

@cjmctague
Copy link
Contributor Author

cjmctague commented Nov 20, 2017

@ctubbsii
No luck with any of that sadly. Be right back gonna toss my laptop off the roof.

@keith-turner
Is this what you were looking for? I don't see any trace when looking at the file manually. I can email you the log file if it you want to take a look.

 grep NoAuth /tmp/fluo-build.log 
[DEBUG] Found import: import org.apache.zookeeper.KeeperException.NoAuthException;
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/mini-test2
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/mini-test2
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test3
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test3
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test4
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test4

(EDIT: extra lines truncated by @cltubbs for easier scrolling; it was just repeats of the same error above)

@keith-turner
Copy link
Contributor

Try grep -A 10 "NoAuth for" /tmp/fluo-build.log | head -20

@keith-turner
Copy link
Contributor

@cjmctague yeah you can email me the log. If you can bzip2 or gzip it. bzip will probably compress better.

@cjmctague
Copy link
Contributor Author

@keith-turner

grep -A 10 "NoAuth for" /tmp/fluo-build.log | head -20
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/mini-test2
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/mini-test2

test1(org.apache.fluo.integration.impl.SelfNotificationIT)  Time elapsed: 1.131 sec  <<< ERROR!
java.lang.RuntimeException: java.lang.IllegalStateException: tracker already closed
Caused by: java.lang.IllegalStateException: tracker already closed

Running org.apache.fluo.integration.impl.ParallelScannerIT
Tests run: 8, Failures: 0, Errors: 8, Skipped: 0, Time elapsed: 0.043 sec <<< FAILURE! - in org.apache.fluo.integration.impl.ParallelScannerIT
testParallelScanRecovery1(org.apache.fluo.integration.impl.ParallelScannerIT)  Time elapsed: 0.01 sec  <<< ERROR!
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test3
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test3

testParallelScanRecovery1(org.apache.fluo.integration.impl.ParallelScannerIT)  Time elapsed: 0.01 sec  <<< ERROR!
java.lang.NullPointerException

testParallelScanRecovery2(org.apache.fluo.integration.impl.ParallelScannerIT)  Time elapsed: 0.009 sec  <<< ERROR!
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test4
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test4

@keith-turner
Copy link
Contributor

Ugh, so -X did not show the stack traces. Maybe the -e option will. Could try mvn -X -e -l /tmp/fluo-build.log

Also there may be info of interest in modules/integration/target/failsafe-reports after running the build.

Make sure to get a video of the laptop toss, could be a hit on /r/osha or /r/wcgw

@keith-turner
Copy link
Contributor

Something else to look at is modules/integration/target/accumulo-maven-plugin/it-instance-maven/logs/Zoo*. These files are for the Zookeeper server that started up for the integration test.

@ctubbsii
Copy link
Member

Another thing you could try (very much doubt this is the issue) is disabling your firewall and/or selinux (if that applies to you). The problem could also be your host name. Check the host name in any zoo.cfg and/or accumulo-site.xml files in your target directory.

@cjmctague
Copy link
Contributor Author

@keith-turner
The -e flag didn't change anything. Still not finding any trace in the log file.
In the failsafe-reports it has all the same information as the log like such:

-------------------------------------------------------------------------------
Test set: org.apache.fluo.integration.impl.ReadLockIT
-------------------------------------------------------------------------------
Tests run: 24, Failures: 0, Errors: 24, Skipped: 0, Time elapsed: 5.806 sec <<< FAILURE! - in org.apache.fluo.integration.impl.ReadLockIT
testWriteAfterReadLock(org.apache.fluo.integration.impl.ReadLockIT)  Time elapsed: 0.009 sec  <<< ERROR!
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test7
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test7

testWriteAfterReadLock(org.apache.fluo.integration.impl.ReadLockIT)  Time elapsed: 0.01 sec  <<< ERROR!
java.lang.NullPointerException

testOnlyReadLocks(org.apache.fluo.integration.impl.ReadLockIT)  Time elapsed: 1.17 sec  <<< ERROR!
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test8
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /fluo/impl-test8

testOnlyReadLocks(org.apache.fluo.integration.impl.ReadLockIT)  Time elapsed: 1.17 sec  <<< ERROR!
java.lang.NullPointerException

There are 30 .txt files in those logs for all the various IT's failing.

@cjmctague
Copy link
Contributor Author

@ctubbsii
On a Mac and I disabled to firewall and nothing changed.

@keith-turner
Copy link
Contributor

@cjmctague what about the zookeeper server logs, anything there?

@cjmctague
Copy link
Contributor Author

cjmctague commented Nov 21, 2017

@keith-turner
Sorry I forgot about that. There is a lot more to it but this is the gist of it from what I can tell

2017-11-20 20:42:09,304 [server.ZooKeeperServer] INFO : Server environment:java.library.path=:/Users/*OMIT*/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
2017-11-20 20:42:09,304 [server.ZooKeeperServer] INFO : Server environment:java.io.tmpdir=/var/folders/0d/bgnqjj8d503f19qzqwpcrg5m0000gn/T/
2017-11-20 20:42:09,304 [server.ZooKeeperServer] INFO : Server environment:java.compiler=<NA>
2017-11-20 20:42:09,304 [server.ZooKeeperServer] INFO : Server environment:os.name=Mac OS X
2017-11-20 20:42:09,304 [server.ZooKeeperServer] INFO : Server environment:os.arch=x86_64
2017-11-20 20:42:09,304 [server.ZooKeeperServer] INFO : Server environment:os.version=10.1*.*
2017-11-20 20:42:09,304 [server.ZooKeeperServer] INFO : Server environment:user.name=*OMIT*
2017-11-20 20:42:09,304 [server.ZooKeeperServer] INFO : Server environment:user.home=/Users/*OMIT*
2017-11-20 20:42:09,304 [server.ZooKeeperServer] INFO : Server environment:user.dir=/Users/*OMIT*/Desktop/Apache_Workspace/fluo
2017-11-20 20:42:09,312 [server.ZooKeeperServer] INFO : tickTime set to 2000
2017-11-20 20:42:09,312 [server.ZooKeeperServer] INFO : minSessionTimeout set to -1
2017-11-20 20:42:09,312 [server.ZooKeeperServer] INFO : maxSessionTimeout set to -1
2017-11-20 20:42:09,343 [server.NIOServerCnxnFactory] INFO : binding to port /127.0.0.1:60463
2017-11-20 20:42:09,416 [server.NIOServerCnxnFactory] INFO : Accepted socket connection from /127.0.0.1:60467
2017-11-20 20:42:09,471 [server.NIOServerCnxn] INFO : Processing ruok command from /127.0.0.1:60467
2017-11-20 20:42:09,474 [server.NIOServerCnxn] INFO : Closed socket connection for client /127.0.0.1:60467 (no session established for client)
2017-11-20 20:42:10,876 [server.NIOServerCnxnFactory] INFO : Accepted socket connection from /127.0.0.1:60468
2017-11-20 20:42:10,887 [server.ZooKeeperServer] INFO : Client attempting to establish new session at /127.0.0.1:60468
2017-11-20 20:42:10,890 [persistence.FileTxnLog] INFO : Creating new log file: log.1
2017-11-20 20:42:10,899 [server.ZooKeeperServer] INFO : Established session 0x15fdc3d5ad30000 with negotiated timeout 30000 for client /127.0.0.1:60468
2017-11-20 20:42:10,962 [server.ZooKeeperServer] INFO : got auth packet /127.0.0.1:60468
2017-11-20 20:42:10,975 [server.ZooKeeperServer] INFO : auth success /127.0.0.1:60468
2017-11-20 20:44:08,947 [server.NIOServerCnxnFactory] INFO : Accepted socket connection from /127.0.0.1:61050
2017-11-20 20:44:08,948 [server.ZooKeeperServer] INFO : Client attempting to establish new session at /127.0.0.1:61050
2017-11-20 20:44:08,948 [server.ZooKeeperServer] INFO : Established session 0x15fdc3d5ad3008f with negotiated timeout 30000 for client /127.0.0.1:61050
2017-11-20 20:44:08,951 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad3008f type:create cxid:0x3 zxid:0x7da txntype:-1 reqpath:n/a Error Path:null Error:KeeperErrorCode = NoAuth
2017-11-20 20:44:08,951 [server.PrepRequestProcessor] INFO : Processed session termination for sessionid: 0x15fdc3d5ad3008f
2017-11-20 20:44:08,952 [server.NIOServerCnxn] INFO : Closed socket connection for client /127.0.0.1:61050 which had sessionid 0x15fdc3d5ad3008f
2017-11-20 20:44:08,955 [server.NIOServerCnxnFactory] INFO : Accepted socket connection from /127.0.0.1:61051
2017-11-20 20:44:08,955 [server.ZooKeeperServer] INFO : Client attempting to establish new session at /127.0.0.1:61051
2017-11-20 20:44:08,955 [server.ZooKeeperServer] INFO : Established session 0x15fdc3d5ad30090 with negotiated timeout 30000 for client /127.0.0.1:61051
2017-11-20 20:44:08,957 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30090 type:create cxid:0x3 zxid:0x7dd txntype:-1 reqpath:n/a Error Path:null Error:KeeperErrorCode = NoAuth
2017-11-20 20:44:08,958 [server.PrepRequestProcessor] INFO : Processed session termination for sessionid: 0x15fdc3d5ad30090
2017-11-20 20:44:08,958 [server.NIOServerCnxn] INFO : Closed socket connection for client /127.0.0.1:61051 which had sessionid 0x15fdc3d5ad30090
2017-11-20 20:44:10,083 [server.NIOServerCnxnFactory] INFO : Accepted socket connection from /127.0.0.1:61055
2017-11-20 20:44:10,083 [server.ZooKeeperServer] INFO : Client attempting to establish new session at /127.0.0.1:61055
2017-11-20 20:44:10,088 [server.ZooKeeperServer] INFO : Established session 0x15fdc3d5ad30091 with negotiated timeout 30000 for client /127.0.0.1:61055
2017-11-20 20:44:10,090 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30091 type:create cxid:0x3 zxid:0x7e0 txntype:-1 reqpath:n/a Error Path:null Error:KeeperErrorCode = NoAuth
2017-11-20 20:44:10,091 [server.PrepRequestProcessor] INFO : Processed session termination for sessionid: 0x15fdc3d5ad30091
2017-11-20 20:44:10,091 [server.NIOServerCnxn] INFO : Closed socket connection for client /127.0.0.1:61055 which had sessionid 0x15fdc3d5ad30091
2017-11-20 20:44:10,094 [server.NIOServerCnxnFactory] INFO : Accepted socket connection from /127.0.0.1:61056
2017-11-20 20:44:10,094 [server.ZooKeeperServer] INFO : Client attempting to establish new session at /127.0.0.1:61056
2017-11-20 20:44:10,095 [server.ZooKeeperServer] INFO : Established session 0x15fdc3d5ad30092 with negotiated timeout 30000 for client /127.0.0.1:61056
2017-11-20 20:44:10,097 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30092 type:create cxid:0x3 zxid:0x7e3 txntype:-1 reqpath:n/a Error Path:null Error:KeeperErrorCode = NoAuth
2017-11-20 20:44:10,097 [server.PrepRequestProcessor] INFO : Processed session termination for sessionid: 0x15fdc3d5ad30092
2017-11-20 20:44:10,098 [server.NIOServerCnxn] INFO : Closed socket connection for client /127.0.0.1:61056 which had sessionid 0x15fdc3d5ad30092
2017-11-20 20:44:11,215 [server.NIOServerCnxnFactory] INFO : Accepted socket connection from /127.0.0.1:61060
2017-11-20 20:44:11,215 [server.ZooKeeperServer] INFO : Client attempting to establish new session at /127.0.0.1:61060
2017-11-20 20:44:11,220 [server.ZooKeeperServer] INFO : Established session 0x15fdc3d5ad30093 with negotiated timeout 30000 for client /127.0.0.1:61060
2017-11-20 20:44:11,222 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30093 type:create cxid:0x3 zxid:0x7e6 txntype:-1 reqpath:n/a Error Path:null Error:KeeperErrorCode = NoAuth
2017-11-20 20:44:11,223 [server.PrepRequestProcessor] INFO : Processed session termination for sessionid: 0x15fdc3d5ad30093
2017-11-20 20:44:11,223 [server.NIOServerCnxn] INFO : Closed socket connection for client /127.0.0.1:61060 which had sessionid 0x15fdc3d5ad30093
2017-11-20 20:44:11,235 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30007 type:create cxid:0x203d zxid:0x7eb txntype:-1 reqpath:n/a Error Path:/accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/fate/tx_2e0d317f89e23af0 Error:KeeperErrorCode = NodeExists for /accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/fate/tx_2e0d317f89e23af0
2017-11-20 20:44:11,238 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30007 type:create cxid:0x2049 zxid:0x7ed txntype:-1 reqpath:n/a Error Path:/accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/table_locks/+default Error:KeeperErrorCode = NoNode for /accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/table_locks/+default
2017-11-20 20:44:11,240 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30007 type:create cxid:0x204f zxid:0x7f0 txntype:-1 reqpath:n/a Error Path:/accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/tables Error:KeeperErrorCode = NodeExists for /accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/tables
2017-11-20 20:44:11,244 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30007 type:create cxid:0x2061 zxid:0x7f4 txntype:-1 reqpath:n/a Error Path:/accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/users/root/Tables/3 Error:KeeperErrorCode = NodeExists for /accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/users/root/Tables/3
2017-11-20 20:44:11,245 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30007 type:create cxid:0x2065 zxid:0x7f6 txntype:-1 reqpath:n/a Error Path:/accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/users/root/Tables/3 Error:KeeperErrorCode = NodeExists for /accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/users/root/Tables/3
2017-11-20 20:44:11,247 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30007 type:create cxid:0x2069 zxid:0x7f8 txntype:-1 reqpath:n/a Error Path:/accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/users/root/Tables/3 Error:KeeperErrorCode = NodeExists for /accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/users/root/Tables/3
2017-11-20 20:44:11,248 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30007 type:create cxid:0x206d zxid:0x7fa txntype:-1 reqpath:n/a Error Path:/accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/users/root/Tables/3 Error:KeeperErrorCode = NodeExists for /accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/users/root/Tables/3
2017-11-20 20:44:11,249 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30007 type:create cxid:0x2071 zxid:0x7fc txntype:-1 reqpath:n/a Error Path:/accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/users/root/Tables/3 Error:KeeperErrorCode = NodeExists for /accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/users/root/Tables/3
2017-11-20 20:44:11,253 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30007 type:create cxid:0x207f zxid:0x7ff txntype:-1 reqpath:n/a Error Path:/accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/table_locks/3 Error:KeeperErrorCode = NoNode for /accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/table_locks/3
2017-11-20 20:44:11,283 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30007 type:create cxid:0x20df zxid:0x80f txntype:-1 reqpath:n/a Error Path:/accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/tables/3/state Error:KeeperErrorCode = NodeExists for /accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/tables/3/state
2017-11-20 20:44:11,288 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30007 type:create cxid:0x20f4 zxid:0x816 txntype:-1 reqpath:n/a Error Path:/accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/fate/tx_2e0d317f89e23af0 Error:KeeperErrorCode = NodeExists for /accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/fate/tx_2e0d317f89e23af0
2017-11-20 20:44:13,006 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30007 type:create cxid:0x21b5 zxid:0x827 txntype:-1 reqpath:n/a Error Path:/accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/fate/tx_66c3fbf983b05712 Error:KeeperErrorCode = NodeExists for /accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/fate/tx_66c3fbf983b05712
2017-11-20 20:44:13,010 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30007 type:create cxid:0x21c1 zxid:0x829 txntype:-1 reqpath:n/a Error Path:/accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/table_locks/+accumulo Error:KeeperErrorCode = NoNode for /accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/table_locks/+accumulo
2017-11-20 20:44:13,013 [server.PrepRequestProcessor] INFO : Got user-level KeeperException when processing sessionid:0x15fdc3d5ad30007 type:create cxid:0x21c8 zxid:0x82c txntype:-1 reqpath:n/a Error Path:/accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/table_locks/!0 Error:KeeperErrorCode = NoNode for /accumulo/23a9d688-7362-4fa0-8d6f-fd1c8f2c548d/table_locks/!0

I can email them to you if you want to see the rest. Not sure if anything of these logs could be sensitive information.

Edit more goodies

2017-11-20 20:44:23,319 [server.NIOServerCnxn] WARN : caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15fdc3d5ad30006, likely client has closed socket
	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
	at java.lang.Thread.run(Thread.java:745)
2017-11-20 20:44:23,319 [server.NIOServerCnxn] INFO : Closed socket connection for client /127.0.0.1:60480 which had sessionid 0x15fdc3d5ad30006

@keith-turner
Copy link
Contributor

@cjmctague can you find the test org.apache.fluo.integration.impl.ZKSecretIT and delete it? I am wondering if that test is mucking things up for you.

@keith-turner
Copy link
Contributor

I noticed in the logs you emailed me that ZKSecretIT runs first and then everything after that fails. That test does modify auths in Zookeeper.

@cjmctague
Copy link
Contributor Author

After removing the ZKSecretIT the "NoAuth" errors have gone away. Is there a way to for this IT to go last or should we just fix the "Auth" after the IT has concluded?

Now the only thing preventing me from getting a successful mvn clean verify is the OracleIT

2017-11-27 10:37:38,232 [server.AbstractNonblockingServer$FrameBuffer] ERROR: Unexpected throwable while invoking!
java.lang.IllegalStateException: Received timestamp request but Oracle is not leader
	at org.apache.fluo.core.oracle.OracleServer.getTimestampsImpl(OracleServer.java:238)
	at org.apache.fluo.core.oracle.OracleServer.getTimestamps(OracleServer.java:218)
	at org.apache.fluo.core.thrift.OracleService$Processor$getTimestamps.getResult(OracleService.java:271)
	at org.apache.fluo.core.thrift.OracleService$Processor$getTimestamps.getResult(OracleService.java:254)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:516)
	at org.apache.thrift.server.Invocation.run(Invocation.java:18)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2017-11-27 10:39:44,187 [server.AbstractNonblockingServer$FrameBuffer] ERROR: Read a frame size of 1633837924, which is bigger than the maximum allowable buffer size for ALL connections.
java.lang.IllegalStateException: OracleClient is closed
	at org.apache.fluo.core.oracle.OracleClient.checkClosed(OracleClient.java:413)
	at org.apache.fluo.core.oracle.OracleClient.getStamp(OracleClient.java:371)
	at org.apache.fluo.integration.impl.OracleIT$TimestampFetcher.run(OracleIT.java:89)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Tests run: 6, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 272.035 sec <<< FAILURE! - in org.apache.fluo.integration.impl.OracleIT
failover_newTimestampRequested(org.apache.fluo.integration.impl.OracleIT)  Time elapsed: 120.005 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 120 seconds
	at org.apache.fluo.integration.impl.OracleIT.failover_newTimestampRequested(OracleIT.java:199)

threadFailoverTest(org.apache.fluo.integration.impl.OracleIT)  Time elapsed: 119.998 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 120 seconds
	at org.apache.fluo.integration.impl.OracleIT.threadFailoverTest(OracleIT.java:298)


@keith-turner
Copy link
Contributor

After removing the ZKSecretIT the "NoAuth" errors have gone away. Is there a way to for this IT to go last or should we just fix the "Auth" after the IT has concluded?

I looked at a successful Travis build and this test ran in the middle of the ITs. There are two possibilities. When this test runs first, it leaves everything in a bad state OR there is something in your environment that exposing a bug. I suspect its the latter, but not sure. I will take a look at the test and see if better cleanup could be done.

Re the OracleIT, some of the error messages from it are expected. The timeout is not. Does that test consistently timeout? Could try bumping its timeout up by adding 2 * like the following in the test.

public Timeout globalTimeout = Timeout.seconds(2 * getTestTimeout());

@cjmctague
Copy link
Contributor Author

My environment currently is on a mid 2012 macOS laptop with the latest version that the issues are occurring on. I runbrew update and brew upgrade probably once a week so everything is up to date. Any particular information you would like to know?

The timeout did not change anything other than it taking longer to time out.

I just started an environment on my Centos7 server at school. Was able to run mvn verify without issue.

@keith-turner
Copy link
Contributor

@cjmctague can you try removing the timeout rule and see if it ever completes... or set it really high.

Also when its stuck, can you find the java process running the IT and run jstack on it?

@cjmctague
Copy link
Contributor Author

@keith-turner Removing the timeout rule just made the test never end.

I just emailed you the jStack dump

@keith-turner
Copy link
Contributor

I looked at the jstack you sent, it seems like the file contains 4 jstacks. Are those for 4 different test runs?

It seems in two the of the jstacks the main test thread is stuck waiting to get a timestamp from the oracle. In one of the jstacks, the main thread is stuck trying to connect to zookeeper. In the other jstack I am not sure what it is doing.

@mikewalch
Copy link
Member

I am seeing the same problem with OracleIT on a Mac. I will look into this.

@cjmctague
Copy link
Contributor Author

cjmctague commented Nov 30, 2017

@keith-turner Yea sorry those were four different test runs.

@kennethmcfarland
Copy link
Contributor

It looks like some of this is related to #1000. What is the status on this issue if I may politely ask?

@ctubbsii
Copy link
Member

I don't have a Mac to reproduce. @mikewalch , were you ever able to reproduce this?

@kennethmcfarland
Copy link
Contributor

kennethmcfarland commented Jan 25, 2018 via email

@ctubbsii
Copy link
Member

@kpm1985 I think the issue needs to be reproduced. It may or may not affect only Macs. From the discussion above, we know it affects @cjmctague 's Mac builds, at the very least. If it can be reproduced, we'd need to track down the cause of the problem. If you can help with reproducing and figuring out why it's happening, please feel free to do so.

@kennethmcfarland
Copy link
Contributor

kennethmcfarland commented Jan 25, 2018 via email

@cjmctague
Copy link
Contributor Author

ZKSecretIT.java#L59

@Override
protected void setConfig(FluoConfiguration config) {
	config.setZookeeperSecret("are3");
	config.setObserverProvider(MyObserverProvider.class);
}

Setting the secret here is messing up the config for the rest of the IT's in my environment. The secret should return to the default after this test is done. Where is the default for this property being set?

@keith-turner
Copy link
Contributor

I think this test needs to completely clear zookeeper after it runs. I can look into doing that.

@ctubbsii
Copy link
Member

Is it using its own ZK, or the one from the accumulo-maven-plugin? If the latter, best not to completely clear it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants