Mutation tests failing #1254

Closed
qstokkink opened this issue Jan 5, 2024 · 10 comments

Comments

@qstokkink
Collaborator

It appears our mutation testing machine, zulu-ipv8-mutation-tester (ipv8-mutation-tester IPv8), is now missing a dependency:

00:11:18 + run_all_mutation_tests.py ./py-ipv8 .
00:11:18 /tmp/jenkins8324097251270560127.sh: 3: run_all_mutation_tests.py: not found

@qstokkink qstokkink added the priority: high Bugs, broken functionality or critical features label Jan 5, 2024
@qstokkink qstokkink self-assigned this Jan 5, 2024
@qstokkink
Collaborator Author

qstokkink commented Jan 5, 2024

"The operation was a success, but the patient died":

12:01:23 java.nio.channels.ClosedChannelException
12:01:23 	at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:155)
12:01:23 	at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:143)
12:01:23 	at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:789)
12:01:23 	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
12:01:23 	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
12:01:23 	at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
12:01:23 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
12:01:23 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
12:01:23 	at java.base/java.lang.Thread.run(Thread.java:840)
12:01:23 Caused: java.io.IOException: Backing channel 'JNLP4-connect connection from <Server Name>/<Server IP>:<Server Port>' is disconnected.
12:01:23 	at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:215)
12:01:23 	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:285)
12:01:23 	at jdk.proxy2/jdk.proxy2.$Proxy123.isAlive(Unknown Source)
12:01:23 	at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1212)
12:01:23 	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1204)
12:01:23 	at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:195)
12:01:23 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:145)
12:01:23 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
12:01:23 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
12:01:23 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
12:01:23 	at hudson.model.Build$BuildExecution.build(Build.java:199)
12:01:23 	at hudson.model.Build$BuildExecution.doRun(Build.java:164)
12:01:23 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
12:01:23 	at hudson.model.Run.execute(Run.java:1895)
12:01:23 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
12:01:23 	at hudson.model.ResourceController.execute(ResourceController.java:101)
12:01:23 	at hudson.model.Executor.run(Executor.java:442)

The build executed correctly for several hours, but the build executor lost connection due to another issue.

@qstokkink
Collaborator Author

Agent restarted. Hopefully this was just a one-time thing 🤞

@qstokkink
Collaborator Author

Not a one-time thing. The builder disconnected again. 😢

Perhaps we need to raise the scheduling priority of the Jenkins agent process (the agent jar) above everything else on the machine.

@qstokkink
Collaborator Author

Switched to nohup bash -c 'java -jar agent.jar etc etc' > test.txt 2>&1 </dev/null &. Hopefully it stays online now. We'll see in a few hours.
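For readers who prefer to start the agent from a script, here is a rough Python sketch of the same idea as the nohup one-liner above. The function name `launch_detached` is hypothetical, and `start_new_session=True` (POSIX only) is an approximation of `nohup`, not an exact equivalent: it moves the child into its own session instead of ignoring SIGHUP.

```python
import subprocess


def launch_detached(command, log_path):
    """Start `command` so it keeps running after the launching shell exits."""
    log_file = open(log_path, "ab")
    try:
        process = subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,   # counterpart of </dev/null
            stdout=log_file,            # counterpart of > test.txt
            stderr=subprocess.STDOUT,   # counterpart of 2>&1
            start_new_session=True,     # own session, so no terminal hangup (POSIX)
        )
    finally:
        # The child holds its own copy of the descriptor, so ours can be closed.
        log_file.close()
    return process
```

The command list would be the agent invocation from the comment above, with the same arguments that are elided there, and `log_path` would be `test.txt`.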

@qstokkink
Collaborator Author

🎉 The builder no longer disconnects. On to the next error:

12:40:54 Done! Minimizing output
12:40:54 Skipping /[...]/index.html, no index.html found!
12:40:54 Traceback (most recent call last):
12:40:54   File "/home/run_all_mutation_tests.py", line 116, in <module>
12:40:54     shutil.copy(os.path.join('/root', 'MutPy', 'mutpy', 'templates', 'include', 'jquery.js'), base_output_dir)
12:40:54   File "/usr/lib/python3.10/shutil.py", line 417, in copy
12:40:54     copyfile(src, dst, follow_symlinks=follow_symlinks)
12:40:54   File "/usr/lib/python3.10/shutil.py", line 254, in copyfile
12:40:54     with open(src, 'rb') as fsrc:
12:40:54 FileNotFoundError: [Errno 2] No such file or directory: '/root/MutPy/mutpy/templates/include/jquery.js'
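The traceback comes from an unguarded `shutil.copy` of a template asset that is not present on the machine. One possible way to make that step tolerant of a missing file is sketched below. This is an assumption about a reasonable guard, not the fix that was actually applied, and `copy_if_present` is a hypothetical helper name. The skip message mirrors the style of the script's own "Skipping ..." log line.

```python
import os
import shutil


def copy_if_present(src, dst_dir):
    """Copy src into dst_dir; return the new path, or None if src is missing."""
    if not os.path.isfile(src):
        # Report and carry on instead of aborting the whole run.
        print(f"Skipping {src}, file not found!")
        return None
    os.makedirs(dst_dir, exist_ok=True)
    return shutil.copy(src, dst_dir)
```

Whether skipping is acceptable depends on whether the generated HTML report still works without the asset. If it does not, failing early with a clearer message may be the better choice.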

@qstokkink
Collaborator Author

Third error fixed. Second error is back: the builder is disconnecting again.

It did stay online while I had an active connection open to the container. Perhaps there is some sort of hibernation mode that triggers.

@qstokkink
Collaborator Author

Based on https://community.jenkins.io/t/how-to-affect-ssh-parameters-on-ssh-agent-like-keep-alive/5954, we should probably try playing with the ~/.ssh/config file. The posted example in the link above is:

Host *
    ServerAliveInterval 60
    ServerAliveCountMax 3

Our disconnecting job takes just short of 2 hours. Based on gut feeling alone, setting the alive interval to 5 minutes (ServerAliveInterval 300, as the value is in seconds) and the max missing count to 24 (ServerAliveCountMax 24) should suffice: 24 missed probes at 5 minutes each add up to 2 hours. I'll try this out once I'm on the (physical) premises again and have access to the machine.
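As a quick sanity check of those numbers, the small sketch below computes how long the SSH client tolerates an unresponsive server before giving up (interval times count) and renders the matching config stanza. The helper names are hypothetical. Note the window measures silence from the server, not job duration, so the 2-hour figure is a generous upper bound rather than a requirement.

```python
def keepalive_window_seconds(interval_seconds, count_max):
    """Seconds of server silence tolerated before the client disconnects."""
    return interval_seconds * count_max


def render_ssh_config(interval_seconds, count_max, host_pattern="*"):
    """Render an ~/.ssh/config stanza in the same shape as the linked example."""
    return (
        f"Host {host_pattern}\n"
        f"    ServerAliveInterval {interval_seconds}\n"
        f"    ServerAliveCountMax {count_max}\n"
    )


if __name__ == "__main__":
    # Linked example: 60 s * 3 probes; proposed values: 300 s * 24 probes.
    print(keepalive_window_seconds(60, 3))
    print(keepalive_window_seconds(300, 24))
    print(render_ssh_config(300, 24))
```

With the linked example's values the window is 180 seconds; with the proposed values it is 7200 seconds, i.e. 2 hours.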

@qstokkink
Collaborator Author

To get a sense of perspective on Jenkins, I looked into GitHub Actions. At the time of writing, the maximum job execution time is 6 hours and a cron build trigger exists. This means it would be theoretically feasible to use GitHub Actions for our nightly build.

That said, we would still have to create the action (☹️), create a proper MutPy fork from my disgusting patches in the secret Tribler/py-ipv8-mutation-libraries repository (☹️), and rework the disgusting patches to be even more disgusting and output something compatible with GitHub job summaries, which use Markdown instead of HTML (😭). In short, two things I don't want to do and one thing I REALLY don't want to do.
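To give an idea of what the Markdown part could look like, here is a minimal sketch that renders a results table and appends it to the file GitHub Actions exposes through the `GITHUB_STEP_SUMMARY` environment variable. The row layout (module, killed mutants, total mutants) and both function names are assumptions for illustration; nothing here reflects MutPy's real output format or the existing patches.

```python
import os


def render_summary(rows):
    """Render (module, killed, total) rows as a Markdown table."""
    lines = [
        "| Module | Killed | Total | Score |",
        "| --- | ---: | ---: | ---: |",
    ]
    for module, killed, total in rows:
        score = f"{100.0 * killed / total:.1f}%" if total else "n/a"
        lines.append(f"| {module} | {killed} | {total} | {score} |")
    return "\n".join(lines) + "\n"


def append_to_job_summary(markdown):
    """Append Markdown to the job summary; return False outside GitHub Actions."""
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path is None:
        return False
    with open(summary_path, "a", encoding="utf-8") as handle:
        handle.write(markdown)
    return True
```

The reporting step would then call `append_to_job_summary(render_summary(rows))` once the per-module counts are known.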

Practically speaking, it's probably still best to stick with Jenkins.

@qstokkink qstokkink assigned xoriole and unassigned qstokkink Jan 25, 2024
@xoriole
Contributor

xoriole commented Jan 29, 2024

I have updated the agent to connect via SSH. Hopefully, it will not disconnect anymore.
Here is a running job: https://jenkins.tribler.org/job/ipv8/job/mutation_test_daily/21/

@qstokkink
Collaborator Author

Seems to be fixed now. Thanks @xoriole!


2 participants