Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug in tests: Java surefire forking is causing LightGBM to fail with serious C++ issues #122

Open
AlbertoEAF opened this issue May 27, 2022 · 1 comment

Comments

@AlbertoEAF
Copy link
Contributor

AlbertoEAF commented May 27, 2022

As of #121 LightGBM tests can still fail after the first run with a serious C++ error from the LightGBM side:

[ERROR] munmap_chunk(): invalid pointer
[ERROR] Aborted (core dumped)

After that the following messages will show up:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Openml LightGBM lib 3.0.1-with_model_locale_fix_for_java_and_streaming SUCCESS [  1.659 s]
[INFO] openml-lightgbm-module 1.0.12-SNAPSHOT ............. SUCCESS [  0.378 s]
[INFO] openml-lightgbm 1.0.12-SNAPSHOT .................... FAILURE [  9.691 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  12.312 s
[INFO] Finished at: 2022-05-27T14:48:38+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test (default-test) on project openml-lightgbm: There are test failures.
[ERROR] 
[ERROR] Please refer to /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider && /home/alberto.ferreira/.sdkman/candidates/java/8.0.302-open/jre/bin/java -javaagent:/home/alberto.ferreira/.m2/repository/org/jacoco/org.jacoco.agent/0.8.4/org.jacoco.agent-0.8.4-runtime.jar=destfile=/home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/jacoco.exec -jar /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/surefire/surefirebooter2472734679044173114.jar /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/surefire 2022-05-27T14-48-29_842-jvmRun1 surefire8810960277057352224tmp surefire_07484658466683880775tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] com.feedzai.openml.provider.lightgbm.LightGBMBinaryClassificationModelTrainerTest
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider && /home/alberto.ferreira/.sdkman/candidates/java/8.0.302-open/jre/bin/java -javaagent:/home/alberto.ferreira/.m2/repository/org/jacoco/org.jacoco.agent/0.8.4/org.jacoco.agent-0.8.4-runtime.jar=destfile=/home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/jacoco.exec -jar /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/surefire/surefirebooter2472734679044173114.jar /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/surefire 2022-05-27T14-48-29_842-jvmRun1 surefire8810960277057352224tmp surefire_07484658466683880775tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] com.feedzai.openml.provider.lightgbm.LightGBMBinaryClassificationModelTrainerTest
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:748)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:305)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:265)
[ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1314)
[ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1159)
[ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:932)
[ERROR] 	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148)
[ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR] 	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
[ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR] 	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
[ERROR] 	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
[ERROR] 	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
[ERROR] 	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:957)
[ERROR] 	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:289)
[ERROR] 	at org.apache.maven.cli.MavenCli.main(MavenCli.java:193)
[ERROR] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] 	at java.lang.reflect.Method.invoke(Method.java:498)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)

How to reproduce

# Go to repo root and run:
mvn clean install -DskipTests
cd openml-lightgbm  # Go to provider directory to run its tests.

Tests will always run with:

mvn clean test

However, they will always fail (at least after the first run) with:

mvn test

Conclusion

surefire settings: How to have mvn test work all the time

By simply changing maven-surefire-plugin's forking settings (see [1]) it's possible to have mvn test work all the time too.

This is the observed behaviour regarding mvn test:

  • default settings fail (forkCount=1, reuseForks=true)
  • disabling forks works (forkCount=0)
  • not reusing forks works (reuseForks=false)

Update: All fork settings can still fail after all. The failure odds change though. The only command that works all the time is still mvn clean test. This means the approaches described below don't fix the issue.

Analysis

Given that disabling forks altogether vs running the tests in a forked process (default) causes the issues, the problem seems caused by Java's test stack and not by the LightGBM C++ side (which always works with mvn clean test anyway).

Note: Upgrading to the latest versions of jacoco 0.8.6 and maven-surefire-plugin to 3.0.0-M6 didn't fix the issue either.

Possible solutions

Now, we can disable forks altogether, instead of forking the process before starting the tests once (default, which is failing); or we can set tests not to reuse forks. The latter option is slower as it forks the process for every test class.

Are there any downsides to doing these approaches? And which of those approaches is best?

References

1. maven-surefire-plugin forking docs

@AlbertoEAF
Copy link
Contributor Author

AlbertoEAF commented Jun 2, 2022

Apparently the issue is somewhere in these tests related to LGBM feature contributions, from the LightGBMBinaryClassificationModelTrainerTest test class:

  • testFeatureContributionsTargetBeginning
  • testFeatureContributionsTargetMiddle
  • testFeatureContributionsTargetEnd

Disabling them lets tests run all the time with mvn test. Can you review if we're doing something wrong in the Java provider @mlobofeedzai ?

Otherwise, I'm afraid it's a bug at the LGBM level related with fast prediction handles, and having several to our booster: microsoft/LightGBM#4980 (comment), which would be much trickier to fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant