Something wrong with NUT fightwarn #24

Open
jimklimov opened this issue Sep 20, 2023 · 6 comments
Labels
bug Something isn't working enhancement New feature or request

Comments

@jimklimov
Member

jimklimov commented Sep 20, 2023

Not seen in other builds. The last "properly" behaving NUT fightwarn build was https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/73/ in July 2023; the subsequent https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/74/ (and https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/ soon after) in September fail, apparently because the MAKE variable is not resolved in many build scenarios. This happens on the same build hosts that the master/PR builds use, and not in all parallel branches, e.g.:

First running a quiet parallel build...
/home/abuild/jenkins-nutci-centos-7-amd64/workspace/nut_nut_fightwarn@tmp/durable-e6cf4883/script.sh: line 7: -s: command not found

real	0m0.000s
user	0m0.000s
sys	0m0.000s
First attempt failed (127), retrying to log what did:
time: invalid option -- 'k'
Usage: time [-apvV] [-f format] [-o file] [--append] [--verbose]
       [--portability] [--format=format] [--output=file] [--version]
       [--help] command [arg...]

or

First running a quiet parallel build...
time: cannot run VERBOSE=0: No such file or directory
Command exited with non-zero status 127
0.00user 0.00system 0:00.00elapsed 77%CPU (0avgtext+0avgdata 964maxresident)k
0inputs+0outputs (0major+35minor)pagefaults 0swaps
First attempt failed (127), retrying to log what did:
time: invalid option -- 'k'
Try 'time --help' for more information.
CC:  => 
CXX:  => 
jenkins-ubuntu2110-amd64
Applied parsed envvars (compiler/tools-related adjustments, e.g. CONFIG_ENVVARS, STD(XX)ARG and (LD)BITSARG) for build scenario described as:
    Building with CLANG-13 STD=gnu11 STD=gnu++11 on x86_64 64-bit linux-ubuntu-impish platform for MATRIX_TAG="gnu11-gnu++11-clang-13-ubuntu-impish-x86_64-64bit" && (ARCH_BITS=64&&ARCH64=x86_64&&COMPILER=CLANG&&CLANGVER=13&&OS_DISTRO=ubuntu-impish&&OS_FAMILY=linux) && (nut-builder) && BITS=64&&CSTDVARIANT=gnu&&CSTDVERSION_c=11&&CSTDVERSION_cxx=11  &&  LANG=C && LC_ALL=C && TZ=UTC && CFLAGS=-Wall && CXXFLAGS=-Wall :: as part of slowBuild filter: Default autotools driven build with max warnings and varied C/C++ revisions (allowed to fail)

ARCH64='x86_64'
ARCH_BITS='64'
BITS='64'
BRANCH_NAME='fightwarn'
BUILD_DISPLAY_NAME='#75'
BUILD_ID='75'
BUILD_NUMBER='75'
BUILD_TAG='jenkins-nut-nut-fightwarn-75'
BUILD_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/'
CFLAGS='-Wall'
CI='true'
CI_SLOW_BUILD_FILTERNAME='Default autotools driven build with max warnings and varied C/C++ revisions (allowed to fail)'
CI_WRAP_SH='ssh -o SendEnv='"'"'*'"'"' "jenkins-ubuntu2110-amd64" /bin/sh -xe '
CLANGVER='13'
COMPILER='CLANG'
CSTDVARIANT='gnu'
CSTDVERSION_c='11'
CSTDVERSION_cxx='11'
CXXFLAGS='-Wall'
DBUS_SESSION_BUS_ADDRESS='unix:path=/run/user/399/bus'
EXECUTOR_NUMBER='1'
GIT_AUTHOR_DATE='2023-09-19 23:11:05 +00:00'
GIT_COMMITTER_DATE='2023-09-19 23:11:05 +00:00'
HOME='/home/abuild'
HUDSON_HOME='/var/lib/jenkins/home'
HUDSON_URL='https://ci.networkupstools.org/'
IFS=' 	
JENKINS_HOME='/var/lib/jenkins/home'
JENKINS_URL='https://ci.networkupstools.org/'
JOB_BASE_NAME='fightwarn'
JOB_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/display/redirect'
JOB_NAME='nut/nut/fightwarn'
JOB_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/'
LANG='C'
LC_ALL='C'
LOGNAME='abuild'
MATRIX_TAG='gnu11-gnu++11-clang-13-ubuntu-impish-x86_64-64bit'
MOTD_SHOWN='pam'
NODE_LABELS='ARCH64=x86_64 ARCH_BITS=64 CLANGVER=13 COMPILER=CLANG COMPILER=GCC DYNAMATRIX_REFREPO_WORKSPACE_LOCKNAME=gitcache-dynamatrix:ci-debian DYNAMATRIX_UNSTASH_PREFERENCE=scm-ws:nut-ci-src GCCVER=11 MAKE=make NUT_BUILD_CAPS=cppcheck NUT_BUILD_CAPS=cppunit NUT_BUILD_CAPS=drivers:DMF=yes NUT_BUILD_CAPS=drivers:all NUT_BUILD_CAPS=nutconf=yes OS_DISTRO=ubuntu-impish OS_FAMILY=linux PYTHON=python2.7 PYTHON=python3.9 SHELL_PROGS=bash SHELL_PROGS=busybox SHELL_PROGS=csh SHELL_PROGS=dash SHELL_PROGS=ksh93 SHELL_PROGS=sh SHELL_PROGS=tcsh SHELL_PROGS=zsh ci-debian-altroot--jenkins-ubuntu2110-amd64+ssh nut-builder nut-builder:DMF nut-builder:alldrv'
NODE_NAME='ci-debian-altroot--jenkins-ubuntu2110-amd64+ssh'
OLDPWD='/home/abuild'
OPTIND='1'
OS_DISTRO='ubuntu-impish'
OS_FAMILY='linux'
PARMAKE_LA_LIMIT='8'
PATH='/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'
PPID='822749'
PS1='$ '
PS2='> '
PS4='+ '
PWD='/srv/libvirt/abuild/jenkins-nut-altroots/jenkins-ubuntu2110-amd64+ssh/workspace/nut_nut_fightwarn'
RUN_ARTIFACTS_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect?page=artifacts'
RUN_CHANGES_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect?page=changes'
RUN_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect'
RUN_TESTS_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect?page=tests'
SHELL='/bin/bash'
SHLVL='0'
SSH_CLIENT='10.0.3.1 38436 22'
SSH_CONNECTION='10.0.3.1 38436 10.0.3.122 22'
STAGE_NAME='Prep'
TZ='UTC'
USER='abuild'
WORKSPACE='/home/abuild/jenkins-nut-altroots/jenkins-ubuntu2110-amd64+ssh/workspace/nut_nut_fightwarn'
WORKSPACE_TMP='/home/abuild/jenkins-nut-altroots/jenkins-ubuntu2110-amd64+ssh/workspace/nut_nut_fightwarn@tmp'
XDG_RUNTIME_DIR='/run/user/399'
XDG_SESSION_CLASS='user'
XDG_SESSION_ID='193406'
XDG_SESSION_TYPE='tty'
_='/bin/sh'
Actual original envvars for build scenario described as:
    Building with CLANG-13 STD=gnu11 STD=gnu++11 on x86_64 64-bit linux-ubuntu-impish platform for MATRIX_TAG="gnu11-gnu++11-clang-13-ubuntu-impish-x86_64-64bit" && (ARCH_BITS=64&&ARCH64=x86_64&&COMPILER=CLANG&&CLANGVER=13&&OS_DISTRO=ubuntu-impish&&OS_FAMILY=linux) && (nut-builder) && BITS=64&&CSTDVARIANT=gnu&&CSTDVERSION_c=11&&CSTDVERSION_cxx=11  &&  LANG=C && LC_ALL=C && TZ=UTC && CFLAGS=-Wall && CXXFLAGS=-Wall :: as part of slowBuild filter: Default autotools driven build with max warnings and varied C/C++ revisions (allowed to fail)

ARCH64='x86_64'
ARCH_BITS='64'
BITS='64'
BRANCH_NAME='fightwarn'
BUILD_DISPLAY_NAME='#75'
BUILD_ID='75'
BUILD_NUMBER='75'
BUILD_TAG='jenkins-nut-nut-fightwarn-75'
BUILD_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/'
CFLAGS='-Wall'
CI='true'
CI_SLOW_BUILD_FILTERNAME='Default autotools driven build with max warnings and varied C/C++ revisions (allowed to fail)'
CI_WRAP_SH='ssh -o SendEnv='"'"'*'"'"' "jenkins-ubuntu2110-amd64" /bin/sh -xe '
CLANGVER='13'
COMPILER='CLANG'
CSTDVARIANT='gnu'
CSTDVERSION_c='11'
CSTDVERSION_cxx='11'
CXXFLAGS='-Wall'
DBUS_SESSION_BUS_ADDRESS='unix:path=/run/user/399/bus'
EXECUTOR_NUMBER='1'
GIT_AUTHOR_DATE='2023-09-19 23:11:05 +00:00'
GIT_COMMITTER_DATE='2023-09-19 23:11:05 +00:00'
HOME='/home/abuild'
HUDSON_HOME='/var/lib/jenkins/home'
HUDSON_URL='https://ci.networkupstools.org/'
IFS=' 	
JENKINS_HOME='/var/lib/jenkins/home'
JENKINS_URL='https://ci.networkupstools.org/'
JOB_BASE_NAME='fightwarn'
JOB_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/display/redirect'
JOB_NAME='nut/nut/fightwarn'
JOB_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/'
LANG='C'
LC_ALL='C'
LOGNAME='abuild'
MATRIX_TAG='gnu11-gnu++11-clang-13-ubuntu-impish-x86_64-64bit'
MOTD_SHOWN='pam'
NODE_LABELS='ARCH64=x86_64 ARCH_BITS=64 CLANGVER=13 COMPILER=CLANG COMPILER=GCC DYNAMATRIX_REFREPO_WORKSPACE_LOCKNAME=gitcache-dynamatrix:ci-debian DYNAMATRIX_UNSTASH_PREFERENCE=scm-ws:nut-ci-src GCCVER=11 MAKE=make NUT_BUILD_CAPS=cppcheck NUT_BUILD_CAPS=cppunit NUT_BUILD_CAPS=drivers:DMF=yes NUT_BUILD_CAPS=drivers:all NUT_BUILD_CAPS=nutconf=yes OS_DISTRO=ubuntu-impish OS_FAMILY=linux PYTHON=python2.7 PYTHON=python3.9 SHELL_PROGS=bash SHELL_PROGS=busybox SHELL_PROGS=csh SHELL_PROGS=dash SHELL_PROGS=ksh93 SHELL_PROGS=sh SHELL_PROGS=tcsh SHELL_PROGS=zsh ci-debian-altroot--jenkins-ubuntu2110-amd64+ssh nut-builder nut-builder:DMF nut-builder:alldrv'
NODE_NAME='ci-debian-altroot--jenkins-ubuntu2110-amd64+ssh'
OLDPWD='/home/abuild'
OPTIND='1'
OS_DISTRO='ubuntu-impish'
OS_FAMILY='linux'
PARMAKE_LA_LIMIT='8'
PATH='/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'
PPID='822718'
PS1='$ '
PS2='> '
PS4='+ '
PWD='/srv/libvirt/abuild/jenkins-nut-altroots/jenkins-ubuntu2110-amd64+ssh/workspace/nut_nut_fightwarn'
RUN_ARTIFACTS_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect?page=artifacts'
RUN_CHANGES_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect?page=changes'
RUN_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect'
RUN_TESTS_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect?page=tests'
SHELL='/bin/bash'
SHLVL='0'
SSH_CLIENT='10.0.3.1 38434 22'
SSH_CONNECTION='10.0.3.1 38434 10.0.3.122 22'
STAGE_NAME='Prep'
TZ='UTC'
USER='abuild'
WORKSPACE='/home/abuild/jenkins-nut-altroots/jenkins-ubuntu2110-amd64+ssh/workspace/nut_nut_fightwarn'
WORKSPACE_TMP='/home/abuild/jenkins-nut-altroots/jenkins-ubuntu2110-amd64+ssh/workspace/nut_nut_fightwarn@tmp'
XDG_RUNTIME_DIR='/run/user/399'
XDG_SESSION_CLASS='user'
XDG_SESSION_ID='193404'
XDG_SESSION_TYPE='tty'
_='/bin/sh'

Notably no MAKE=... is provided here, so a fallback to make should have happened. Maybe this is linked with the recent effort to untangle parallel closure creations (using def and clones everywhere, to avoid the same values being changed from different logic branches, as was seen earlier with mix-ups of stage name Groovy strings vs. contents of envvars passed to them). Possibly some move from GStrings to Strings that are resolved immediately was not completed?..

So far nothing apparently toxic was found in NUT Jenkinsfile-dynamatrix (nor ci_build.sh) changes between these builds.

  • 73:
Revision: 37befb64cf2c1050ee52e953b31f66131b3cdf50
Repository: https://github.com/networkupstools/jenkins-dynamatrix.git

Revision: 91396d05b72e0b97bf8a1f6a71212d3161c6340d
Repository: https://github.com/networkupstools/nut.git
  • 75:
Revision: 0d3add30edf6403f92188a750bb8827a3aaa237d
Repository: https://github.com/networkupstools/jenkins-dynamatrix.git

Revision: cb5e92cccdb30c10d11546a9a4bb92ca28831b9f
Repository: https://github.com/networkupstools/nut.git

Numbers were roughly equal:

  • 74: Not all went well: countStagesStarted:350 countStagesCompleted:350 countStagesFinishedOK:250 countStagesFinishedFailure:100
  • 75: Not all went well: countStagesStarted:350 countStagesCompleted:350 countStagesFinishedOK:251 countStagesFinishedFailure:99
@jimklimov
Member Author

Testing a theory that either we did pass MAKE envvar from pipelines to build scripts (specifically ci_build.sh) before and do not do so now, or now we pass an empty value when not specified by a build scenario/matrix case, and ultimately the build logic is confused...
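
The two cases in this theory (MAKE not passed at all vs. passed as an empty string) can be told apart from inside a build script. A minimal sketch in POSIX shell follows; the helper name describe_make is hypothetical and is not part of ci_build.sh or the library.

```shell
#!/bin/sh
# Hypothetical helper (not from ci_build.sh): report how MAKE reached us.
describe_make() {
    if [ -z "${MAKE+set}" ]; then
        # ${MAKE+set} expands to nothing only when MAKE is not set at all
        echo "unset"
    elif [ -z "${MAKE}" ]; then
        # MAKE is set (e.g. exported by the caller), but to an empty string
        echo "empty"
    else
        echo "value:${MAKE}"
    fi
}
```

The distinction matters for a fallback: `${MAKE-make}` only covers the unset case, while `${MAKE:-make}` also replaces an empty value.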

@jimklimov
Member Author

At least, the message is directly related to the library:

$ git grep 'First running a quiet parallel build'

vars/autotools.groovy:            dynacfgPipeline.buildPhases['buildQuiet'] = """( echo "First running a quiet parallel build..." >&2; eval time \${MAKE} \${MAKE_OPTS} VERBOSE=0 V=0 -s -k -j 4 all >/dev/null && echo "SUCCESS" && exit 0; echo "First attempt failed (\$?), retrying to log what did:"; eval time \${MAKE} \${MAKE_OPTS} -k all )"""

vars/autotools.groovy:            dynacfgPipeline.buildPhases['buildQuietCautious'] = """( echo "First running a quiet parallel build..." >&2; eval time \${MAKE} \${MAKE_OPTS} VERBOSE=0 V=0 -s -k -j 4 all >/dev/null && echo "Seemingly a SUCCESS" ; echo "First attempt finished (\$?), retrying to log what fails (if any):"; eval time \${MAKE} \${MAKE_OPTS} -k all )"""

No hits in NUT for the log message, and got shell envvar expansions of MAKE here...
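
To see why an empty MAKE would produce the errors logged above, the sketch below composes the same word list as the buildQuiet phase but prints its first word instead of running it. The helper names are hypothetical, and the `${MAKE:-make}` guard is only one possible defence, not the fix applied in the library. With an empty MAKE the first word becomes VERBOSE=0: an external time(1) would try to run that as a program ("cannot run VERBOSE=0"), while a shell where time is a keyword would presumably treat VERBOSE=0 and V=0 as assignments and then try to run -s ("-s: command not found").

```shell
#!/bin/sh
# Print the first word of the command line that the quiet build phase would
# hand to "time" for a given MAKE value (hypothetical helper).
first_word_unguarded() {
    MAKE="$1"
    MAKE_OPTS=""
    # Deliberately unquoted, as in the original phase: empty values vanish
    set -- ${MAKE} ${MAKE_OPTS} VERBOSE=0 V=0 -s -k -j 4 all
    printf '%s\n' "$1"
}

# Same word list, but falling back to "make" when MAKE is unset or empty.
first_word_guarded() {
    MAKE="$1"
    MAKE_OPTS=""
    set -- ${MAKE:-make} ${MAKE_OPTS} VERBOSE=0 V=0 -s -k -j 4 all
    printf '%s\n' "$1"
}

first_word_unguarded ""     # prints: VERBOSE=0
first_word_guarded ""       # prints: make
first_word_guarded "gmake"  # prints: gmake
```

The retry command has the same weakness: with an empty MAKE it collapses to `time -k all`, which matches the "invalid option -- 'k'" message in both logs.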

jimklimov added a commit to jimklimov/jenkins-dynamatrix that referenced this issue Sep 20, 2023
….MAKE defined even if there are *other* tools [networkupstools#24]

Signed-off-by: Jim Klimov <[email protected]>
@jimklimov
Member Author

One idea that belongs here (testing now) is that originally we initialized a default dynacfgPipeline.defaultTools.MAKE only if defaultTools itself was missing. Maybe it is pre-populated better now, and we should create the map only if it is missing, and separately add the MAKE entry if that is missing in the map.

@jimklimov
Member Author

jimklimov commented Sep 20, 2023

Still at it :( Perhaps the map is not always consulted (or dedicated instance/clone passed?) when expanding the buildPhases at run-time?.. Or is deleted at a later time from the pipeline preparation logic?..

jimklimov added a commit to jimklimov/nut that referenced this issue Sep 21, 2023
@jimklimov
Member Author

After some attempts to rectify what were, in essence, the symptoms (e.g. fitting a MAKE definition into the dynacfg* maps more correctly), I found that other variables were no longer handled well either (e.g. CC and CXX, which are prepared from CLANGVER, GCCVER etc. by a configureEnvvars scriptlet that apparently no longer got called, so builds usually went with the default gcc). I think I came upon the root cause: the summer's refactoring of the library, which among other things added protections against overwriting the original maps that are input into the sanityCheck*() and some other methods.

Groovy allows manipulating the original map contents directly (input variable names are references to those maps), so to isolate what happens in the method while keeping the original intact, a clone is made early on, and that clone is returned from the method for the caller to assign wherever they want. Nothing can go wrong, and the caller's data objects are safe, right?..

In practice, with the closures using a delegation mechanism (to resolve variables from the caller context), we end up setting this.script into the delegation around the generateBuild() method, and it is probably the higher-priority carrier of the dynacfgPipeline name (a map prepared by a Jenkinsfile-dynamatrix and later adjusted by dynamatrixPipeline.groovy). Closures defined in that dynacfgPipeline which manage matrix cell builds refer to further data from dynamatrixPipeline.somefields, and apparently end up looking into the original map in the script, which remains barely initialized after the sanityCheck*() methods decouple it from the map object that is actually being manipulated.

In other words, success of the Groovy script so far relied on all data ending up in the Jenkinsfile's singleton of the map. Reverting the clone() operations with 496500d seems to have fixed the issue; at least compiler names are getting resolved again as of build https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/106/

The "correct" solution would be however to ensure that the build matrix cells get their separate copies of the dynacfg* into each of their contexts, to avoid surprises like independent manipulation of same information there by different code-paths that assume personal sandboxes.

With current implementation and new knowledge, this seems complicated by a few points:

  • many ultimate methods do accept a dynacfg* map argument; they seem to falter in the further practical handling...
  • the actual map object name e.g. dynacfgPipeline as defined in Jenkinsfile and/or some JSL sources seems to be their internal detail; overlap of the names is a "coincidence" as far as the language and classes are concerned, as well as the string name to pass along.
  • references to the map from inside the closures include the name (to resolve via delegation), which sometimes differs from the actual name of the map passed into the matrix methods (e.g. NUT dynacfgPipeline.slowBuildDefaultBody_ci_build sets up a dynacfgPipeline_ciBuild clone passed into the matrix cell, freshly made from a dynacfgPipeline, with its (fixed name) dynacfgPipeline?.configureEnvvars taken into account in the prepared codebase).
  • probably some more parameter passing is needed, either to bolt certain names in the nearest resolution context (e.g. dynacfgPipeline as a named closure parameter), or to pass prepared delegations around...
  • the buildMatrixCellCI.groovy method does accept a dynacfgPipeline map as a named closure parameter... maybe it gets a "wrong" one from its caller ultimately in closure wrapping layers leading up to the Jenkinsfile, or does not pass its own copy into delegations used further in its own callouts.

@jimklimov jimklimov added bug Something isn't working enhancement New feature or request labels Oct 8, 2023
@jimklimov
Member Author

In Dynamatrix.generateBuild() early on we prepare (hydrate etc.) the body closure delegation context maps. Probably a dynacfgPipeline could be defined there; ideally (parameter? closure? documented magic word? several aliases?) also named so that it resolves back from inside its prepared closure field values. It may also help to add a named copy to the DSBC class for reference (e.g. for GitHub notifications that benefit from a stash id), more so for the possibility of matrices made from several sources, where there is no single dynacfgOrig to help out.
