byte_conversion, bytscl and formats tests fail on Apple Silicon CI #1876

Closed
slayoo opened this issue Aug 17, 2024 · 13 comments

Comments

@slayoo
Member

slayoo commented Aug 17, 2024

@gnudatalanguage/gdldevs, anyone having access to an Apple Silicon machine, help welcome! Thanks

Example output (https://github.com/gnudatalanguage/gdl/actions/runs/10431443505/job/28891119074)

66/214 Test  #66: test_byte_conversion.pro ...........***Failed    0.65 sec
...
% TEST_BYTE_BASIC_32768: Error on operation : input as Long for :       -32765
% TEST_BYTE_BASIC_32768: Error on operation : input as Long for :       -32766
% TEST_BYTE_BASIC_32768: Error on operation : input as Long for :       -32767
% TEST_BYTE_BASIC_32768: Error on operation : input as Long for :       -32769
% TEST_BYTE_BASIC_32768: Error on operation : input as Long for :       -32770
...
 67/214 Test  #67: test_bytscl.pro ....................***Failed    0.69 sec
...
% TEST_BYTSCL_TOP: Error on operation : Pb with top=     -128, min/max :        0,        0
...
102/214 Test #102: test_formats.pro ...................***Failed    1.13 sec
...
% When using the RAN1 mode, be sure to keep the RAN1 and dSFMT seed arrays in separate variables.
multiple reference file <<formats.GDL>> found ! First used !!
/Users/runner/work/gdl/build/testsuite/formats.GDL
/Users/runner/work/gdl/gdl/testsuite/formats.GDL
Files to be compared : formats.IDL, formats.GDL
...
% TEST_FORMATS: =  1595 errors encountered during TEST_FORMATS tests  =
...
The following tests FAILED:
	 66 - test_byte_conversion.pro (Failed)
	 67 - test_bytscl.pro (Failed)
	102 - test_formats.pro (Failed)
@alaingdl
Contributor

OK, but there is too much trouble on OSX right now; I need some quiet time to investigate :(
(short answer: when compiling without the script, only test_formats is broken
--but I experienced glitches with the tests, I don't know why--
when compiling with the script, maybe due to wxWidgets, everything is broken)

@GillesDuvert
Contributor

see #1881

@slayoo
Member Author

slayoo commented Nov 5, 2024

as of now, we have the following failures on CI:

  • macos-14 (i.e., Apple Silicon): test_hdf5 & test_formats
  • macos-12 (i.e., Intel): test_hdf5

The test_hdf5 failure is:

112/214 Test #112: test_hdf5.pro ......................***Failed    0.47 sec
% Compiled module: TEST_HDF5.
% Compiled module: FILE_SEARCH_FOR_TESTSUITE.
% Compiled module: BANNER_FOR_TESTSUITE.
% Compiled module: GDL_IDL_FL.
% TEST_HDF5_BYTE_ARRAY: 
  NO errors encountered during TEST_HDF5_BYTE_ARRAY tests  
% TEST_HDF5_STRING:   NO errors encountered during TEST_HDF5_STRING tests  
h5dump error: unable to open attribute "attr-02"
% TEST_HDF5_ATTR:   3 errors encountered during TEST_HDF5_ATTR tests  
% TEST_HDF5_DATA:   NO errors encountered during TEST_HDF5_DATA tests  
% H5G_CREATE: Bad value
% TEST_HDF5_OBJ_INFO:   NO errors encountered during TEST_HDF5_OBJ_INFO tests  
% TEST_HDF5_COMP:   NO errors encountered during TEST_HDF5_COMP tests  
% H5G_CREATE: Bad value
% TEST_HDF5: =======================================================
% TEST_HDF5: =                                                     =
% TEST_HDF5: =  3.00000 errors encountered during TEST_HDF5 tests  =
% TEST_HDF5: =                                                     =
% TEST_HDF5: =======================================================

@pjb7687
Member

pjb7687 commented Nov 6, 2024

I also noticed this. We need to fix it and release v1.1.1 ASAP.

@slayoo
Member Author

slayoo commented Nov 6, 2024

Duplicate of #1883; could it also be related to #1899?

@GillesDuvert
Contributor

@slayoo #1904 shows NO ERRORS on macos-12 (when did you see an error ?)

So on osx 14 = ARM64 = Apple Silicon = M1/M2/M3 the culprit could be that the HDF5 library has the same problem as GDL, i.e., #1881

I do not currently have an M1 machine at hand to handle #1881, but fixing it would achieve nothing if the HDF5 library is broken there as well...

@slayoo
Member Author

slayoo commented Nov 6, 2024

> @slayoo #1904 shows NO ERRORS on macos-12 (when did you see an error ?)

In yesterday's macos-12 build for the master branch, test_hdf5 failed:
https://github.com/gnudatalanguage/gdl/actions/runs/11681438801/job/32526479047

% TEST_HDF5_BYTE_ARRAY: 
  NO errors encountered during TEST_HDF5_BYTE_ARRAY tests  
% TEST_HDF5_STRING:   NO errors encountered during TEST_HDF5_STRING tests  
h5dump error: unable to open attribute "attr-02"
% TEST_HDF5_ATTR:   3 errors encountered during TEST_HDF5_ATTR tests  
% TEST_HDF5_DATA:   NO errors encountered during TEST_HDF5_DATA tests  
% H5G_CREATE: Bad value

@jkohnert
Contributor

I get the test_hdf5.pro error too, on a local build on Arch. It turns out to be an error in wait() within spawn. I made a small change in basic_pro.cpp:

diff --git a/src/basic_pro.cpp b/src/basic_pro.cpp
index 87c0e005..33a5c28f 100644
--- a/src/basic_pro.cpp
+++ b/src/basic_pro.cpp
@@ -2142,7 +2142,9 @@ static DWORD launch_cmd(BOOL hide, BOOL nowait,
 
         // wait until child terminates
         int status;
-        pid_t wpid = wait(&status);
+        if (wait(&status) == -1) {
+          Warning(DString("SPAWN: Error waiting for child process: ") + strerror(errno));
+        }
 
         if (exit_statusKeyword)
           e->SetKW(exit_statusIx, new DLongGDL(status >> 8));

to track down what's happening. At least in my local build, wait() returns -1 and sets errno to ECHILD; this is what the small change reveals.

Since test_hdf5.pro uses spawn, and status does not get set to the return value of the spawned command due to the error in wait(), the test fails on my machine.
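
To illustrate the point about `status`: when wait() fails, `status` is never written, so computing `status >> 8` yields a meaningless exit code. A minimal sketch, assuming a hypothetical helper `exit_status_or` (not part of GDL) that only decodes `status` when wait() succeeded, using the standard `WIFEXITED`/`WEXITSTATUS` macros instead of a raw shift. The `128 + signal` value for signalled children is a shell convention chosen here as an assumption, not something the thread specifies.

```cpp
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical helper (not GDL code): turn the outcome of wait()/waitpid()
// into an exit code, or return `fallback` when no valid status is available.
int exit_status_or(pid_t wait_result, int status, int fallback) {
    if (wait_result == -1) {
        return fallback;                 // wait() failed: status was never written
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);      // normal termination: the child's exit code
    }
    if (WIFSIGNALED(status)) {
        return 128 + WTERMSIG(status);   // assumption: shell-style code for signals
    }
    return fallback;                     // stopped/continued: no exit code yet
}
```

With this shape, a failed wait() surfaces as an explicit fallback value rather than as whatever happened to be in the uninitialized `status`.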

However, I have not yet figured out why I get ECHILD in the first place.

Maybe someone else has a clue; I'll try to dig into this a bit more, too.

Best Jan

BTW: in #1913 the build on macOS 14 fails with this error, too: https://github.com/gnudatalanguage/gdl/actions/runs/11942416687/job/33289421103?pr=1913

@alaingdl
Contributor

As I mentioned in #1910, we do have a problem with wait() in SPAWN.

I am not sure that a small patch is the best way forward, without trying to manage wait() and SPAWN properly...
But my C is not good enough to understand https://man7.org/linux/man-pages/man2/waitpid.2.html

I also checked that this is the only occurrence of wait().

@jkohnert
Contributor

I tried waitpid(pid, &status, 0) while debugging, but it doesn't change anything... There must be something else broken. 😭

@jkohnert
Contributor

@alaingdl Got the reason: the culprit is setting SIGCHLD to SIG_IGN (gdl.cpp, line 519). This was introduced in #1870 (at least according to the commit message). With SIGCHLD ignored, terminated children are reaped automatically, so a wait() call blocks until all children have finished and then returns -1 with errno set to ECHILD (as documented in https://man7.org/linux/man-pages/man2/wait.2.html). Commenting out this line restores the correct behaviour of wait(), at least on my local machine.
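
The SIG_IGN behaviour described above can be reproduced outside GDL. A minimal standalone sketch, assuming a POSIX system; `WaitResult` and `fork_and_wait` are hypothetical names made up for this illustration and are not GDL code. It forks a child that exits immediately and reports what wait() returns under each SIGCHLD disposition.

```cpp
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Outcome of forking a child that exits with a given code and then calling wait().
struct WaitResult {
    bool wait_failed;   // true if wait() returned -1
    int  saved_errno;   // errno captured right after wait()
    int  exit_code;     // child's exit code, or -1 if it could not be collected
};

// Hypothetical helper: fork, let the child exit with `code`, then wait() for it,
// with SIGCHLD either ignored (SIG_IGN) or left at its default (SIG_DFL).
WaitResult fork_and_wait(int code, bool ignore_sigchld) {
    std::signal(SIGCHLD, ignore_sigchld ? SIG_IGN : SIG_DFL);

    pid_t pid = fork();
    if (pid < 0) {
        return {true, errno, -1};        // fork itself failed
    }
    if (pid == 0) {
        _exit(code);                     // child: terminate immediately
    }

    int status = 0;
    errno = 0;
    pid_t w = wait(&status);             // parent: try to collect the child
    WaitResult r{w == -1, errno, -1};
    if (w != -1 && WIFEXITED(status)) {
        r.exit_code = WEXITSTATUS(status);
    }

    std::signal(SIGCHLD, SIG_DFL);       // leave the default disposition behind
    return r;
}
```

On a system that follows the documented behaviour, `fork_and_wait(7, false)` collects exit code 7, while `fork_and_wait(7, true)` reports a failed wait() with errno equal to ECHILD, which matches what the patched SPAWN warning revealed.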

We might have another problem for the other failing tests on Apple, though.

Best, Jan

@GillesDuvert
Contributor

@jkohnert bravo for the find! When I made #1870 I was not aware SIG_IGN would influence SPAWN (silly, in retrospect, IDL_IDLBridge being a more refined method of gdl respawning itself). Not clear if it will be safe to locally change SIGCHLD handling during the SPAWN command, though. TBC.
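
For reference, locally changing SIGCHLD handling around a spawn could look roughly like the sketch below. This is an illustration only, not the fix adopted in GDL: `ScopedDefaultSigchld` and `run_shell` are hypothetical names, the command is run through `/bin/sh -c` as an assumption, and EINTR handling is omitted. Whether this is safe is exactly the open question: while the guard is active, any other child that terminates (for example one managed elsewhere in the process) is no longer reaped automatically.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical RAII guard: force SIGCHLD to SIG_DFL while the object lives,
// then restore whatever disposition was installed before.
class ScopedDefaultSigchld {
public:
    ScopedDefaultSigchld() {
        struct sigaction dfl {};
        dfl.sa_handler = SIG_DFL;
        sigemptyset(&dfl.sa_mask);
        sigaction(SIGCHLD, &dfl, &saved_);   // save the previous disposition
    }
    ~ScopedDefaultSigchld() { sigaction(SIGCHLD, &saved_, nullptr); }
    ScopedDefaultSigchld(const ScopedDefaultSigchld&) = delete;
    ScopedDefaultSigchld& operator=(const ScopedDefaultSigchld&) = delete;

private:
    struct sigaction saved_ {};
};

// Hypothetical helper: run `cmd` via /bin/sh and return its exit code,
// or -1 if the child could not be started or waited for.
int run_shell(const char* cmd) {
    ScopedDefaultSigchld guard;              // wait on children works inside this scope

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, static_cast<char*>(nullptr));
        _exit(127);                          // only reached if exec failed
    }

    int status = 0;
    if (waitpid(pid, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_shell("exit 5")` while SIGCHLD is set to SIG_IGN should return 5 and leave SIG_IGN installed afterwards, since the guard restores the saved disposition on scope exit.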

@GillesDuvert
Contributor

Closing, as we have 2 different issues here: one solved (spawn), the other not easily solvable (arm64 different math). Opening another issue.
