Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test failure in Mix.Tasks.TestTest (test --listen-on-stdin does not exit on compilation failure) #13660

Closed
awilfox opened this issue Jun 13, 2024 · 9 comments

Comments

@awilfox
Copy link

awilfox commented Jun 13, 2024

Elixir and Erlang/OTP versions

Erlang/OTP 24 [erts-12.3.1] [source] [smp:8:8] [ds:8:8:10] [async-threads:1]

Elixir 1.16.3 (compiled with Erlang/OTP 24)

Also seen on:

Erlang/OTP 24 [erts-12.3.1] [source] [64-bit] [smp:64:64] [ds:64:64:10] [async-threads:1]

Elixir 1.15.7 (compiled with Erlang/OTP 24)

Operating system

Adélie Linux 1.0-beta5

Current behavior

We at Adélie Linux are rebuilding our entire package tree against GCC 13. While building Elixir, we ran into this test failure:

==> mix (ex_unit)
Excluding tags: [windows: true]

...............................................................................................................................................................................................................................................................................

  1) test --listen-on-stdin does not exit on compilation failure (Mix.Tasks.TestTest)
     /usr/src/packages/user/elixir/src/elixir-1.16.3/lib/mix/test/mix/tasks/test_test.exs:277
     ** (ExUnit.TimeoutError) test timed out after 60000ms. You can change the timeout:

       1. per test by setting "@tag timeout: x" (accepts :infinity)
       2. per test module by setting "@moduletag timeout: x" (accepts :infinity)
       3. globally via "ExUnit.start(timeout: x)" configuration
       4. by running "mix test --timeout x" which sets timeout
       5. or by running "mix test --trace" which sets timeout to infinity
          (useful when using IEx.pry/0)

     where "x" is the timeout given as integer in milliseconds (defaults to 60_000).

     stacktrace:
       test/mix/tasks/test_test.exs:607: Mix.Tasks.TestTest.receive_until_match/3
       test/mix/tasks/test_test.exs:312: anonymous fn/0 in Mix.Tasks.TestTest."test --listen-on-stdin does not exit on compilation failure"/1
       (elixir 1.16.3) lib/file.ex:1650: File.cd!/2
       test/test_helper.exs:146: MixTest.Case.in_fixture/3
       test/mix/tasks/test_test.exs:278: (test)
       (ex_unit 1.16.3) lib/ex_unit/runner.ex:472: ExUnit.Runner.exec_test/2
       (stdlib 3.17.1) timer.erl:166: :timer.tc/1
       (ex_unit 1.16.3) lib/ex_unit/runner.ex:394: anonymous fn/6 in ExUnit.Runner.spawn_test_monitor/4

I never saw this reported for GCC 8 builds. It happens both with 1.15.7 and 1.16.3, and I can reproduce it every time building for 32-bit x86 on an Intel Skylake using an i586 chroot. We also saw it transiently (but not every time) on an Ampere arm64 builder. Downstream tracker is adelie/packages#1182.

Expected behavior

Tests to pass, as they did when Erlang and Elixir were compiled against GCC 8.5.0.

@awilfox
Copy link
Author

awilfox commented Jun 13, 2024

Further details, if relevant:

Hardware information (CPU, memory)
# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel(R) Corporation
  Model name:            Intel(R) Core(TM) i7-6700T CPU @ 2.80GHz
    BIOS Model name:     Intel(R) Core(TM) i7-6700T CPU @ 2.80GHz To Be Filled By O.E.M. CPU @ 2.7GHz
    BIOS CPU family:     198
    CPU family:          6
    Model:               94
    Thread(s) per core:  2
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            3
    CPU(s) scaling MHz:  29%
    CPU max MHz:         3600.0000
    CPU min MHz:         800.0000
    BogoMIPS:            5599.85
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse
                         2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopolog
                         y nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xt
                         pr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3
                         dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpi
                         d ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsa
                         veopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_
                         l1d
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   128 KiB (4 instances)
  L1i:                   128 KiB (4 instances)
  L2:                    1 MiB (4 instances)
  L3:                    8 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-7
Vulnerabilities:         
  Gather data sampling:  Vulnerable: No microcode
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Mitigation; Clear CPU buffers; SMT vulnerable
  Retbleed:              Mitigation; IBRS
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Mitigation; Microcode
  Tsx async abort:       Mitigation; TSX disabled
# free -m
               total        used        free      shared  buff/cache   available
Mem:           31980         276        8249          42       23454       31207
Swap:              0           0           0
Operating environment information (kernel, libc)
# uname -a
Linux lab-x86_64-lin-2.tpcl.atwilcox.tech 5.15.132-mc6-easy #1 SMP Sun Nov 12 04:16:02 UTC 2023 i686 GNU/Linux
# /lib/ld-musl-i386.so.1
musl libc (i386)
Version 1.2.3
Dynamic Program Loader
Usage: /lib/ld-musl-i386.so.1 [options] [--] pathname [args]
Toolchain information (GCC, binutils, Perl)
# gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i586-foxkit-linux-musl/13.3.0/lto-wrapper
Target: i586-foxkit-linux-musl
Configured with: /usr/src/packages/system/gcc/src/gcc-13.3.0/configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --build=i586-foxkit-linux-musl --host=i586-foxkit-linux-musl --target=i586-foxkit-linux-musl --with-pkgversion='Adelie 13.3.0' --with-bugurl=https://bts.adelielinux.org/ --enable-checking=release --disable-fixed-point --disable-libstdcxx-pch --disable-multiarch --disable-multilib --disable-werror --disable-symvers --enable-__cxa_atexit --enable-default-pie --enable-default-ssp --enable-cloog-backend --enable-languages=c,c++,objc,go,fortran,ada --enable-linker-build-id --with-arch=i586 --with-tune=pentium2 --enable-cld --enable-mmx --disable-libsanitizer --enable-shared --enable-threads --enable-tls --with-matchpd-partitions=32 --with-system-zlib --with-linker-hash-style=gnu
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.3.0 (Adelie 13.3.0)
# ld --version
GNU ld (GNU Binutils) 2.41
Copyright (C) 2023 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
# perl -V
Summary of my perl5 (revision 5 version 34 subversion 3) configuration:
   
  Platform:
    osname=linux
    osvers=5.15.132-mc6-easy
    archname=i686-linux-thread-multi
    uname='linux lab-x86_64-lin-2.tpcl.atwilcox.tech 5.15.132-mc6-easy #1 smp sun nov 12 04:16:02 utc 2023 i686 gnulinux '
    config_args='-des -Dcccdlflags=-fPIC -Dcccdlflags=-fPIC -Dccdlflags=-rdynamic -Dprefix=/usr -Dprivlib=/usr/share/perl5/core_perl -Darchlib=/usr/lib/perl5/core_perl -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5/vendor_perl -Dvendorarch=/usr/lib/perl5/vendor_perl -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5/site_perl -Dsitearch=/usr/local/lib/perl5/site_perl -Dlocincpth=  -Doptimize=-O2 -fno-omit-frame-pointer -ggdb -march=pentium-mmx -mtune=pentium-m -g -Duselargefiles -Dusethreads -Duseshrplib -Dd_semctl_semun -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dinstallman1dir=/usr/share/man/man1 -Dinstallman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3pm -Dcf_by=Adelie -Ud_csh -Dusenm'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=undef
    use64bitall=undef
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='cc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
    optimize='-O2 -fno-omit-frame-pointer -ggdb -march=pentium-mmx -mtune=pentium-m -g'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong'
    ccversion=''
    gccversion='13.3.0'
    gccosandvers=''
    intsize=4
    longsize=4
    ptrsize=4
    doublesize=8
    byteorder=1234
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=12
    longdblkind=3
    ivtype='long'
    ivsize=4
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=4
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib /lib
    libs=-lpthread -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
    libc=/usr/lib/libc.a
    so=so
    useshrplib=true
    libperl=libperl.so
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-rdynamic -Wl,-rpath,/usr/lib/perl5/core_perl/CORE'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -fno-omit-frame-pointer -ggdb -march=pentium-mmx -mtune=pentium-m -g -L/usr/local/lib -fstack-protector-strong'


Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_TIMES
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_IMPLICIT_CONTEXT
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
    USE_THREAD_SAFE_LOCALE
  Built under linux
  Compiled at Jun 10 2024 02:19:00
  @INC:
    /usr/local/lib/perl5/site_perl
    /usr/local/share/perl5/site_perl
    /usr/lib/perl5/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib/perl5/core_perl
    /usr/share/perl5/core_perl

@josevalim
Copy link
Member

Hi @awilfox, we don’t ship with any native code ij Elixir. So if the GCC version matters, then it is most likely an Erlang bug. But the test case is only checking I/O channels, so it may even be something happening at the OS level. In other words, my bet is that we are surfacing a problem downstream somewhere with I/O channels.

@josevalim
Copy link
Member

For now, I pushed a commit to main that will help us understand more the timeout cases. So we know if something has been written to the pipe or not at all.

@awilfox
Copy link
Author

awilfox commented Jun 13, 2024

Thanks for the quick work. Applied tip of main to 1.16.3 and received:

==> mix (ex_unit)
Excluding tags: [windows: true]

....................................................................................................................................................................................................................................................................

  1) test --listen-on-stdin does not exit on compilation failure (Mix.Tasks.TestTest)
     test/mix/tasks/test_test.exs:277
     ** (RuntimeError) nothing received from port after 15s.
     Expected: "undefined variable \"error_not_a_var\""
     Got: "Restarting...\n"

     code: in_fixture("test_stale", fn ->
     stacktrace:
       test/mix/tasks/test_test.exs:619: Mix.Tasks.TestTest.receive_until_match/3
       test/mix/tasks/test_test.exs:312: anonymous fn/0 in Mix.Tasks.TestTest."test --listen-on-stdin does not exit on compilation failure"/1
       (elixir 1.16.3) lib/file.ex:1650: File.cd!/2
       test/test_helper.exs:146: MixTest.Case.in_fixture/3
       test/mix/tasks/test_test.exs:278: (test)

I'm not very familiar with Erlang, unfortunately; is there a guide somewhere on how I should proceed debugging this issue?

@josevalim
Copy link
Member

This is interesting! The receives the I/O signal but never finishes restarting. I would try reproducing the issue first outside of tests. Do this:

  1. Run mix new foo
  2. cd foo
  3. mix test --listen-on-stdin (this will block the terminal)
  4. Now open up test/foo_test.exs and change line 6 to be simply unknown_var
  5. Go back to the blocked terminal and press enter (newline)

Does it work?

@awilfox
Copy link
Author

awilfox commented Jun 13, 2024

Yes:

builder on lab-x86_64-lin-2 ~/packages/user/elixir/src/elixir-1.16.3/test/foo $ PATH="$(pwd)/../../bin:$PATH" mix test --listen-on-stdin
Compiling 1 file (.ex)
Generated foo app
..
Finished in 0.02 seconds (0.00s async, 0.02s sync)
1 doctest, 1 test, 0 failures

Randomized with seed 725321

[ file changed, then Return pressed ]

Restarting...
    warning: variable unknown_var in code block has no effect as it is never returned (remove the variable or assign it to _ to avoid warnings)
    │
  6 │     unknown_var
    │     ~
    │
    └─ test/foo_test.exs:6:5: FooTest."test greets the world"/1

    error: undefined variable "unknown_var"
    │
  6 │     unknown_var
    │     ^^^^^^^^^^^
    │
    └─ test/foo_test.exs:6:5: FooTest."test greets the world"/1


== Compilation error in file test/foo_test.exs ==
** (CompileError) test/foo_test.exs: cannot compile module FooTest (errors have been logged)

@josevalim
Copy link
Member

Thank you. Unfortunately I do not know the root cause, although everything indicates to be an issue elsewhere. Note Erlang/OTP 24 is no longer officially supported, so I'd recommend migrating to a more recent version and see if the issue persists.

Btw, this pull request was recently merged to Erlang/OTP: erlang/otp#8553 - in case you want to try latest GCC too.

@josevalim
Copy link
Member

Hi @awilfox, do you have updates here on more recent Erlang/OTP on the latest GCC? Thank you!

@awilfox
Copy link
Author

awilfox commented Jul 10, 2024

Upgrading to Erlang/OTP 27 and then Elixir 1.17.2 passes tests on all our environments now. Thank you!

@awilfox awilfox closed this as completed Jul 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants