Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: On i686-linux-gnu, the Downloads test set is failing frequently (but not always) with nested task error: RequestError: getaddrinfo() thread failed to start while requesting https://httpbingo.julialang.org/delay/2?id=100 #46159

Closed
DilumAluthge opened this issue Jul 25, 2022 · 12 comments
Labels
ci Continuous integration

Comments

@DilumAluthge
Copy link
Member

The error seems to be: nested task error: RequestError: getaddrinfo() thread failed to start while requesting https://httpbingo.julialang.org/delay/2?id=100

Relevant log snippet:

Error in testset Downloads:
Error During Test at /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/runtests.jl:355
  Got exception outside of a @test
  TaskFailedException
  Stacktrace:
    [1] macro expansion
      @ ./experimental.jl:102 [inlined]
    [2] macro expansion
      @ ./timing.jl:394 [inlined]
    [3] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/runtests.jl:366 [inlined]
    [4] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Test/src/Test.jl:1388 [inlined]
    [5] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/runtests.jl:356 [inlined]
    [6] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Test/src/Test.jl:1388 [inlined]
    [7] top-level scope
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/runtests.jl:4
    [8] include
      @ ./Base.jl:424 [inlined]
    [9] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/test/testdefs.jl:24 [inlined]
   [10] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Test/src/Test.jl:1388 [inlined]
   [11] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/test/testdefs.jl:23 [inlined]
   [12] macro expansion
      @ ./timing.jl:475 [inlined]
   [13] runtests(name::String, path::String, isolate::Bool; seed::UInt128)
      @ Main /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/test/testdefs.jl:21
   [14] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, UInt128, Tuple{Symbol}, NamedTuple{(:seed,), Tuple{UInt128}}})
      @ Base ./essentials.jl:810
   [15] (::Distributed.var"#110#112"{Distributed.CallMsg{:call_fetch}})()
      @ Distributed /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:285
   [16] run_work_thunk(thunk::Distributed.var"#110#112"{Distributed.CallMsg{:call_fetch}}, print_error::Bool)
      @ Distributed /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:70
   [17] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:285 [inlined]
   [18] (::Distributed.var"#109#111"{Distributed.CallMsg{:call_fetch}, Distributed.MsgHeader, Sockets.TCPSocket})()
      @ Distributed ./task.jl:501
      nested task error: RequestError: getaddrinfo() thread failed to start while requesting https://httpbingo.julialang.org/delay/2?id=100
      Stacktrace:
        [1] (::Downloads.var"#9#18"{IOBuffer, Base.DevNull, Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Nothing, Bool, String, Bool, Bool})(easy::Downloads.Curl.Easy)
          @ Downloads /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/src/Downloads.jl:388
        [2] with_handle(f::Downloads.var"#9#18"{IOBuffer, Base.DevNull, Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Nothing, Bool, String, Bool, Bool}, handle::Downloads.Curl.Easy)
          @ Downloads.Curl /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/src/Curl/Curl.jl:88
        [3] #8
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/src/Downloads.jl:329 [inlined]
        [4] arg_write(f::Downloads.var"#8#17"{Base.DevNull, Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Nothing, Bool, String, Bool, Bool}, arg::IOBuffer)
          @ ArgTools /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/ArgTools/src/ArgTools.jl:134
        [5] #7
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/src/Downloads.jl:328 [inlined]
        [6] arg_read
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/ArgTools/src/ArgTools.jl:76 [inlined]
        [7] request(url::String; input::Nothing, output::IOBuffer, method::Nothing, headers::Vector{Pair{String, String}}, timeout::Float64, progress::Nothing, verbose::Bool, debug::Nothing, throw::Bool, downloader::Downloads.Downloader)
          @ Downloads /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/src/Downloads.jl:327
        [8] #3
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/src/Downloads.jl:233 [inlined]
        [9] arg_write(f::Downloads.var"#3#4"{Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Nothing, Downloads.Downloader, String}, arg::IOBuffer)
          @ ArgTools /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/ArgTools/src/ArgTools.jl:134
       [10] #download#2
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/src/Downloads.jl:232 [inlined]
       [11] (::Main.Test42Main_Downloads.var"#2#3"{Base.Pairs{Symbol, Downloads.Downloader, Tuple{Symbol}, NamedTuple{(:downloader,), Tuple{Downloads.Downloader}}}, String})(output::IOBuffer)
          @ Main.Test42Main_Downloads /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/setup.jl:13
       [12] sprint
          @ ./strings/io.jl:107 [inlined]
       [13] #download_body#1
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/setup.jl:12 [inlined]
       [14] #download_json#4
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/setup.jl:18 [inlined]
       [15] macro expansion
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/runtests.jl:368 [inlined]
       [16] (::Main.Test42Main_Downloads.var"#24#50"{Downloads.Downloader, String, Int32})()
          @ Main.Test42Main_Downloads ./task.jl:501

Example log: https://buildkite.com/julialang/julia-master/builds/14198#01822720-ae76-4ff4-8a35-cec678dfa6a7

@vtjnash
Copy link
Member

vtjnash commented Jul 25, 2022

JuliaCI/julia-buildkite#185 may fix this, so would be good to check

@giordano
Copy link
Contributor

This Win64 job failed with

      nested task error: RequestError: Failed to connect to httpbingo.julialang.org port 443 after 21058 ms: Timed out while requesting https://httpbingo.julialang.org/delay/2?id=40

which looks similar enough?

@DilumAluthge
Copy link
Member Author

DilumAluthge commented Jul 27, 2022

Sadly, I am still seeing the original failure (nested task error: RequestError: getaddrinfo() thread failed to start while requesting https://httpbingo.julialang.org/delay/2?id=98) on i686-linux-gnu even after I merged JuliaCI/julia-buildkite#185.

Example log: https://buildkite.com/julialang/julia-master/builds/14331#01823cdc-7767-4c4b-bcc8-976db68868de

In the above log, it says that JULIA_NUM_THREADS was set to 1, and that the value of nthreads() was 1, so it does look like threads were disabled in that job.

@DilumAluthge
Copy link
Member Author

DilumAluthge commented Jul 27, 2022

This Win64 job failed with

      nested task error: RequestError: Failed to connect to httpbingo.julialang.org port 443 after 21058 ms: Timed out while requesting https://httpbingo.julialang.org/delay/2?id=40

which looks similar enough?

Hmmm. I don't think that's the same failure.

@DilumAluthge
Copy link
Member Author

DilumAluthge commented Jul 27, 2022

I did see the exact same failure as the original post (nested task error: RequestError: getaddrinfo() thread failed to start while requesting https://httpbingo.julialang.org/delay/2?id=1) just now on Buildbot on i686-w64-mingw32.

Log: https://build.julialang.org/#/builders/42/builds/7253

@DilumAluthge
Copy link
Member Author

I should point out that even though this failure didn't go away once I merged JuliaCI/julia-buildkite#185, it certainly did become less frequent.

@DilumAluthge
Copy link
Member Author

DilumAluthge commented Aug 5, 2022

I continue to see this on i686-linux-gnu.

@vtjnash @StefanKarpinski Could this be a libuv issue? We are using libuv's getaddrinfo, right?

@vtjnash
Copy link
Member

vtjnash commented Sep 28, 2022

Looking closer, the test fails at 20-30 seconds, but the expected maximum time is 180 seconds: JuliaLang/Downloads.jl#204

Thus, we are simply running too many downloads concurrently, and mbedtls is too slow to keep up.

@StefanKarpinski
Copy link
Member

Interestingly, same error as I saw here: JuliaLang/Downloads.jl#196 (comment). Seems like there's some race condition in libcurl that we're hitting in some cases. In the link, putting a try/catch in a cfunction someone causes it to be triggered more easily.

@vtjnash
Copy link
Member

vtjnash commented Sep 28, 2022

Ah, sorry, I was responding to #46159 (comment), which is a different issue (that happens to be in roughly the same place, for related but not identical reasons)

@StefanKarpinski
Copy link
Member

This semaphore may help: JuliaLang/Downloads.jl#205.

@vtjnash
Copy link
Member

vtjnash commented Oct 3, 2022

You can test this on your own machine (both issues here should turn out to have the same fix) by changing the 100 concurrent attempts to 1500 (or even higher) in this test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Continuous integration
Projects
None yet
Development

No branches or pull requests

4 participants