NimblePool.checkout :idle_timeout When setting pool_max_idle_time #291

Open
mathieurousseau opened this issue Sep 26, 2024 · 1 comment · May be fixed by #292

Comments

@mathieurousseau

Hello

We added a pool_max_idle_time to pool configuration.
We then started to have the following issue:

Elixir.ErlangError Erlang error: "** (Oban.CrashError) ** (exit) exited in: NimblePool.checkout(#PID<0.26263.0>)\n    ** (EXIT) shutdown: :idle_timeout" 
    lib/finch/http1/pool.ex:96 Finch.HTTP1.Pool.request/6
    lib/finch.ex:493 anonymous fn/4 in Finch.request/3
    /app/deps/telemetry/src/telemetry.erl:324 :telemetry.span/3
    lib/req/finch.ex:239 Req.Finch.run_finch_request/3
    lib/req/finch.ex:71 Req.Finch.run/4
    lib/req/request.ex:1103 Req.Request.run_request/1
    lib/req/request.ex:1047 Req.Request.run/1

connection max idle time is 10_000
pool max idle time is 20_000
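
For reference, a setup like the one described might look roughly like the sketch below. The `MyApp.Finch` name and the use of the `:default` pool key are assumptions for illustration, not taken from the report; `conn_max_idle_time` and `pool_max_idle_time` are the Finch pool options the two values above appear to refer to.

```elixir
# Sketch of a supervision tree entry; MyApp.Finch is a hypothetical name.
children = [
  {Finch,
   name: MyApp.Finch,
   pools: %{
     default: [
       # Close an individual HTTP/1 connection after 10s of inactivity.
       conn_max_idle_time: 10_000,
       # Terminate the whole pool after 20s of inactivity.
       # Leaving this unset keeps the default, :infinity.
       pool_max_idle_time: 20_000
     ]
   }}
]

Supervisor.start_link(children, strategy: :one_for_one)
```

Requests made through Req would then need to point at this instance, for example with `Req.get!(url, finch: MyApp.Finch)`.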

If we leave it unset (:infinity), we have no issue.

Thanks.
Mathieu

@oliveigah
Contributor

Hey @mathieurousseau! I suspect this may be an issue with how idle termination is done in Finch.

It stops the pool when the first connection becomes idle, not when all of them are idle. I'm not sure these are the semantics we need here, and it's not clear how to achieve the proper semantics with the current implementation of nimble_pool's idle ping feature. Going to take a look at it.

Is this bug consistently reproducible?

What I think may be happening is something like this:

  • your pool has 2 connections that have been idle for more than 20 seconds, so the pool should be terminated in the next verification cycle
  • before the pool is terminated, an Oban worker picks up the first connection to make a request
  • the idle verification now runs and terminates the pool, because the only remaining connection has been idle for more than 20 seconds
  • the Oban worker crashes because the Finch pool was terminated with the reason idle timeout
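
If that sequence is what is happening, the caller sees an exit rather than an error tuple. Until the underlying behaviour is sorted out, a caller-side retry could look something like the sketch below. `IdleTimeoutRetry` is a hypothetical helper, not part of Finch, Req, or Oban, and the exit shape it matches is inferred from the error message in the report (`shutdown: :idle_timeout` raised from `NimblePool.checkout`), so treat it as an assumption.

```elixir
defmodule IdleTimeoutRetry do
  # Hypothetical helper: runs `fun` and retries if the pool was shut down
  # with :idle_timeout while the request was being checked out.
  def with_retry(fun, retries \\ 1) when is_function(fun, 0) do
    fun.()
  catch
    # Assumed exit shape: {reason, {NimblePool, :checkout, [pool_pid]}}
    :exit, {{:shutdown, :idle_timeout}, _call} when retries > 0 ->
      with_retry(fun, retries - 1)
  end
end
```

It would be used as `IdleTimeoutRetry.with_retry(fn -> Req.get!(url) end)`. Any other exit, or an idle timeout after the retries are used up, still propagates to the caller as before, so this only papers over the race and does not fix it.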
