pool telemetry: available workers #183
There are two pool implementations in Finch: one for HTTP1, which uses NimblePool, and another for HTTP2, implemented by Finch itself. Regarding your question, IMO it could be implemented in nimble_pool and exposed by Finch for HTTP1 pools. AFAIK there is no way to access this internal data from nimble_pool, but I need to check; if there is, it is only a matter of exposing it in the Finch API. For HTTP2 the pool is usually simpler and I think this feature may not be needed. Can you tell us more about the use case that requires reading the pool status, @aymanosman? What are you trying to accomplish? I feel like you shouldn't need access to these internals and maybe we can solve this differently! 😄 |
What I am saying definitely only applies to the HTTP1 pool (as that is the only one we are using). Because Finch implements the NimblePool behaviour, it has access to all of the "nimble pool state". Playing around with the nimble pool that powers Finch, I can try and guess a way of calculating the number of "available workers".

```elixir
{:ok, _} = Finch.start_link(name: HTTP, pools: %{"https://httpbin.org" => [size: 5]})
{pool, _} = Finch.PoolManager.lookup_pool(HTTP, {:https, "httpbin.org", 443})
req = Finch.build(:get, "https://httpbin.org")

state0 = :sys.get_state(pool)

for _ <- 0..7 do spawn(fn -> Finch.request(req, HTTP) end) end
Process.sleep(1) # give the spawned processes a chance
state1 = :sys.get_state(pool)

# wait a while
state2 = :sys.get_state(pool)

stats = fn state ->
  %{available_workers: state.lazy + :queue.len(state.resources), queued_requests: :queue.len(state.queue)}
end

stats.(state0) # %{available_workers: 5, queued_requests: 0}
stats.(state1) # %{available_workers: 0, queued_requests: 3}
stats.(state2) # %{available_workers: 5, queued_requests: 0}
```

My guess is that |
I just realised I didn't answer your question. We use this information to help us decide whether to scale up certain services. |
This workaround using `:sys.get_state/1` should work for now, although it relies on pool internals. Personally I've never needed this kind of pool introspection. Probably the best solution would be to add some function to nimble_pool and use that in combination with Finch's `Finch.PoolManager.lookup_pool/2`. Anyway, I think this issue may be better suited to nimble_pool instead of Finch, so we can get input from their maintainers. What do you think? |
Yea, the use of `:sys.get_state/1` was only meant as a quick experiment. I do think that this is probably best dealt with in nimble_pool, as that is where this state actually lives. |
Great! I can work on this next week after I fix my computer! 😅 If you want to open an issue on nimble_pool that would be good, so we can get input from their maintainers about the feature. After it's done on nimble_pool we might expose the info through some Finch API. Does that make sense to you @aymanosman? It would be good to also gather some input from @sneako! What are your thoughts on this, Nico? |
Thanks for starting this discussion! I tend to rely more on system level metrics like load and latency for making decisions around scaling, but I do think that exposing these pool stats could help users configure their pool sizes and counts, so I am interested in exploring this further.

I don't see any reason to expose this data for HTTP1 but not HTTP2, despite the fact that the available stats may vary slightly. For HTTP1 pools we have a configurable size which does not apply to HTTP2, while HTTP2 pools (connections, really) have a state (:connected, :disconnected, :connected_read_only, etc).

I could imagine this being implemented via the existing Telemetry events together with an ETS table, for example, but it is probably better to go directly to the source of truth as has been suggested. So we would need to update both NimblePool and Finch.HTTP2.Pool in similar fashion to expose this information.

Let's start thinking about API proposals for this, particularly what the data we return would look like. Poolboy just returns a 4-element tuple, but I think a struct could be nicer, what do you think?

```elixir
iex> Finch.pool_status(url, MyFinch)
{:ok, struct_or_list_of_structs}
```
|
I agree with you! Since we moved from the narrower case of available workers (which does not make much sense in HTTP2 AFAIK) to a broader case regarding not only available workers but the pool status as a whole in order to improve pool configuration tuning, I think it makes sense to implement it for both HTTP1 and HTTP2.
That (the Telemetry events plus ETS table approach) could be another workaround that may work for you @aymanosman while the final version is not ready. 😄
I think we should have different structs for HTTP1 and HTTP2 pools because, as you mentioned, the stats may vary between the two implementations. It is still not 100% clear to me what kind of information we must expose, so maybe I could use some help understanding the use case behind this. |
Our application essentially proxies many origins, so we use multiple pools; it currently uses MachineGun/Poolboy. Hope this gives you a bit more context, thanks for helping! |
So we're in a similar boat, we really would like to know the state of the pool. We're focused entirely on HTTP1, no HTTP2 for us currently.
We've actually tried to do this and we're struggling a bit to get accurate numbers. I suspect this may boil down to using the wrong telemetry events to try and increment or decrement the number of "in use" connections in the connection pool. @sneako, if you were using existing telemetry events to try and track the free/used state of a pool, which events would you use? I see our current implementation treats some of the existing events as checkout/checkin, but I'm not confident they are the right ones.

The main question we want to be able to answer is "right now, what percentage of our pool is free vs occupied?" We're trying to tune pool sizes because we're reducing the # of running instances of our proxy, but fewer instances means more pressure on each instance's connection pool. Without a good sense of how our pool usage looks right now it's hard to know how much we should increase pool sizes by as we reduce running instances of our proxy Elixir app.

If this belongs in an Elixir forum post let me know :) But tl;dr: we would really benefit from a way to report the state of the pools to our New Relic instrumentation! We'd love to have dashboards, etc. reflecting current pool state. |
Love the idea of a struct! Easy to grow over time if we decide more things should be in there. |
Another thought: as a stop-gap, telemetry events for checkin and checkout would make it easier to maintain a running total of connections in use (the ETS table idea). I could try to mock that up if it makes sense. Obviously a snapshot of NimblePool's state would be the absolute ideal, but it looks like we may need changes in both NimblePool and Finch to support it? Sorry for three comments in a row 😂 I promise it's my last! |
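For illustration, a rough sketch of that stop-gap counter could look like the following; the [:finch, :checkout] / [:finch, :checkin] event names and the :shp metadata key are assumptions for the sake of the example, not events Finch emits today:

```elixir
# Hypothetical stop-gap: keep a running "in use" count in ETS by attaching to
# assumed checkout/checkin telemetry events (names are illustrative only).
defmodule PoolCounter do
  @table :finch_pool_counter

  def setup do
    :ets.new(@table, [:named_table, :public, write_concurrency: true])

    :telemetry.attach_many(
      "finch-pool-counter",
      [[:finch, :checkout], [:finch, :checkin]],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Increment on checkout, decrement on checkin, keyed by the {scheme, host, port} tuple.
  def handle_event([:finch, :checkout], _measurements, %{shp: shp}, _config),
    do: :ets.update_counter(@table, shp, {2, 1}, {shp, 0})

  def handle_event([:finch, :checkin], _measurements, %{shp: shp}, _config),
    do: :ets.update_counter(@table, shp, {2, -1}, {shp, 0})

  # "Right now, how many connections for this origin are checked out?"
  def in_use(shp) do
    case :ets.lookup(@table, shp) do
      [{^shp, count}] -> count
      [] -> 0
    end
  end
end
```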
This would also be really useful for us. Let me know if there is anything I can do to help. |
Sorry for the big delay! Gonna take a look at this problem this weekend. I think no change on the nimble_pool side will be needed, because I think we can generate all the metrics using existing callbacks. I propose a struct like this:

```elixir
%Finch.HTTP1.Pool.Metrics{
  available_resources: 7,     # how many connections are available in the pool
  in_use_resources: 3,        # how many connections have been checked out and not checked back in yet
  average_checkout_time: 200, # the average of how long it took to check out a connection
  max_checkout_time: 600,     # the longest it took to check out a connection
  average_usage_time: 50,     # the average of how long it took to return a connection to the pool after a checkout
  max_usage_time: 500         # the longest it took to return a connection to the pool after a checkout
}
```

All time metrics will be measured in microseconds. What do you think about these metrics? @aselder @mattbaker @ettomatic @aymanosman @sneako |
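A minimal sketch of the module behind that struct, with the field names taken from the proposal above (the defaults and typespec are assumptions, not Finch's actual code):

```elixir
defmodule Finch.HTTP1.Pool.Metrics do
  @moduledoc "Sketch of the proposed per-pool metrics struct. All times are in microseconds."

  @type t :: %__MODULE__{
          available_resources: non_neg_integer(),
          in_use_resources: non_neg_integer(),
          average_checkout_time: non_neg_integer(),
          max_checkout_time: non_neg_integer(),
          average_usage_time: non_neg_integer(),
          max_usage_time: non_neg_integer()
        }

  defstruct available_resources: 0,
            in_use_resources: 0,
            average_checkout_time: 0,
            max_checkout_time: 0,
            average_usage_time: 0,
            max_usage_time: 0
end
```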
Thank you for looking into this! I have mostly been relying on the queue duration metrics to estimate how busy/loaded a pool is, but something like this would be extremely useful. Where do you propose this state would be stored? |
Regarding performance, in order to avoid ETS updates I was thinking of using atomic counters to increment the values and calculating the measurements only when pool_status is called. That said, the references to such atomics might be stored in :persistent_term on pool startup, like this:

```elixir
available_resources_ref = :atomics.new(1, [])
:atomics.put(available_resources_ref, 1, queue_size)
:persistent_term.put({Finch, finch_name, :available_resources}, available_resources_ref)
```

and then in the callbacks we can do something like:

```elixir
# on checkout
available_resources_ref = :persistent_term.get({Finch, finch_name, :available_resources})
:atomics.sub(available_resources_ref, 1, 1)

# on checkin
available_resources_ref = :persistent_term.get({Finch, finch_name, :available_resources})
:atomics.add(available_resources_ref, 1, 1)
```

and in the new pool_status function:

```elixir
available_resources_ref = :persistent_term.get({Finch, finch_name, :available_resources})
# other refs here

%{
  available_resources: :atomics.get(available_resources_ref, 1),
  in_use_resources: :atomics.get(in_use_resources_ref, 1),
  average_checkout: :atomics.get(total_checkout_time_ref, 1) / :atomics.get(total_checkouts_qty_ref, 1)
}
```

That's the idea, but I'll see how it fits into the code. Also, I'll probably use only a single atomics array with predefined indexes for each metric, so there is just one reference to store and fetch. What do you think about this? |
Atomics should be good, but don't we want metrics to be per pool rather than per Finch instance? |
You are right! I think we should use the shp tuple instead of finch_name. That solves it, right? |
Good idea! Finch name plus shp should be good, in case two different finch instances end up with pools for the same shp |
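Putting those pieces together, a sketch of a counter module keyed by Finch name plus shp could look like this (module, function, and key names are illustrative only, not Finch's actual implementation):

```elixir
defmodule PoolMetricsSketch do
  # One atomics array per pool, with predefined indexes for each metric.
  @available 1
  @in_use 2
  @total_checkout_time 3
  @total_checkouts 4
  @size 4

  # Called on pool startup; the key combines the Finch name and the {scheme, host, port} tuple.
  def init(finch_name, shp, pool_size) do
    ref = :atomics.new(@size, signed: true)
    :atomics.put(ref, @available, pool_size)
    :persistent_term.put({__MODULE__, finch_name, shp}, ref)
    ref
  end

  # Called from the checkout callback, with the measured checkout time in microseconds.
  def checkout(ref, checkout_time_us) do
    :atomics.sub(ref, @available, 1)
    :atomics.add(ref, @in_use, 1)
    :atomics.add(ref, @total_checkout_time, checkout_time_us)
    :atomics.add(ref, @total_checkouts, 1)
  end

  # Called from the checkin callback.
  def checkin(ref) do
    :atomics.add(ref, @available, 1)
    :atomics.sub(ref, @in_use, 1)
  end

  # Derived measurements are only computed when the status is requested.
  def status(finch_name, shp) do
    ref = :persistent_term.get({__MODULE__, finch_name, shp})
    checkouts = max(:atomics.get(ref, @total_checkouts), 1)

    %{
      available_resources: :atomics.get(ref, @available),
      in_use_resources: :atomics.get(ref, @in_use),
      average_checkout_time: div(:atomics.get(ref, @total_checkout_time), checkouts)
    }
  end
end
```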
Great! If you don't have any initial problems with the proposed implementation and interface, I'm gonna tackle it this weekend and hope to have a PR soon. (: |
First draft already available @sneako! #248 Please let me know if there is any major problem you want addressed before I move on to more extensive documentation and the HTTP2 implementation. In the meantime I'm gonna work on the pool termination callback on the nimble_pool side; we'll probably need it here. |
Correct me if I'm wrong, but it seems there is no way of querying Finch for a pool's current state, for example the number of available workers. :poolboy exposes this information via :poolboy.status/1. I know the pooling implementation differs significantly between the two libraries, so do you think it is possible to get the same sort of information from Finch?
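For reference, :poolboy.status/1 returns a 4-element tuple along these lines (the example values are made up, and the exact meaning of each element should be confirmed against poolboy's docs):

```elixir
# Roughly: {state_name, available_workers, overflow, checked_out_workers},
# where `pool` is the poolboy pid or registered name.
{state_name, available, overflow, checked_out} = :poolboy.status(pool)
# e.g. {:ready, 5, 0, 0} for an idle pool of size 5
```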