add max connection lifetime option #28

Merged
merged 20 commits into master from conn-lifetime on Apr 3, 2023

Conversation

@robx (Contributor) commented Feb 3, 2023

We're seeing trouble with PostgREST where long-lived connections grow to use large amounts of memory on the PostgreSQL server side: PostgREST/postgrest#2638. (As far as I understand, PostgreSQL caches a variety of things per connection handler and never releases them.)

It seems to me that putting some arbitrary maximal lifetime on connections is the best way to address this, and is something that's likely to be relevant to other users of hasql-pool.

A somewhat outdated initial description follows; see below for the current state:

This PR is a quick stab at implementing that, but I'm more than happy to rewrite it or to go with some other approach. The current draft definitely has some issues, particularly:

  • The settings interface with two Maybe Int timeouts is a pain -- should this address #22 (Force the timeout option to be specified) and force both to be specified? Or maybe introduce some settings datatype with reasonable default values for both? (See the sketch below.)
  • This is a little bit different from the 0.5 timeout setting, which would be something like a maxIdleTime instead of maxLifetime. (For the issue at hand it seems a maxLifetime is more appropriate than a maxIdleTime, since if we keep the connections busy an idle timeout would never trigger.) Would it be better to add both settings while we're at it?
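
A minimal sketch of what such a settings datatype could look like, in the spirit of the first bullet (illustrative only -- the field names, types, and defaults here are assumptions, not this package's API):

```haskell
import Data.Time.Clock (NominalDiffTime)

-- Hypothetical settings record; callers override fields via record update.
data PoolSettings = PoolSettings
  { acquisitionTimeout :: NominalDiffTime -- how long 'use' may wait for a free slot
  , maxLifetime :: NominalDiffTime -- retire any connection older than this
  }

-- Plausible defaults, so neither timeout has to be a bare Maybe Int.
defaultPoolSettings :: PoolSettings
defaultPoolSettings =
  PoolSettings
    { acquisitionTimeout = 10 -- seconds
    , maxLifetime = 1800 -- thirty minutes
    }

-- Usage: defaultPoolSettings { maxLifetime = 600 }
```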

@nikita-volkov (Owner)

Haven't looked at the code yet. I already love the idea in itself. You have my full support in having this feature implemented and merged.

> Would it be better to add both settings while we're at it?

I do also like the idea of adding the maxIdleTime configuration, but I believe it's best to do things iteratively and in isolation. Let's add the maxLifetime first, then, if energy is left, focus on the next one. I've isolated it into its own issue (#29).

> force both to be specified?

Great! I still like that idea. Seems like a responsible design move.

> introduce some settings datatype with reasonable default values for both?

I support that as well.

@nikita-volkov (Owner)

I took a look at the code. I have a few comments.

  1. If I understand correctly, a connection whose lifetime is past due gets reestablished. I think it would be better to first try to fetch one of the available connections, and only if there is none, establish a new one.
  2. The invalidation is implemented via a check before an attempt to use. The problem with that is that resources are held up until an attempt to use the connection is made, which is unpredictable and may happen too late or never. This looks like it needs addressing by running a separate thread responsible for the cleaning (see the sketch below).

What do you think?
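
For concreteness, the separate cleaning thread suggested in point 2 could look roughly like this. This is a sketch against an imagined pool representation; none of these names are hasql-pool's actual internals:

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.STM (TVar, atomically, readTVar, writeTVar)
import Control.Monad (forever)
import Data.List (partition)
import Data.Time.Clock (NominalDiffTime, UTCTime, diffUTCTime, getCurrentTime)

-- Periodically sweep the idle slots, closing connections past their lifetime.
-- 'slots' holds idle connections tagged with their creation time.
manage :: NominalDiffTime -> TVar [(UTCTime, conn)] -> (conn -> IO ()) -> IO ()
manage maxLifetime slots close = forever $ do
  threadDelay 1000000 -- wake up once a second (a fixed interval, for simplicity)
  now <- getCurrentTime
  expired <- atomically $ do
    conns <- readTVar slots
    let (dead, live) = partition (\(born, _) -> diffUTCTime now born > maxLifetime) conns
    writeTVar slots live
    pure dead
  mapM_ (close . snd) expired -- close outside the STM transaction
```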

@robx (Contributor, Author) commented Feb 6, 2023

> I took a look at the code. I have a few comments.
>
> 1. If I understand correctly, a connection whose lifetime is past due gets reestablished. I think it would be better to first try to fetch one of the available connections, and only if there is none, establish a new one.
>
> 2. The invalidation is implemented via a check before an attempt to use. The problem with that is that resources are held up until an attempt to use the connection is made, which is unpredictable and may happen too late or never. This looks like it needs addressing by running a separate thread responsible for the cleaning.
>
> What do you think?

Yes, this was pretty much the minimal attempt, trying to avoid adding more complexity to the implementation.
Below is me trying to argue that maybe we could attempt to roll with this, but typing it up I kind of got to the point where I think it might be easier to just give an active management thread a shot.

some rambling on the quoted issues

Addressing only 1. in the straightforward way (loop until you find a "live" connection or the pool is empty) introduces some pathological cases: imagine the pool is full of idle connections and they all time out. The next request will then loop through and close all of those connections before opening a new one (see the sketch below). That potentially huge overhead seems worse to me than occasional connection-establishment overhead. (The current design of the library implies this occasional connection-establishment overhead anyway, since we're not pre-establishing connections.)
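
As an illustration of that pathological case, the straightforward fetch loop would look something like this sketch (not the PR's code):

```haskell
-- Pop connections until a live one turns up, closing each expired one on the
-- way. With a pool full of expired idle connections, a single unlucky caller
-- pays for closing all of them before finally opening a fresh connection.
fetchLive :: (conn -> IO Bool) -> (conn -> IO ()) -> [conn] -> IO (Maybe conn, [conn])
fetchLive isLive close = go
  where
    go [] = pure (Nothing, []) -- pool exhausted: the caller opens a new connection
    go (c : rest) = do
      ok <- isLive c
      if ok
        then pure (Just c, rest) -- found a usable connection
        else close c >> go rest -- pay the close cost and keep looping
```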

Regarding 2., I agree that some active component makes sense for a robust solution to timing out. However, it also seems to imply pretty invasive changes to the library, which I was a bit hesitant to suggest/implement. I was hoping the simple approach might be good enough (for a well-loaded pool it should address the base issue, since connections will be reused regularly), but it's true that it doesn't address all scenarios. E.g. the pool won't ever shrink after bursts.

Could we get away with a partial solution that keeps the library passive by documenting the behaviour?

@robx robx force-pushed the conn-lifetime branch 2 times, most recently from b768736 to 387061e on February 9, 2023 12:45
@robx (Contributor, Author) commented Feb 9, 2023

I've reworked this now to include active management. There are a couple of ad hoc decisions here regarding the interface that are not at all intended to be final. Summary of changes at this point:

  • collect configuration in a Config datatype, with an interface modeled on Warp settings; acquire and acquireDynamically are still there, though maybe the latter could be dropped in favour of acquireConf?
  • acquire* currently don't launch the management thread -- if we wanted to do that we'd need to expose something like destroy :: Pool -> IO () in addition to release, but I'm not sure it's a good idea; instead, there's withManagedPool (which probably should be named withPool)
  • tests were modified primarily to add a test verifying issue 2 raised above; there's no comprehensive test coverage of the changes beyond that so far

I think the approach should allow moving forward with this, and would open the door to adding other settings such as an idle timeout or pre-creating connections (#10).
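
For a sense of how the draft reads from the caller's side, usage would be roughly as follows. withManagedPool and the Warp-style Config are from this draft; the defaultConfig/setMaxLifetime names are invented here for illustration, since the exact interface was still in flux:

```haskell
{-# LANGUAGE OverloadedStrings #-}

-- Assumed draft-branch API: defaultConfig, setMaxLifetime, withManagedPool.
import Hasql.Pool (defaultConfig, setMaxLifetime, use, withManagedPool)
import qualified Hasql.Session as Session

main :: IO ()
main = do
  let config = setMaxLifetime 1800 defaultConfig -- hypothetical Warp-style setter
  withManagedPool config $ \pool -> do
    result <- use pool (Session.sql "select 1")
    print result
```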

@robx robx marked this pull request as ready for review February 16, 2023 13:16
@robx (Contributor, Author) commented Feb 16, 2023

I've polished this up a bit now, leaning into some of those ad hoc decisions I made. Clearly they're all up for debate, specifically the configuration interface and how we approach backward compatibility with respect to acquire vs withPool.

Also all the naming, the choice of defaults, allowing the timeouts to be disabled, etc.

@robx robx changed the title from "add max connection lifetime option (proof of concept)" to "add max connection lifetime option" on Feb 17, 2023
@nikita-volkov (Owner)

Thanks. Taking a look

@nikita-volkov (Owner) commented Feb 18, 2023

I don't think the transfer to the withPool construct is necessary.

I see the following strategies that could let us achieve the same features without overhauling the UX:

  1. Have a management thread per connection. This will allow us to stop those threads when the associated connections get released. (See the sketch below.)
  2. Have a single thread, stop it when the pool is empty and restart when there's at least one connection.
  3. Have a single management thread and have it stopped when all references to the pool are garbage collected. I think a similar approach is applied in the "resource-pool" library.

Possibly there are other options as well.
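
For concreteness, option 1 could be as simple as a per-connection timer thread (an illustrative sketch, not code from the PR; a real implementation would also have to guard against retiring a connection that is currently in use):

```haskell
import Control.Concurrent (ThreadId, forkIO, killThread, threadDelay)

-- A pool entry carrying its own watchdog thread (option 1 above).
data Entry conn = Entry
  { entryConn :: conn
  , entryWatchdog :: ThreadId
  }

-- Open a connection and fork a thread that retires it after 'lifetimeUs' microseconds.
openEntry :: Int -> IO conn -> (conn -> IO ()) -> IO (Entry conn)
openEntry lifetimeUs open close = do
  conn <- open
  tid <- forkIO $ do
    threadDelay lifetimeUs
    close conn
  pure (Entry conn tid)

-- Releasing the connection also stops its watchdog, which is the point:
-- no global manager thread whose own lifetime needs accounting.
releaseEntry :: (conn -> IO ()) -> Entry conn -> IO ()
releaseEntry close (Entry conn tid) = do
  killThread tid
  close conn
```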

I hate to see so much work expended on your part. I should have initiated the design discussion beforehand; sorry for that. To avoid repeating the same mistake, let's discuss the suggested design before working on the code. What do you think about these suggestions? Do you have other options in mind?

@robx (Contributor, Author) commented Feb 20, 2023

I think I'd tend most towards leaving it up to garbage collection, even if that's a bit implicit for my taste. Options 1 and 2 (2 to a lesser extent) seem to come with more bookkeeping, and seem a bit at odds with #10.

I don't think it's a big change, I'll try it out to see how it goes.

(What do you think about the Config approach to collecting the increasing number of settings?)

this is verbatim how resource-pool does it, I'm not *that* familiar with the subtleties of weak refs tbh

the need to distinguish between pool + pool with manager ref is a bit painful
```haskell
ref <- newIORef ()
manager <- forkIOWithUnmask $ \unmask -> unmask $ manage rawPool
void . mkWeakIORef ref $ do
  -- When the pool goes out of scope, stop the manager.
```
@robx (Contributor, Author)

I've verified this works via the following change. Seems a bit too awkward to test automatically, what do you think?

```diff
diff --git a/library/Hasql/Pool.hs b/library/Hasql/Pool.hs
index 1a3d0b9..17d17d7 100644
--- a/library/Hasql/Pool.hs
+++ b/library/Hasql/Pool.hs
@@ -141,9 +141,10 @@ acquireConf config = do
       <*> newTVarIO (confSize config)
       <*> (newTVarIO =<< newTVarIO True)
   ref <- newIORef ()
-  manager <- forkIOWithUnmask $ \unmask -> unmask $ manage rawPool
+  manager <- forkIOWithUnmask $ \unmask -> unmask $ (putStrLn "forking" >> manage rawPool)
   void . mkWeakIORef ref $ do
     -- When the pool goes out of scope, stop the manager.
+    putStrLn "killing manager"
     killThread manager
   return $ Pool rawPool ref
 
diff --git a/test/Main.hs b/test/Main.hs
index 9ddf141..f8e91cb 100644
--- a/test/Main.hs
+++ b/test/Main.hs
@@ -10,6 +10,7 @@ import Hasql.Pool
 import qualified Hasql.Session as Session
 import qualified Hasql.Statement as Statement
 import qualified System.Environment
+import System.Mem (performGC)
 import qualified System.Random as Random
 import qualified System.Random.Stateful as Random
 import Test.Hspec
@@ -120,6 +121,8 @@ main = do
               res3 <- use countPool $ countConnectionsSession appName
               res3 `shouldBe` Right 0
           )
+      performGC
+      threadDelay 5000000 -- 5s
 
 getConnectionSettings :: IO Connection.Settings
 getConnectionSettings =
```

@robx (Contributor, Author)

The whole weak ref thing is lifted more or less directly from resource-pool. I'm not 100% on the exception handling part here. I think if the manager action throws it'll spam stderr, but also it shouldn't throw because it's just closing connections, but... bit of room for things to go wrong here.
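
One way to close that gap would be to log synchronous exceptions inside the loop while still letting killThread's ThreadKilled through, so the weak-ref finalizer can stop the thread. A sketch, not what the PR does:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Concurrent (threadDelay)
import Control.Exception (AsyncException (ThreadKilled), SomeException, fromException, throwIO, try)
import Control.Monad (forever)
import System.IO (hPutStrLn, stderr)

-- Run one sweep per second; log failures instead of letting the loop die,
-- but rethrow ThreadKilled so the finalizer can still stop the manager.
managerLoop :: IO () -> IO ()
managerLoop sweep = forever $ do
  threadDelay 1000000
  outcome <- try sweep
  case outcome of
    Right () -> pure ()
    Left (e :: SomeException)
      | Just ThreadKilled <- fromException e -> throwIO e
      | otherwise -> hPutStrLn stderr ("pool manager: " <> show e)
```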

@nikita-volkov (Owner) commented Apr 2, 2023

I can see how this could be tested if we were to extract a general (not hasql-specialized) lib from here. But until we do I guess it's okay :)

@robx (Contributor, Author) commented Mar 21, 2023

@nikita-volkov could you have a look at this one again when you have a chance?

@nikita-volkov (Owner)

Hey Rob! Sorry for needing a reminder. I've been under a bit of load from life stuff. I'll get back to you in the coming days.

@nikita-volkov (Owner) left a comment

Hey Rob!

I've finally gotten to carefully review the PR. Sorry for the delay.

Looks good generally. I think we can move forward from here. Great test suite! I do have a few things to discuss, but I promise to be responsive until we merge.

  • Let's avoid exposing the management interval configuration. It seems like a negligible detail not worth the user's attention, and it is really only required by the current implementation. The management thread's wake-ups could instead be derived automatically from the timeouts, optimally, each time the thread goes to sleep (see the sketch below). Not exposing the interval lets us implement that optimization in the future (if needed) without changing the API.

  • The Config type seems like an attempt at a neater approach to configuring this package, but it is a choice of one particular approach, with its own downsides, among others. To avoid bloat and reduce maintenance I would rather keep the package minimal and stay neutral on the matter. It is a higher-level problem and can be solved in an extension library, if needed.

If you want, I can take over from here, layer the corresponding changes on top of yours, and merge.
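
For the first point, deriving the wake-ups could look like this sketch (illustrative names, not the merged code): each time the manager goes to sleep, it sleeps exactly until the earliest pending expiry rather than on a fixed, user-visible interval.

```haskell
import Control.Concurrent (threadDelay)
import Data.Time.Clock (NominalDiffTime, UTCTime, addUTCTime, diffUTCTime, getCurrentTime)

-- Sleep until the oldest connection's lifetime runs out.
sleepUntilNextExpiry :: NominalDiffTime -> [UTCTime] -> IO ()
sleepUntilNextExpiry maxLifetime birthTimes =
  case birthTimes of
    [] -> pure () -- nothing to reap; a real manager would block on a signal here
    _ -> do
      now <- getCurrentTime
      let nextExpiry = minimum (map (addUTCTime maxLifetime) birthTimes)
          micros = ceiling (realToFrac (diffUTCTime nextExpiry now) * 1e6 :: Double)
      threadDelay (max 0 micros)
```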


@robx (Contributor, Author) commented Apr 2, 2023

> Hey Rob!
>
> I've finally gotten to carefully review the PR. Sorry for the delay.
>
> Looks good generally. I think we can move forward from here. Great test suite! I do have a few things to discuss, but I promise to be responsive until we merge.
>
> * Let's avoid exposing the management interval configuration. It seems like a negligible detail not worth the user's attention, and it is really only required by the current implementation. The management thread's wake-ups could instead be derived automatically from the timeouts, optimally, each time the thread goes to sleep. Not exposing the interval lets us implement that optimization in the future (if needed) without changing the API.
>
> * The Config type seems like an attempt at a neater approach to configuring this package, but it is a choice of one particular approach, with its own downsides, among others. To avoid bloat and reduce maintenance I would rather keep the package minimal and stay neutral on the matter. It is a higher-level problem and can be solved in an extension library, if needed.
>
> If you want, I can take over from here, layer the corresponding changes on top of yours, and merge.

Hi Nikita, thanks for the review! I'd be happy with you taking it over; no particularly strong feelings on any of the remarks. The Config change was mostly because I felt quite awkward about further increasing the number of positional plain-integer parameters, but no objection if you're happy with that (or with any other approach :)).

@nikita-volkov nikita-volkov merged commit 0eda834 into nikita-volkov:master Apr 3, 2023
@nikita-volkov (Owner) commented Apr 3, 2023

I've applied the changes and merged. Thanks for your hard work! I'll let the dust settle for a couple of days before releasing. Please do raise issues if you disagree with any of the changes.

@nikita-volkov (Owner)
Released!

@robx robx deleted the conn-lifetime branch May 3, 2023 10:27