add max connection lifetime option #28

Merged
merged 20 commits into master from conn-lifetime on Apr 3, 2023

Conversation

@robx (Contributor) commented Feb 3, 2023

We're seeing trouble with PostgREST where long-lived connections grow to use large amounts of memory on the PostgreSQL server side: PostgREST/postgrest#2638. (As far as I understand, PostgreSQL caches a variety of things per connection handler and never releases them.)

It seems to me that putting some arbitrary maximal lifetime on connections is the best way to address this, and is something that's likely to be relevant to other users of hasql-pool.

A somewhat outdated initial description follows; see below for the current state:

This PR is a quick stab at implementing that, but I'm more than happy to rewrite it or to go with some other approach. The current draft definitely has some issues, particularly:

  • The settings interface with two Maybe Int timeouts is a pain -- should this address #22 (Force the timeout option to be specified) and force both to be specified? Or maybe introduce some settings datatype with reasonable default values for both? (See the sketch below.)
  • This is a little bit different from the 0.5 timeout setting, which would be something like a maxIdleTime instead of maxLifetime. (For the issue at hand it seems a maxLifetime is more appropriate than a maxIdleTime, since if we keep the connections busy an idle timeout would never trigger.) Would it be better to add both settings while we're at it?
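
A minimal sketch of what such a settings datatype could look like, in the spirit of the first bullet (illustrative only -- the field names, types, and defaults here are assumptions, not this package's API):

```haskell
import Data.Time.Clock (NominalDiffTime)

-- Hypothetical settings record; callers override fields via record update.
data PoolSettings = PoolSettings
  { acquisitionTimeout :: NominalDiffTime -- how long 'use' may wait for a free slot
  , maxLifetime :: NominalDiffTime -- retire any connection older than this
  }

-- Plausible defaults, so neither timeout has to be a bare Maybe Int.
defaultPoolSettings :: PoolSettings
defaultPoolSettings =
  PoolSettings
    { acquisitionTimeout = 10 -- seconds
    , maxLifetime = 1800 -- thirty minutes
    }

-- Usage: defaultPoolSettings { maxLifetime = 600 }
```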

@nikita-volkov (Owner)

Haven't looked at the code yet. I already love the idea in itself. You have my full support in having this feature implemented and merged.

> Would it be better to add both settings while we're at it?

I do also like the idea of adding the maxIdleTime configuration, but I believe it's best to do things iteratively and in isolation. Let's add the maxLifetime first, then, if energy is left, focus on the next one. I've isolated it into its own issue (#29).

> force both to be specified?

Great! I still like that idea. Seems like a responsible design move.

> introduce some settings datatype with reasonable default values for both?

I support that as well.

@nikita-volkov (Owner)

I took a look at the code. I have a few comments.

  1. If I understand correctly, a connection whose lifetime is past due gets reestablished. I think it would be better to first try to fetch one of the available connections, and only if there is none, establish a new one.
  2. The invalidation is implemented via a check before an attempt to use. The problem with that is that resources are held up until an attempt to use the connection is made, which is unpredictable and may happen too late or never. This looks like it needs addressing by running a separate thread responsible for the cleaning (see the sketch below).

What do you think?
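
For concreteness, the separate cleaning thread suggested in point 2 could look roughly like this. This is a sketch against an imagined pool representation; none of these names are hasql-pool's actual internals:

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.STM (TVar, atomically, readTVar, writeTVar)
import Control.Monad (forever)
import Data.List (partition)
import Data.Time.Clock (NominalDiffTime, UTCTime, diffUTCTime, getCurrentTime)

-- Periodically sweep the idle slots, closing connections past their lifetime.
-- 'slots' holds idle connections tagged with their creation time.
manage :: NominalDiffTime -> TVar [(UTCTime, conn)] -> (conn -> IO ()) -> IO ()
manage maxLifetime slots close = forever $ do
  threadDelay 1000000 -- wake up once a second (a fixed interval, for simplicity)
  now <- getCurrentTime
  expired <- atomically $ do
    conns <- readTVar slots
    let (dead, live) = partition (\(born, _) -> diffUTCTime now born > maxLifetime) conns
    writeTVar slots live
    pure dead
  mapM_ (close . snd) expired -- close outside the STM transaction
```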

@robx (Contributor, Author) commented Feb 6, 2023

> I took a look at the code. I have a few comments.
>
> 1. If I understand correctly, a connection whose lifetime is past due gets reestablished. I think it would be better to first try to fetch one of the available connections, and only if there is none, establish a new one.
>
> 2. The invalidation is implemented via a check before an attempt to use. The problem with that is that resources are held up until an attempt to use the connection is made, which is unpredictable and may happen too late or never. This looks like it needs addressing by running a separate thread responsible for the cleaning.
>
> What do you think?

Yes, this was pretty much the minimal attempt, trying to avoid adding more complexity to the implementation.
Below is me trying to argue that maybe we could attempt to roll with this, but typing it up I kind of got to the point where I think it might be easier to just give an active management thread a shot.

some rambling on the quoted issues

Addressing only 1. in the straightforward way (loop until you find a "live" connection or the pool is empty) introduces some pathological cases: imagine the pool is full of idle connections and they all time out. The next request will then loop through and close all of those connections before opening a new one (see the sketch below). That potentially huge overhead seems worse to me than occasional connection-establishment overhead. (The current design of the library implies this occasional connection-establishment overhead anyway, since we're not pre-establishing connections.)
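
As an illustration of that pathological case, the straightforward fetch loop would look something like this sketch (not the PR's code):

```haskell
-- Pop connections until a live one turns up, closing each expired one on the
-- way. With a pool full of expired idle connections, a single unlucky caller
-- pays for closing all of them before finally opening a fresh connection.
fetchLive :: (conn -> IO Bool) -> (conn -> IO ()) -> [conn] -> IO (Maybe conn, [conn])
fetchLive isLive close = go
  where
    go [] = pure (Nothing, []) -- pool exhausted: the caller opens a new connection
    go (c : rest) = do
      ok <- isLive c
      if ok
        then pure (Just c, rest) -- found a usable connection
        else close c >> go rest -- pay the close cost and keep looping
```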

Regarding 2., I agree that some active component makes sense for a robust solution to timing out. However, it also seems to imply pretty invasive changes to the library, which I was a bit hesitant to suggest/implement. I was hoping the simple approach might be good enough (for a well-loaded pool it should address the base issue, since connections will be reused regularly), but it's true that it doesn't address all scenarios. E.g. the pool won't ever shrink after bursts.

Could we get away with a partial solution that keeps the library passive by documenting the behaviour?

@robx robx force-pushed the conn-lifetime branch 2 times, most recently from b768736 to 387061e on February 9, 2023 12:45
@robx (Contributor, Author) commented Feb 9, 2023

I've reworked this now to include active management. There are a couple of ad hoc decisions here regarding the interface that are not at all intended to be final. Summary of changes at this point:

  • collect configuration in a Config datatype, with an interface modeled on Warp settings; acquire and acquireDynamically are still there, though maybe the latter could be dropped in favour of acquireConf?
  • acquire* currently don't launch the management thread -- if we wanted to do that we'd need to expose something like destroy :: Pool -> IO () in addition to release, but I'm not sure it's a good idea; instead, there's withManagedPool (which probably should be named withPool)
  • tests were modified primarily to add a test verifying issue 2 raised above; there's no comprehensive test coverage of the changes beyond that so far

I think the approach should allow moving forward with this, and would open the door to adding other settings such as an idle timeout or pre-creating connections (#10).
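
For a sense of how the draft reads from the caller's side, usage would be roughly as follows. withManagedPool and the Warp-style Config are from this draft; the defaultConfig/setMaxLifetime names are invented here for illustration, since the exact interface was still in flux:

```haskell
{-# LANGUAGE OverloadedStrings #-}

-- Assumed draft-branch API: defaultConfig, setMaxLifetime, withManagedPool.
import Hasql.Pool (defaultConfig, setMaxLifetime, use, withManagedPool)
import qualified Hasql.Session as Session

main :: IO ()
main = do
  let config = setMaxLifetime 1800 defaultConfig -- hypothetical Warp-style setter
  withManagedPool config $ \pool -> do
    result <- use pool (Session.sql "select 1")
    print result
```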

@robx robx marked this pull request as ready for review February 16, 2023 13:16
@robx (Contributor, Author) commented Feb 16, 2023

I've polished this up a bit now, leaning into some of those ad hoc decisions I made. Clearly they're all up for debate, specifically the configuration interface and how we approach backward compatibility with respect to acquire vs withPool.

Also all the naming, the choice of defaults, allowing the timeouts to be disabled, etc.

@robx robx changed the title from "add max connection lifetime option (proof of concept)" to "add max connection lifetime option" on Feb 17, 2023
@nikita-volkov (Owner)

Thanks. Taking a look

@nikita-volkov (Owner) commented Feb 18, 2023

I don't think the transfer to the withPool construct is necessary.

I see the following strategies that could let us achieve the same features without overhauling the UX:

  1. Have a management thread per connection. This will allow us to stop those threads when the associated connections get released. (See the sketch below.)
  2. Have a single thread, stop it when the pool is empty and restart when there's at least one connection.
  3. Have a single management thread and have it stopped when all references to the pool are garbage collected. I think a similar approach is applied in the "resource-pool" library.

Possibly there are other options as well.
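
For concreteness, option 1 could be as simple as a per-connection timer thread (an illustrative sketch, not code from the PR; a real implementation would also have to guard against retiring a connection that is currently in use):

```haskell
import Control.Concurrent (ThreadId, forkIO, killThread, threadDelay)

-- A pool entry carrying its own watchdog thread (option 1 above).
data Entry conn = Entry
  { entryConn :: conn
  , entryWatchdog :: ThreadId
  }

-- Open a connection and fork a thread that retires it after 'lifetimeUs' microseconds.
openEntry :: Int -> IO conn -> (conn -> IO ()) -> IO (Entry conn)
openEntry lifetimeUs open close = do
  conn <- open
  tid <- forkIO $ do
    threadDelay lifetimeUs
    close conn
  pure (Entry conn tid)

-- Releasing the connection also stops its watchdog, which is the point:
-- no global manager thread whose own lifetime needs accounting.
releaseEntry :: (conn -> IO ()) -> Entry conn -> IO ()
releaseEntry close (Entry conn tid) = do
  killThread tid
  close conn
```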

I hate to see so much work expended on your part. I should have initiated the design discussion beforehand; sorry for that. To avoid repeating the same mistake, let's discuss the suggested design before working on the code. What do you think about these suggestions? Do you have other options in mind?

@robx (Contributor, Author) commented Feb 20, 2023

I think I'd tend most towards leaving it up to garbage collection, even if that's a bit implicit for my taste. Options 1 and 2 (2 to a lesser extent) seem to come with more bookkeeping, and seem a bit at odds with #10.

I don't think it's a big change, I'll try it out to see how it goes.

(What do you think about the Config approach to collecting the increasing number of settings?)

this is verbatim how resource-pool does it, I'm not *that* familiar with the subtleties of weak refs tbh

the need to distinguish between pool + pool with manager ref is a bit painful
```haskell
ref <- newIORef ()
manager <- forkIOWithUnmask $ \unmask -> unmask $ manage rawPool
void . mkWeakIORef ref $ do
  -- When the pool goes out of scope, stop the manager.
```
@robx (Contributor, Author)

I've verified this works via the following change. Seems a bit too awkward to test automatically, what do you think?

```diff
diff --git a/library/Hasql/Pool.hs b/library/Hasql/Pool.hs
index 1a3d0b9..17d17d7 100644
--- a/library/Hasql/Pool.hs
+++ b/library/Hasql/Pool.hs
@@ -141,9 +141,10 @@ acquireConf config = do
       <*> newTVarIO (confSize config)
       <*> (newTVarIO =<< newTVarIO True)
   ref <- newIORef ()
-  manager <- forkIOWithUnmask $ \unmask -> unmask $ manage rawPool
+  manager <- forkIOWithUnmask $ \unmask -> unmask $ (putStrLn "forking" >> manage rawPool)
   void . mkWeakIORef ref $ do
     -- When the pool goes out of scope, stop the manager.
+    putStrLn "killing manager"
     killThread manager
   return $ Pool rawPool ref
 
diff --git a/test/Main.hs b/test/Main.hs
index 9ddf141..f8e91cb 100644
--- a/test/Main.hs
+++ b/test/Main.hs
@@ -10,6 +10,7 @@ import Hasql.Pool
 import qualified Hasql.Session as Session
 import qualified Hasql.Statement as Statement
 import qualified System.Environment
+import System.Mem (performGC)
 import qualified System.Random as Random
 import qualified System.Random.Stateful as Random
 import Test.Hspec
@@ -120,6 +121,8 @@ main = do
               res3 <- use countPool $ countConnectionsSession appName
               res3 `shouldBe` Right 0
           )
+      performGC
+      threadDelay 5000000 -- 5s
 
 getConnectionSettings :: IO Connection.Settings
 getConnectionSettings =
```

@robx (Contributor, Author)

The whole weak ref thing is lifted more or less directly from resource-pool. I'm not 100% on the exception handling part here. I think if the manager action throws it'll spam stderr, but also it shouldn't throw because it's just closing connections, but... bit of room for things to go wrong here.
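
One way to close that gap would be to log synchronous exceptions inside the loop while still letting killThread's ThreadKilled through, so the weak-ref finalizer can stop the thread. A sketch, not what the PR does:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Concurrent (threadDelay)
import Control.Exception (AsyncException (ThreadKilled), SomeException, fromException, throwIO, try)
import Control.Monad (forever)
import System.IO (hPutStrLn, stderr)

-- Run one sweep per second; log failures instead of letting the loop die,
-- but rethrow ThreadKilled so the finalizer can still stop the manager.
managerLoop :: IO () -> IO ()
managerLoop sweep = forever $ do
  threadDelay 1000000
  outcome <- try sweep
  case outcome of
    Right () -> pure ()
    Left (e :: SomeException)
      | Just ThreadKilled <- fromException e -> throwIO e
      | otherwise -> hPutStrLn stderr ("pool manager: " <> show e)
```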

@nikita-volkov (Owner) commented Apr 2, 2023

I can see how this could be tested if we were to extract a general (not hasql-specialized) lib from here. But until we do I guess it's okay :)

@robx (Contributor, Author) commented Mar 21, 2023

@nikita-volkov could you have a look at this one again when you have a chance?

@nikita-volkov (Owner)

Hey Rob! Sorry for needing a reminder. I've been under a bit of load from life stuff. I'll get back to you in the coming days.

@nikita-volkov (Owner) left a comment

Hey Rob!

I've finally gotten to carefully review the PR. Sorry for the delay.

Looks good generally. I think we can move forward from here. Great test suite! I do have a few things to discuss, but I promise to be responsive until we merge.

  • Let's avoid exposing the management interval configuration. It seems like a negligible detail not worth the user's attention, and it is really only required by the current implementation. The management thread's wake-ups could instead be derived automatically from the timeouts, optimally, each time the thread goes to sleep (see the sketch below). Not exposing the interval lets us implement that optimization in the future (if needed) without changing the API.

  • The Config type seems like an attempt at a neater approach to configuring this package, but it is a choice of one particular approach, with its own downsides, among others. To avoid bloat and reduce maintenance I would rather keep the package minimal and stay neutral on the matter. It is a higher-level problem and can be solved in an extension library, if needed.

If you want, I can take over from here, layer the corresponding changes on top of yours, and merge.
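
For the first point, deriving the wake-ups could look like this sketch (illustrative names, not the merged code): each time the manager goes to sleep, it sleeps exactly until the earliest pending expiry rather than on a fixed, user-visible interval.

```haskell
import Control.Concurrent (threadDelay)
import Data.Time.Clock (NominalDiffTime, UTCTime, addUTCTime, diffUTCTime, getCurrentTime)

-- Sleep until the oldest connection's lifetime runs out.
sleepUntilNextExpiry :: NominalDiffTime -> [UTCTime] -> IO ()
sleepUntilNextExpiry maxLifetime birthTimes =
  case birthTimes of
    [] -> pure () -- nothing to reap; a real manager would block on a signal here
    _ -> do
      now <- getCurrentTime
      let nextExpiry = minimum (map (addUTCTime maxLifetime) birthTimes)
          micros = ceiling (realToFrac (diffUTCTime nextExpiry now) * 1e6 :: Double)
      threadDelay (max 0 micros)
```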


@robx (Contributor, Author) commented Apr 2, 2023

> Hey Rob!
>
> I've finally gotten to carefully review the PR. Sorry for the delay.
>
> Looks good generally. I think we can move forward from here. Great test suite! I do have a few things to discuss, but I promise to be responsive until we merge.
>
> * Let's avoid exposing the management interval configuration. It seems like a negligible detail not worth the user's attention, and it is really only required by the current implementation. The management thread's wake-ups could instead be derived automatically from the timeouts, optimally, each time the thread goes to sleep. Not exposing the interval lets us implement that optimization in the future (if needed) without changing the API.
>
> * The Config type seems like an attempt at a neater approach to configuring this package, but it is a choice of one particular approach, with its own downsides, among others. To avoid bloat and reduce maintenance I would rather keep the package minimal and stay neutral on the matter. It is a higher-level problem and can be solved in an extension library, if needed.
>
> If you want, I can take over from here, layer the corresponding changes on top of yours, and merge.

Hi Nikita, thanks for the review! I'd be happy with you taking it over; no particularly strong feelings on any of the remarks. The Config change was mostly because I felt quite awkward about further increasing the number of positional plain-integer parameters, but no objection if you're happy with that (or with any other approach :)).

@nikita-volkov nikita-volkov merged commit 0eda834 into nikita-volkov:master Apr 3, 2023
@nikita-volkov (Owner) commented Apr 3, 2023

I've applied the changes and merged. Thanks for your hard work! I'll let the dust settle for a couple of days before releasing. Please do raise issues if you disagree with any of the changes.

@nikita-volkov (Owner)
Released!

@robx robx deleted the conn-lifetime branch May 3, 2023 10:27