v10.1.1: Not having a pool timeout causes PostgreSQL connections to leak memory #2638
Comments
v10.0.0 might also be affected by this because the timeout was increased to 1 hour.
There's been quite a bit of discussion of this following this comment: #2444 (comment); it would be useful to summarize it in this ticket. E.g., I'm guessing that setting the session idle timeout wouldn't be sufficient because the connections might all stay busy? (I find the title of the issue a bit misleading because at first read it seems like the postgrest process is leaking memory, while instead the postgresql server processes are growing their caches indefinitely.)
True, I've just corrected the title.
Hmm, wouldn't it be possible to restore the old timeout on hasql-pool?
Note: This issue was made worse by #2620, because users' schemas kept changing and increasing the cache size while connections were never restarted. So having #2639 will at least mitigate this issue. At this point v10.1.1 is not stable at all; I'll release a v10.1.2, and when this is done I'll release a v10.1.3, which should finally be stable.
Sorry, I meant that it seems hard to introduce a timeout without modifying hasql-pool. ("on top of" in the sense of working with the existing hasql-pool. E.g., you could imagine that when our client code receives a connection from the pool, it checks its age and throws an exception if it's too old, which would cause hasql-pool to discard the connection.) So yeah, I agree that the best thing to do would be to fix this in hasql-pool, by reintroducing the timeout.
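To make that idea concrete, here is a rough Haskell sketch of such an age check, assuming a hypothetical wrapper that records when each connection was opened (AgedConnection, maxLifetime and checkAge below are illustrative, not actual hasql-pool API):

```haskell
import qualified Hasql.Connection as Connection
import Control.Exception (Exception, throwIO)
import Data.Time.Clock (NominalDiffTime, UTCTime, diffUTCTime, getCurrentTime)

-- Hypothetical wrapper pairing a connection with the time it was opened.
data AgedConnection = AgedConnection
  { agedConnection :: Connection.Connection
  , openedAt       :: UTCTime
  }

-- Thrown when a connection taken from the pool is older than allowed,
-- signalling that it should be discarded rather than reused.
data ConnectionTooOld = ConnectionTooOld
  deriving Show

instance Exception ConnectionTooOld

maxLifetime :: NominalDiffTime
maxLifetime = 30 * 60  -- 30 minutes

-- Check the connection's age right after taking it from the pool.
checkAge :: AgedConnection -> IO ()
checkAge conn = do
  now <- getCurrentTime
  if diffUTCTime now (openedAt conn) > maxLifetime
    then throwIO ConnectionTooOld
    else pure ()
```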
I created nikita-volkov/hasql-pool#28 as a proof-of-concept fix for this. Depending on how urgent this is for postgrest, I could prepare a postgrest PR that pulls in that development version.
@robx Nice work!
I think we can wait until the fix is released on hasql-pool.
Hm, this is true, but in production I didn't see a similar issue before, when we had the db-pool-timeout.
Note: this blog post explains the memory drawback of prepared statements (search for "Reduce the memory usage of prepared queries"), which are cached per connection in PostgreSQL.
Just to make this more visible: HikariCP has a maxLifetime config with a default of 30 minutes, and the pg hackers are very positive about this default, as shown in this discussion.
- db-pool-acquisition-timeout is no longer optional, defaults to 10s
- new option db-pool-max-lifetime limits the maximal lifetime of a PostgreSQL connection, defaults to 30m (example config below)
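For reference, a rough sketch of how these options could look in a postgrest.conf; the option names and defaults come from the changelog entry above, but the exact value syntax (plain seconds) is an assumption:

```
db-pool-acquisition-timeout = 10    # seconds
db-pool-max-lifetime = 1800         # seconds (30 minutes)
```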
Problem
Idle connections can take up memory and not release it; here's an example htop output showing this:

[htop screenshot]

(authenticator is the connection role used by us, this follows the docs.) This gets serious whenever the db-pool size is big: the connections can consume all memory. More details on why a connection takes up memory are in this SO question. Basically, each connection keeps a cache of the catalog, and this gets bigger as more database objects are added.
Now, this wasn't an issue until v10.0.0, because we had a db-pool-timeout which timed out idle connections. But on v10.1.0 this was removed (#2444) due to the upstream library not supporting it.
Solution
We should bring back the timeout. Alternatively, it seems we can use the DISCARD ALL statement whenever a connection is returned to the pool (not sure of the drawbacks), according to this AWS blog post; see the sketch below.
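As a rough illustration of the DISCARD ALL idea (not PostgREST's actual code), this is how such a statement could be issued through hasql before a connection goes back to the pool; resetConnection is a hypothetical helper:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Hasql.Connection as Connection
import qualified Hasql.Session as Session

-- Reset session state before the connection is handed back to the pool.
-- Note that DISCARD ALL also deallocates prepared statements, so this only
-- illustrates the approach from the AWS blog post; the drawbacks mentioned
-- above would still need to be evaluated.
resetConnection :: Connection.Connection -> IO ()
resetConnection conn = do
  result <- Session.run (Session.sql "DISCARD ALL") conn
  either (fail . show) pure result
```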
cc @robx