fix: schema cache retrying without backoff #3654

steve-chavez · 2024-07-11T01:56:00Z

Now if there's a failure when obtaining the pg version OR schema cache, we do the same retrying process. This way we don't add two retries.

Refactors and renames the "connectionWorker" to "schemaCacheLoader". This makes more sense since what we really want is the schema cache, the version is the pre-requisite for ensuring our schema cache queries work.

Additionally, we no longer log Attempting to connect to the database... at startup unnecessarily. This is only logged whenever there's a retry attempt.

Fixes PostgREST#3523. Now if there's a failure when obtaining the pg version OR schema cache, we do the same retrying process. This way we don't add two retries. Refactors and renames the "connectionWorker" to "schemaCacheLoader". This makes more sense since what we really want is the schema cache, the version is the pre-requisite for ensuring our schema cache queries work. Additionally, we no longer log ` Attempting to connect to the database...` at startup unnecessarily. This is only logged whenever there's a retry attempt.

steve-chavez · 2024-07-11T01:59:11Z

test/io/test_io.py

@@ -853,7 +853,7 @@ def test_metrics_include_schema_cache_fails(defaultenv, metapostgrest):
 r'pgrst_schema_cache_loads_total{status="FAIL"} (\d+)', response.text
 ).group(1)
 )
- assert metrics > 3.0
+ assert metrics == 1.0


This test proves that this works. I didn't want to rely on the test mentioned in #3523 (comment) because it relies on a new feat and I want to backport this fix on v12.

Manually it can be confirmed that the above is fixed though.

$ PGRST_DB_SCHEMAS="x" postgrest-with-postgresql-15 -f test/spec/fixtures/load.sql postgrest-run 10/Jul/2024:20:56:36 -0500: Starting PostgREST 12.3 (pre-release)... 10/Jul/2024:20:56:36 -0500: Listening on port 3000 10/Jul/2024:20:56:36 -0500: Listening for notifications on the "pgrst" channel 10/Jul/2024:20:56:36 -0500: Successfully connected to PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit 10/Jul/2024:20:56:36 -0500: Failed to load the schema cache. {"code":"3F000","details":null,"hint":null,"message":"schema \"x\" does not exist"} 10/Jul/2024:20:56:37 -0500: Attempting to reconnect to the database in 1 seconds... 10/Jul/2024:20:56:37 -0500: Successfully connected to PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit 10/Jul/2024:20:56:37 -0500: Failed to load the schema cache. {"code":"3F000","details":null,"hint":null,"message":"schema \"x\" does not exist"} 10/Jul/2024:20:56:39 -0500: Attempting to reconnect to the database in 2 seconds... 10/Jul/2024:20:56:39 -0500: Successfully connected to PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit 10/Jul/2024:20:56:39 -0500: Failed to load the schema cache. {"code":"3F000","details":null,"hint":null,"message":"schema \"x\" does not exist"}

steve-chavez · 2024-07-11T02:22:04Z

Some failures on master CI.. that's strange:

steve-chavez · 2024-07-11T02:51:40Z

src/PostgREST/AppState.hs

+isConnEstablished :: AppState -> IO Bool
+isConnEstablished appState = do
 AppConfig{..} <- getConfig appState
- if configDbChannelEnabled then
+ if configDbChannelEnabled then -- if the listener is enabled, we can be sure the connection is up
 readIORef $ stateIsListenerOn appState
- else
- pure True
+ else -- otherwise the only way to check the connection is to make a query
+ isRight <$> usePool appState (SQL.sql "SELECT 1")

 putIsListenerOn :: AppState -> Bool -> IO ()
 putIsListenerOn = atomicWriteIORef . stateIsListenerOn

-isConnEstablished :: AppState -> IO Bool
-isConnEstablished x = do
- conf <- getConfig x
- if configDbChannelEnabled conf
- then do -- if the listener is enabled, we can be sure the connection status is always up to date
- st <- readIORef $ stateConnStatus x
- return $ st == ConnEstablished
- else -- otherwise the only way to check the connection is to make a query
- isRight <$> usePool x (SQL.sql "SELECT 1")
-
 isLoaded :: AppState -> IO Bool
 isLoaded x = do
 scacheStatus <- readIORef $ stateSCacheStatus x
 connEstablished <- isConnEstablished x
- listenerOn <- getIsListenerOn x
- return $ scacheStatus == SCLoaded && connEstablished && listenerOn
+ return $ scacheStatus == SCLoaded && connEstablished

 isPending :: AppState -> IO Bool
 isPending x = do
 scacheStatus <- readIORef $ stateSCacheStatus x
- connStatus <- readIORef $ stateConnStatus x
- listenerOn <- getIsListenerOn x
- return $ scacheStatus == SCPending || connStatus == ConnPending || not listenerOn
+ connEstablished <- isConnEstablished x
+ return $ scacheStatus == SCPending || not connEstablished


I think it could be this change of Admin conditions for the /ready state.

The logic should be the same.. but perhaps the previous one was faster. I cannot reproduce the CI failures locally for now.

steve-chavez commented Jul 11, 2024

View reviewed changes

steve-chavez merged commit 6be5906 into PostgREST:main Jul 11, 2024
22 checks passed

steve-chavez commented Jul 11, 2024

View reviewed changes

steve-chavez mentioned this pull request Jul 16, 2024

Revoked USAGE on information_schema causes automatic recovery to flood stderr logs #2818

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: schema cache retrying without backoff #3654

fix: schema cache retrying without backoff #3654

steve-chavez commented Jul 11, 2024 •

edited

Loading

steve-chavez Jul 11, 2024

steve-chavez commented Jul 11, 2024

steve-chavez Jul 11, 2024 •

edited

Loading

fix: schema cache retrying without backoff #3654

fix: schema cache retrying without backoff #3654

Conversation

steve-chavez commented Jul 11, 2024 • edited Loading

steve-chavez Jul 11, 2024

Choose a reason for hiding this comment

steve-chavez commented Jul 11, 2024

steve-chavez Jul 11, 2024 • edited Loading

Choose a reason for hiding this comment

steve-chavez commented Jul 11, 2024 •

edited

Loading

steve-chavez Jul 11, 2024 •

edited

Loading