-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: schema cache retrying without backoff #3654
Conversation
Fixes PostgREST#3523. Now if there's a failure when obtaining the pg version OR schema cache, we do the same retrying process. This way we don't add two retries. Refactors and renames the "connectionWorker" to "schemaCacheLoader". This makes more sense since what we really want is the schema cache, the version is the pre-requisite for ensuring our schema cache queries work. Additionally, we no longer log ` Attempting to connect to the database...` at startup unnecessarily. This is only logged whenever there's a retry attempt.
@@ -853,7 +853,7 @@ def test_metrics_include_schema_cache_fails(defaultenv, metapostgrest): | |||
r'pgrst_schema_cache_loads_total{status="FAIL"} (\d+)', response.text | |||
).group(1) | |||
) | |||
assert metrics > 3.0 | |||
assert metrics == 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test proves that this works. I didn't want to rely on the test mentioned in #3523 (comment) because it relies on a new feat and I want to backport this fix on v12.
Manually it can be confirmed that the above is fixed though.
$ PGRST_DB_SCHEMAS="x" postgrest-with-postgresql-15 -f test/spec/fixtures/load.sql postgrest-run
10/Jul/2024:20:56:36 -0500: Starting PostgREST 12.3 (pre-release)...
10/Jul/2024:20:56:36 -0500: Listening on port 3000
10/Jul/2024:20:56:36 -0500: Listening for notifications on the "pgrst" channel
10/Jul/2024:20:56:36 -0500: Successfully connected to PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit
10/Jul/2024:20:56:36 -0500: Failed to load the schema cache. {"code":"3F000","details":null,"hint":null,"message":"schema \"x\" does not exist"}
10/Jul/2024:20:56:37 -0500: Attempting to reconnect to the database in 1 seconds...
10/Jul/2024:20:56:37 -0500: Successfully connected to PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit
10/Jul/2024:20:56:37 -0500: Failed to load the schema cache. {"code":"3F000","details":null,"hint":null,"message":"schema \"x\" does not exist"}
10/Jul/2024:20:56:39 -0500: Attempting to reconnect to the database in 2 seconds...
10/Jul/2024:20:56:39 -0500: Successfully connected to PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit
10/Jul/2024:20:56:39 -0500: Failed to load the schema cache. {"code":"3F000","details":null,"hint":null,"message":"schema \"x\" does not exist"}
Some failures on master CI.. that's strange: |
isConnEstablished :: AppState -> IO Bool | ||
isConnEstablished appState = do | ||
AppConfig{..} <- getConfig appState | ||
if configDbChannelEnabled then | ||
if configDbChannelEnabled then -- if the listener is enabled, we can be sure the connection is up | ||
readIORef $ stateIsListenerOn appState | ||
else | ||
pure True | ||
else -- otherwise the only way to check the connection is to make a query | ||
isRight <$> usePool appState (SQL.sql "SELECT 1") | ||
|
||
putIsListenerOn :: AppState -> Bool -> IO () | ||
putIsListenerOn = atomicWriteIORef . stateIsListenerOn | ||
|
||
isConnEstablished :: AppState -> IO Bool | ||
isConnEstablished x = do | ||
conf <- getConfig x | ||
if configDbChannelEnabled conf | ||
then do -- if the listener is enabled, we can be sure the connection status is always up to date | ||
st <- readIORef $ stateConnStatus x | ||
return $ st == ConnEstablished | ||
else -- otherwise the only way to check the connection is to make a query | ||
isRight <$> usePool x (SQL.sql "SELECT 1") | ||
|
||
isLoaded :: AppState -> IO Bool | ||
isLoaded x = do | ||
scacheStatus <- readIORef $ stateSCacheStatus x | ||
connEstablished <- isConnEstablished x | ||
listenerOn <- getIsListenerOn x | ||
return $ scacheStatus == SCLoaded && connEstablished && listenerOn | ||
return $ scacheStatus == SCLoaded && connEstablished | ||
|
||
isPending :: AppState -> IO Bool | ||
isPending x = do | ||
scacheStatus <- readIORef $ stateSCacheStatus x | ||
connStatus <- readIORef $ stateConnStatus x | ||
listenerOn <- getIsListenerOn x | ||
return $ scacheStatus == SCPending || connStatus == ConnPending || not listenerOn | ||
connEstablished <- isConnEstablished x | ||
return $ scacheStatus == SCPending || not connEstablished |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it could be this change of Admin conditions for the /ready
state.
The logic should be the same.. but perhaps the previous one was faster. I cannot reproduce the CI failures locally for now.
Fixes #3523, #2818.
Now if there's a failure when obtaining the pg version OR schema cache, we do the same retrying process. This way we don't add two retries.
Refactors and renames the "connectionWorker" to "schemaCacheLoader". This makes more sense since what we really want is the schema cache, the version is the pre-requisite for ensuring our schema cache queries work.
Additionally, we no longer log
Attempting to connect to the database...
at startup unnecessarily. This is only logged whenever there's a retry attempt.