Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: schema cache retrying without backoff #3654

Merged
merged 1 commit into from
Jul 11, 2024

Conversation

steve-chavez
Copy link
Member

@steve-chavez steve-chavez commented Jul 11, 2024

Fixes #3523, #2818.

Now if there's a failure when obtaining the pg version OR schema cache, we do the same retrying process. This way we don't add two retries.

Refactors and renames the "connectionWorker" to "schemaCacheLoader". This makes more sense since what we really want is the schema cache, the version is the pre-requisite for ensuring our schema cache queries work.

Additionally, we no longer log Attempting to connect to the database... at startup unnecessarily. This is only logged whenever there's a retry attempt.

Fixes PostgREST#3523.

Now if there's a failure when obtaining the pg version OR schema cache,
we do the same retrying process. This way we don't add two retries.

Refactors and renames the "connectionWorker" to "schemaCacheLoader".
This makes more sense since what we really want is the schema cache,
the version is the pre-requisite for ensuring our
schema cache queries work.

Additionally, we no longer log ` Attempting to connect to the database...`
at startup unnecessarily. This is only logged whenever there's a retry attempt.
@@ -853,7 +853,7 @@ def test_metrics_include_schema_cache_fails(defaultenv, metapostgrest):
r'pgrst_schema_cache_loads_total{status="FAIL"} (\d+)', response.text
).group(1)
)
assert metrics > 3.0
assert metrics == 1.0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test proves that this works. I didn't want to rely on the test mentioned in #3523 (comment) because it relies on a new feat and I want to backport this fix on v12.

Manually it can be confirmed that the above is fixed though.

$ PGRST_DB_SCHEMAS="x" postgrest-with-postgresql-15  -f test/spec/fixtures/load.sql postgrest-run

10/Jul/2024:20:56:36 -0500: Starting PostgREST 12.3 (pre-release)...
10/Jul/2024:20:56:36 -0500: Listening on port 3000
10/Jul/2024:20:56:36 -0500: Listening for notifications on the "pgrst" channel
10/Jul/2024:20:56:36 -0500: Successfully connected to PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit
10/Jul/2024:20:56:36 -0500: Failed to load the schema cache. {"code":"3F000","details":null,"hint":null,"message":"schema \"x\" does not exist"}

10/Jul/2024:20:56:37 -0500: Attempting to reconnect to the database in 1 seconds...
10/Jul/2024:20:56:37 -0500: Successfully connected to PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit
10/Jul/2024:20:56:37 -0500: Failed to load the schema cache. {"code":"3F000","details":null,"hint":null,"message":"schema \"x\" does not exist"}

10/Jul/2024:20:56:39 -0500: Attempting to reconnect to the database in 2 seconds...
10/Jul/2024:20:56:39 -0500: Successfully connected to PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit
10/Jul/2024:20:56:39 -0500: Failed to load the schema cache. {"code":"3F000","details":null,"hint":null,"message":"schema \"x\" does not exist"}

@steve-chavez steve-chavez merged commit 6be5906 into PostgREST:main Jul 11, 2024
22 checks passed
@steve-chavez
Copy link
Member Author

Comment on lines +329 to +350
isConnEstablished :: AppState -> IO Bool
isConnEstablished appState = do
AppConfig{..} <- getConfig appState
if configDbChannelEnabled then
if configDbChannelEnabled then -- if the listener is enabled, we can be sure the connection is up
readIORef $ stateIsListenerOn appState
else
pure True
else -- otherwise the only way to check the connection is to make a query
isRight <$> usePool appState (SQL.sql "SELECT 1")

putIsListenerOn :: AppState -> Bool -> IO ()
putIsListenerOn = atomicWriteIORef . stateIsListenerOn

isConnEstablished :: AppState -> IO Bool
isConnEstablished x = do
conf <- getConfig x
if configDbChannelEnabled conf
then do -- if the listener is enabled, we can be sure the connection status is always up to date
st <- readIORef $ stateConnStatus x
return $ st == ConnEstablished
else -- otherwise the only way to check the connection is to make a query
isRight <$> usePool x (SQL.sql "SELECT 1")

isLoaded :: AppState -> IO Bool
isLoaded x = do
scacheStatus <- readIORef $ stateSCacheStatus x
connEstablished <- isConnEstablished x
listenerOn <- getIsListenerOn x
return $ scacheStatus == SCLoaded && connEstablished && listenerOn
return $ scacheStatus == SCLoaded && connEstablished

isPending :: AppState -> IO Bool
isPending x = do
scacheStatus <- readIORef $ stateSCacheStatus x
connStatus <- readIORef $ stateConnStatus x
listenerOn <- getIsListenerOn x
return $ scacheStatus == SCPending || connStatus == ConnPending || not listenerOn
connEstablished <- isConnEstablished x
return $ scacheStatus == SCPending || not connEstablished
Copy link
Member Author

@steve-chavez steve-chavez Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be this change of Admin conditions for the /ready state.

The logic should be the same.. but perhaps the previous one was faster. I cannot reproduce the CI failures locally for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

The schema cache load is not retried with exponential backoff
1 participant