RAS-850 Cloud sql proxy term timeout and verbose #72
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What and why?
This PR adds a wait time for cloud_sql_proxy and stops verbose logging. Currently we deploy the app with a side car cloud_sql_proxy container to access the db. However when a pod is redeployed and the proxy container is sent a SIGTERM the container does not close down gracefully and will kill all connections instantly (i.e sql queries). This naturally gets passed back up the chain to the respondent who will get an error. With us currently using a default pool of 5 and overflow of 10, it means anything up to 15 respondents could get an error when we deploy a new release to prod, which ain't good.
It almost certainly is the reason we see hanging queries and spikes on rollouts in Grafana. This PR adds a term_timeout which will allow the proxy to complete it's query (at least give it 30 seconds to do so) before it creates a revision.
The PR also stops verbose logging which is set by default and why we get so many records in the logs we just don't need. The logging will still show start-up and errors which is fine for what we need
How to test?
Realistically you don't need to test this again if you have done it for this one see ONSdigital/ras-party#401 Naturally you can, just note the components to remove will be different.
Jira