perf: optimize pg describe tables query #147
Open
Summary
The describe tables query for Postgres joined several information schema views, which is inefficient and especially noticeable on large databases. Instead, use the catalog tables directly.
Set up the environment
We will create a Postgres 12 database and populate it with 200 tables, each with a primary key.
Using the following init.sql script:
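As a minimal sketch, an `init.sql` of this shape could be generated as follows. The table names (`table_1` ... `table_200`) and the `id`/`name` columns are assumptions for illustration only; the only properties taken from the description above are the table count and the presence of a primary key.

```python
def build_init_sql(table_count: int = 200) -> str:
    """Return SQL that creates `table_count` tables, each with a primary key."""
    statements = [
        # Hypothetical naming scheme and columns, chosen only for this sketch.
        f"CREATE TABLE table_{i} (id integer PRIMARY KEY, name text);"
        for i in range(1, table_count + 1)
    ]
    return "\n".join(statements) + "\n"
```

Writing the returned string to `init.sql` gives a file the server can load at startup; calling `build_init_sql(400)` would produce the larger variant used for the scaling comparison.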
First, start a Postgres server that initializes with init.sql:
Proposal
Let's first test that both the current query and the new proposal produce the same result.
The current query joins several information schema views, and the filters it is given cannot trim the results before the joins happen. This is the current query:
The optimized query uses the catalog tables directly (it was built from the information schema view definitions), so results are filtered before the join.
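To illustrate what filtering on the catalog tables looks like, here is a rough sketch. It is not the query from this PR: the selected columns, the `%(schema)s` placeholder (psycopg2-style named parameter), and the constant name are all assumptions. It only shows the general shape, where the schema filter applies to `pg_namespace` itself rather than to the output of a view.

```python
# Illustrative only: NOT the query proposed in this PR.
# The %(schema)s placeholder assumes a driver with psycopg2-style named parameters.
CATALOG_COLUMNS_SQL = """
SELECT c.relname AS table_name,
       a.attname AS column_name,
       pg_catalog.format_type(a.atttypid, a.atttypmod) AS data_type,
       a.attnotnull AS not_null
FROM pg_catalog.pg_namespace n
JOIN pg_catalog.pg_class c ON c.relnamespace = n.oid
JOIN pg_catalog.pg_attribute a ON a.attrelid = c.oid
WHERE n.nspname = %(schema)s      -- schema filter applied to the catalog table itself
  AND c.relkind IN ('r', 'v')     -- ordinary tables and views
  AND a.attnum > 0                -- skip system columns
  AND NOT a.attisdropped          -- skip dropped columns
ORDER BY c.relname, a.attnum
"""
```

Because every predicate refers to a base catalog table, the planner can apply it while scanning that table instead of after a view has been fully expanded and joined.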
We can compare the results via the following command:
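One way such a comparison could be done, sketched under the assumption that each query's output has been saved as plain text with one row per line (for example with `psql -At`, which prints unaligned, tuples-only output). The helper name `same_rows` is hypothetical. It compares rows as multisets, so a difference in row order alone is not reported as a mismatch.

```python
from collections import Counter


def same_rows(output_a: str, output_b: str) -> bool:
    """Return True when both outputs contain the same rows, ignoring row order."""
    # Blank lines are dropped; duplicate rows are counted, not collapsed.
    rows_a = Counter(line.rstrip() for line in output_a.splitlines() if line.strip())
    rows_b = Counter(line.rstrip() for line in output_b.splitlines() if line.strip())
    return rows_a == rows_b
```

If both queries use the same `ORDER BY`, a plain textual diff of the two outputs is a stricter check, since it also verifies ordering.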
Now that we know both queries produce the same result, let's check the performance.
Performance analysis
First, the performance for the current query:
And the result for the new query:
Note that if we create the database with 400 tables instead of 200, the results change greatly:
The current query's execution time jumps sharply, while the new proposal's does not.
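To put a number on that jump, the timings can be extracted and compared. The sketch below assumes the measurements come from `EXPLAIN ANALYZE`, whose output in Postgres 12 ends with a line of the form `Execution Time: <n> ms`; the PR text does not state how the timings were taken, and the function names are hypothetical.

```python
import re

# Matches the summary line printed by EXPLAIN ANALYZE (Postgres 10 and later).
_EXECUTION_TIME = re.compile(r"Execution Time:\s*([0-9]+(?:\.[0-9]+)?)\s*ms")


def execution_time_ms(explain_output: str) -> float:
    """Extract the reported execution time, in milliseconds, from EXPLAIN ANALYZE text."""
    match = _EXECUTION_TIME.search(explain_output)
    if match is None:
        raise ValueError("no 'Execution Time' line found")
    return float(match.group(1))


def slowdown(time_small_ms: float, time_large_ms: float) -> float:
    """Return how many times slower the larger database is than the smaller one."""
    return time_large_ms / time_small_ms
```

Computing `slowdown` for each query between the 200-table and 400-table runs shows directly whether its cost grows roughly in proportion to the table count or faster.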