Fix queries on cache page #40

`_performance/004-Cache.md` (20 additions, 19 deletions)

Cache
The typical rule for most applications is that only a fraction of their data is regularly accessed. As with many other things, data tends to follow the 80/20 rule, with 20% of your data accounting for 80% of the reads, and often it's even more skewed than that. Postgres itself tracks the access patterns of your data and will on its own keep frequently accessed data in cache. Generally you want your database to have a cache hit rate of about 99%. You can find your cache hit rate with:

    SELECT
      sum(heap_blks_read) as heap_read,
      sum(heap_blks_hit) as heap_hit,
      (sum(heap_blks_hit) - sum(heap_blks_read)) / sum(heap_blks_hit) as ratio
    FROM
      pg_statio_user_tables;
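
The same view also exposes these counters per table, so you can check the hit rate of one hot table on its own. A minimal sketch, assuming a table named `events` (the table name here is illustrative):

    SELECT
      heap_blks_read,
      heap_blks_hit
    FROM
      pg_statio_user_tables
    WHERE
      relname = 'events';
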
The other primary piece for improving performance is [indexes](<https://devcente
Indexes are most valuable across large tables as well. While accessing data from cache is faster than accessing it from disk, even data in memory can be slow to query if Postgres must scan through hundreds of thousands of rows to check whether they meet a certain condition. To generate a list of the tables in your database, largest first, along with the percentage of the time an index is used on each, you can run:

    SELECT
      relname, 100 * idx_scan / (seq_scan + idx_scan) percent_of_times_index_used,
      n_live_tup rows_in_table
    FROM
      pg_stat_user_tables
    WHERE
      seq_scan + idx_scan > 0
    ORDER BY
      n_live_tup DESC;

Pro tip: If you're adding an index on a production database use `CREATE INDEX CONCURRENTLY`, which builds the index without holding a lock that blocks writes to the table.
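
One caveat: if a `CONCURRENTLY` build fails or is cancelled partway through, it leaves behind an invalid index that still adds write overhead, and it must be dropped before you retry. A minimal sketch, reusing the index name from the example further down (the name is illustrative):

    -- A failed concurrent build leaves the index marked INVALID;
    -- drop it (again without blocking writes) and re-run the CREATE.
    DROP INDEX CONCURRENTLY idx_events_app_info_id;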

Looking at a real-world example from the recently launched Heroku dashboard, we can run this query and see our results:

    SELECT relname,
      100 * idx_scan / (seq_scan + idx_scan) percent_of_times_index_used,
      n_live_tup rows_in_table
    FROM pg_stat_user_tables
    ORDER BY n_live_tup DESC;

    relname              | percent_of_times_index_used | rows_in_table
    ---------------------+-----------------------------+---------------
    events               |                           0 |        669917
    app_infos_user_info  |                           0 |        198218
    app_infos            |                          50 |        175640
    user_info            |                           3 |         46718
    rollouts             |                           0 |         34078
    favorites            |                           0 |          3059
    schema_migrations    |                           0 |             2
    authorizations       |                           0 |             0
    delayed_jobs         |                          23 |             0

From this we can see the events table, which has around 700,000 rows, has no indexes that have been used. From here you could investigate within my application and see some of the common queries that are used; one example is pulling the events for the blog post which you are reading. You can see your [execution plan](<https://postgresguide.com/performance/explain.html?utm_source=referral&utm_medium=content&utm_campaign=craigkerstiens>) by running [`EXPLAIN ANALYZE`](<https://postgresguide.com/performance/explain.html?utm_source=referral&utm_medium=content&utm_campaign=craigkerstiens>), which gives you a better idea of the performance of a specific query.
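
Against a table with no usable index, a plan like the following comes back; the costs and timings here are illustrative rather than the original measurements, but the shape of the plan is what matters:

    EXPLAIN ANALYZE SELECT * FROM events WHERE app_info_id = 7559;

    ----------------------------------------------------------------------
    Seq Scan on events  (cost=0.00..62500.00 rows=38 width=688) (actual time=4.512..652.339 rows=89 loops=1)
      Filter: (app_info_id = 7559)
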
Given there's a sequential scan across all that data, this is an area we can optimize with an index. We can add our index concurrently to prevent locking on that table and then see how performance looks:

    CREATE INDEX CONCURRENTLY idx_events_app_info_id ON events(app_info_id);
    EXPLAIN ANALYZE SELECT * FROM events WHERE app_info_id = 7559;

    ----------------------------------------------------------------------
    Index Scan using idx_events_app_info_id on events  (cost=0.00..23.40 rows=38 width=688) (actual time=0.021..0.115 rows=89 loops=1)
      Index Cond: (app_info_id = 7559)

examine the results in [New Relic](https://elements.heroku.com/addons/newrelic)a

Finally, to combine the two: if you're interested in how much of your index data is served from cache, you can run:

    SELECT
      sum(idx_blks_read) as idx_read,
      sum(idx_blks_hit) as idx_hit,
      (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit) as ratio
    FROM
      pg_statio_user_indexes;
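
If that ratio comes back low, the same view can point at the specific indexes that are falling out of cache. A minimal sketch of a per-index breakdown (the ordering is just one reasonable choice):

    SELECT
      indexrelname,
      idx_blks_read,
      idx_blks_hit
    FROM
      pg_statio_user_indexes
    ORDER BY
      idx_blks_read DESC;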
