Fix queries on cache page #40

`_performance/004-Cache.md` (20 additions, 19 deletions)

Cache
The typical rule for most applications is that only a fraction of their data is regularly accessed. As with many other things, data tends to follow the 80/20 rule, with 20% of your data accounting for 80% of the reads, and often it's even more skewed than that. Postgres itself tracks the access patterns of your data and will on its own keep frequently accessed data in cache. Generally you want your database to have a cache hit rate of about 99%. You can find your cache hit rate with:

    SELECT
      sum(heap_blks_read) as heap_read,
      sum(heap_blks_hit) as heap_hit,
      (sum(heap_blks_hit) - sum(heap_blks_read)) / sum(heap_blks_hit) as ratio
    FROM
      pg_statio_user_tables;
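
The same view also exposes these counters per table, so you can check the hit rate of one hot table on its own. A minimal sketch, assuming a table named `events` (the table name here is illustrative):

    SELECT
      heap_blks_read,
      heap_blks_hit
    FROM
      pg_statio_user_tables
    WHERE
      relname = 'events';
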
The other primary piece for improving performance is [indexes](<https://devcente
Indexes are most valuable across large tables as well. While accessing data from cache is faster than accessing it from disk, even data in memory can be slow to query if Postgres must scan through hundreds of thousands of rows to check whether they meet a certain condition. To generate a list of the tables in your database, largest first, along with the percentage of the time an index is used on each, you can run:

    SELECT
      relname, 100 * idx_scan / (seq_scan + idx_scan) percent_of_times_index_used,
      n_live_tup rows_in_table
    FROM
      pg_stat_user_tables
    WHERE
      seq_scan + idx_scan > 0
    ORDER BY
      n_live_tup DESC;

Pro tip: If you're adding an index on a production database use `CREATE INDEX CONCURRENTLY`, which builds the index without holding a lock that blocks writes to the table.
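
One caveat: if a `CONCURRENTLY` build fails or is cancelled partway through, it leaves behind an invalid index that still adds write overhead, and it must be dropped before you retry. A minimal sketch, reusing the index name from the example further down (the name is illustrative):

    -- A failed concurrent build leaves the index marked INVALID;
    -- drop it (again without blocking writes) and re-run the CREATE.
    DROP INDEX CONCURRENTLY idx_events_app_info_id;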

Looking at a real-world example from the recently launched Heroku dashboard, we can run this query and see our results:

    SELECT relname,
      100 * idx_scan / (seq_scan + idx_scan) percent_of_times_index_used,
      n_live_tup rows_in_table
    FROM pg_stat_user_tables
    ORDER BY n_live_tup DESC;

    relname              | percent_of_times_index_used | rows_in_table
    ---------------------+-----------------------------+---------------
    events               |                           0 |        669917
    app_infos_user_info  |                           0 |        198218
    app_infos            |                          50 |        175640
    user_info            |                           3 |         46718
    rollouts             |                           0 |         34078
    favorites            |                           0 |          3059
    schema_migrations    |                           0 |             2
    authorizations       |                           0 |             0
    delayed_jobs         |                          23 |             0

From this we can see the events table, which has around 700,000 rows, has no indexes that have been used. From here you could investigate within my application and see some of the common queries that are used; one example is pulling the events for the blog post which you are reading. You can see your [execution plan](<https://postgresguide.com/performance/explain.html?utm_source=referral&utm_medium=content&utm_campaign=craigkerstiens>) by running [`EXPLAIN ANALYZE`](<https://postgresguide.com/performance/explain.html?utm_source=referral&utm_medium=content&utm_campaign=craigkerstiens>), which gives you a better idea of the performance of a specific query.
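
Against a table with no usable index, a plan like the following comes back; the costs and timings here are illustrative rather than the original measurements, but the shape of the plan is what matters:

    EXPLAIN ANALYZE SELECT * FROM events WHERE app_info_id = 7559;

    ----------------------------------------------------------------------
    Seq Scan on events  (cost=0.00..62500.00 rows=38 width=688) (actual time=4.512..652.339 rows=89 loops=1)
      Filter: (app_info_id = 7559)
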
Given there's a sequential scan across all that data, this is an area we can optimize with an index. We can add our index concurrently to prevent locking on that table and then see how performance looks:

    CREATE INDEX CONCURRENTLY idx_events_app_info_id ON events(app_info_id);
    EXPLAIN ANALYZE SELECT * FROM events WHERE app_info_id = 7559;

    ----------------------------------------------------------------------
    Index Scan using idx_events_app_info_id on events  (cost=0.00..23.40 rows=38 width=688) (actual time=0.021..0.115 rows=89 loops=1)
      Index Cond: (app_info_id = 7559)

examine the results in [New Relic](https://elements.heroku.com/addons/newrelic)a

Finally, to combine the two: if you're interested in how much of your index data is served from cache, you can run:

    SELECT
      sum(idx_blks_read) as idx_read,
      sum(idx_blks_hit) as idx_hit,
      (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit) as ratio
    FROM
      pg_statio_user_indexes;
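
If that ratio comes back low, the same view can point at the specific indexes that are falling out of cache. A minimal sketch of a per-index breakdown (the ordering is just one reasonable choice):

    SELECT
      indexrelname,
      idx_blks_read,
      idx_blks_hit
    FROM
      pg_statio_user_indexes
    ORDER BY
      idx_blks_read DESC;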
