-
Notifications
You must be signed in to change notification settings - Fork 0
/
PostgreSQL-Monitoring-Specification-v9.4.html
4272 lines (3703 loc) · 252 KB
/
PostgreSQL-Monitoring-Specification-v9.4.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<!DOCTYPE html>
<html>
<head>
<style>
body {
color:#000;font: arial, sans-serif;
}
h2 ~ *:not(h1):not(h2):not(pre) {
margin-left: 35px;
}
p, ul {
color:#000;font: 14px;
}
</style>
<script type="text/javascript">
window.onload = function () {
var toc = "";
var level = 0;
document.getElementById("contents").innerHTML =
document.getElementById("contents").innerHTML.replace(
/<h([\d])>([^<]+)<\/h([\d])>/gi,
function (str, openLevel, titleText, closeLevel) {
if (openLevel != closeLevel) {
return str;
}
if (openLevel > level) {
toc += (new Array(openLevel - level + 1)).join("<ul>");
} else if (openLevel < level) {
toc += (new Array(level - openLevel + 1)).join("</ul>");
}
level = parseInt(openLevel);
var anchor = titleText.replace(/ /g, "_");
toc += "<li><a href=\"#" + anchor + "\">" + titleText
+ "</a></li>";
return "<h" + openLevel + "><a name=\"" + anchor + "\">"
+ titleText + "</a></h" + closeLevel + ">";
}
);
if (level) {
toc += (new Array(level + 1)).join("</ul>");
}
document.getElementById("toc").innerHTML += toc;
};
</script>
<!--ENTER TITLE-->
<title>PostgreSQL Monitoring Specification - v9.4
</title>
<!--END TITLE-->
<h2>
</head>
<body>
<!--ENTER MAIN HEADER-->
<h1>PostgreSQL Monitoring Specification - v9.4
</h1>
<!--END MAIN HEADER-->
<!--ENTER COPYRIGHT NOTICE-->
<h3>© William Dunn ([email protected]) 2018
</h3>
<!--ENTER COPYRIGHT NOTICE-->
<div id="toc">
<h2>Table of Contents</h2>
</div>
<div id="contents">
<body>
<!--ENTER CONTENT BELOW-->
<h1>PostgreSQL Monitoring Specification v9.4</h1>
<h2>Overview</h2>
<p>This document provides a specification for the monitoring that should be implemented for a PostgreSQL 9.4 instance. The preferred monitoring tool of the PostgreSQL community is Bucardo's Check Postgres, but this specification provides a basis for implementation in another framework. It includes SQL queries and bash commands that should be implemented, as well as detailed explanations of how the parameters should be set and what actions should be taken when the parameter passes the threshold.</p>
<h2>Layout of this Document</h2>
<p>Important note: the results listed in "SQL Example Results" were not taken at the same time so they will not match up. This is the case even between the 'base' and 'Dashboard
' versions of the same monitor</p>
<h3>Overview</h3>
<ul><li>Commentary provided describing the monitor.</li></ul>
<h3>Default Frequency/Thresholds</h3>
<ul><li>Recommended thresholds for warning and critical statuses.</li></ul>
<h3>Alert Response</h3>
<ul><li>Suggested action that the DBA should take when that monitor triggers a warning.</li></ul>
<h3>bash Command</h3>
<ul><li>If the monitor uses a bash command - the command used is described here.</li></ul>
<h3>SQL Scope</h3>
<ul><li>If the monitor uses SQL - whether the query provides the details for all the databases in the cluster, or must be run for each database.</li></ul>
<h3>Base Query</h3>
<ul>
<li>If the monitor uses SQL - the SQL query that the monitor is based on. See the 'Dashboard
Query' section for details on the differences.</li>
<li>I kept this copy of the base query separate from the query used by Dashboard
so it can later extended to specific monitoring tools which may have fewer limitations listed in the Dashboard Query section.</li>
</ul>
<h4>SQL</h4>
<ul>
<li>If the monitor uses SQL - the SQL query used by the Base Query.</li>
<li>Note this may be cluster wide OR may need be executed on each database, as specified in the 'SQL Scope'.</li>
</ul>
<h4>SQL Example Results</h4>
<ul>
<li>If the monitor uses SQL - example results of the SQL query used by the monitor.</li>
<li>Note that the query results were not collected at the same time nor under the same conditions so they may not be consistent.</li>
<li>There may be multiple 'SQL Example Results' sections for a query to show what results will look like </li>
</ul>
<h4>SQL Example Results Pretty</h4>
<ul><li>For queries with many rows we executed the same query as 'SQL Example Results' except the results are the 'expanded table formatting mode' to make it easier to read (psql '\x' command).</li></ul>
<h3>Dashboard Query</h3>
<ul><li>This is based on the Base Query but for a typical Dashboard application. The base query has been modified to support the limited fields monitoring applications typically have:</li></ul>
<ul>
<li>Combine multiple fields into a single 'Comments' field to be used as the body of a ticket or email alert which monitoring applications typically make.</li>
<li>The fields used by percentages in the base query were multiplied by 100 so they are a fraction of 100 rather than a fraction of 1.</li>
<ul><li>For example, 90.2% will be displayed as .902 in the base query, 90.02 in the Dashboard
query.</li></ul>
</ul>
<h4>SQL</h4>
<ul><li>The Base Query with the modifications made for Dashboard
described above.</li></ul>
<h4>SQL Example Results</h4>
<ul><li>Same as Base Query except Dashboard
Query was executed.</li></ul>
<h4>SQL Example Results Pretty</h4>
<ul><li>Same as Base Query except Dashboard
Query was executed.</li></ul>
<h1>Autovacuum</h1>
<h2>perc_over_autovacuum_freeze_max_age/perc_until_autovacuum_freeze_max_age/perc_until_xid_wraparound</h2>
<h3>Overview</h3>
<p>PostgreSQL's MVCC transaction semantics depend on being able to compare transaction ID (XID) numbers: a row version with an insertion XID greater than the current transaction's XID is "in the future" and should not be visible to the current transaction. Normal XIDs are compared using modulo-2^32 arithmetic, which means that ~2^32/2-1 transactions appear in the future and ~2^32/2-1 transactions appear in the past. To prevent XID wraparound the PostgreSQL Autovacuum worker will add a marker to the XIDs of old transactions which mark them as "frozen" and cause them to always be evaluated as older than any running transaction.</p>
<p>This behavior of autovacuum is primarily dependent on the settings autovacuum_freeze_table_age and autovacuum_freeze_max_age, which are set as database defaults but can also be specified on a per table basis (as storage parameters in CREATE TABLE or ALTER TABLE)</p>
<ul>
<li>When a table's oldest transaction reaches autovacuum_freeze_table_age, the next autovacuum that is performed on that table will be a vacuum freeze</li>
<ul><li>PostgreSQL implicitly caps autovacuum_freeze_table_age at 0.95*autovacuum_freeze_max_age.</li></ul>
<li>When a table reaches autovacuum_freeze_max_age PostgreSQL will force an autovacuum freeze on that table, even if the table would not otherwise be autovacuumed or autovacuum is disabled.</li>
<ul><li>PostgreSQL implicitly caps autovacuum_freeze_max_age at 2 billion (2000000000)</li></ul>
</ul>
<p>The actual age that a wraparound occurs is ((2^32)/2)-1. When a PostgreSQL database comes within 1 million of this age (2^32/2-1-1000000) the database will go into the safety shutdown mode" and no longer accept commands, including the vacuum commands, and your only recovery option is to stop the server and use a single-user backend (where shutdown mode is not enforced) to execute VACUUM. This should, obviously, be avoided at all costs.</p>
<p>References:</p>
<ul>
<li>http://www.PostgreSQL.org/docs/current/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND</li>
<li>http://www.PostgreSQL.org/docs/current/static/runtime-config-client.html#GUC-VACUUM-FREEZE-TABLE-AGE</li>
<li>http://www.PostgreSQL.org/docs/devel/static/runtime-config-autovacuum.html#GUC-AUTOVACUUM-FREEZE-MAX-AGE</li>
</ul>
<h3>perc_over_autovacuum_freeze_max_age</h3>
<h4>Overview</h4>
<p>As described above, when a table reaches autovacuum_freeze_max_age PostgreSQL will force an autovacuum freeze on that table, even if the table would not otherwise be autovacuumed or autovacuum is disabled. For each table we divide the age of the oldest transaction / the autovacuum_freeze_max_age. This number should never exceed 1 (100%) (or exceed it by much) because when 1 is reached (age of the oldest transaction = the autovacuum_freeze_max_age) an autovacuum freeze operation should be forced on this table. If this value far exceeds 1 something is wrong with autovacuum or there is something preventing the vacuum freeze operation from completing. Monitoring and addressing this is critical to ensure that transaction wraparound is being prevented by the "transaction id freeze' operation.</p>
<p>Note that in our measurement of autovacuum_freeze_max_age we cap it at 2 billion (because PostgreSQL implicitly caps it at that value) so the value reported by the monitor may be different (2 billion) than that listed in the config of the config lists a value greater than 2 billion.</p>
<h4>Default Frequency/Thresholds</h4>
<ul>
<li><b>Frequency: </b>twice a day to start with. If a customer database has a high volume of transactions maybe go more frequent</li>
<li><b>Warning: </b>none</li>
<li><b>Critical: </b>1.01 (101%)</li>
</ul>
<h4>Alert Response</h4>
<p><b>Warning</b></p>
<ul>
<li>None</li>
</ul>
<p><b>Critical</b></p>
<ol>
<li>Run vacuum freeze at next slow period</li>
<ul>
<li>On the problem table(s)</li>
<li>If other tables are also past autovacuum_freeze_table_age perform vacuum freeze on the full database</li>
<li>Syntax for vaccum freeze:</li>
<ul>
<li><code>VACUUM (FREEZE, VERBOSE)</code> <b>schema.table_name</b> <code>;</code></li>
<li><b>Reference: </b>http://www.postgresql.org/docs/current/static/sql-vacuum.html</li>
</ul>
</ul>
<li>Check the logs for any error in the vacuum freeze operation (both the one you just ran and previously)</li>
<li>Review vacuum settings and make sure it is running and make it more aggressive on the cluster, at least for the freeze settings</li>
<ul>
<li>Note: freeze settings can be set per database, but that is more difficult to train first level support on</li>
<li>See the 'autovacuum' section of guide "PostgreSQL Server Tuning" for instructions</li>
<li>To modify individual tables:</li>
<ul>
<ul>
<li><code>ALTER TABLE</code> <b>schema.table</b> <code>SET (</code><b>autovacuum_freeze_max_age</b><code> = </code><b>value</b><code>);</code></li>
<ul><li>For example: <code>ALTER TABLE edbstore.emp SET (autovacuum_freeze_max_age=180000000);</code></li></ul>
</ul>
<li>After altering the table execute the query to check the specific settings again to confirm they changed to the desired values</li>
<li>If any table needs to be changed back to the server default setting for a storage_parameter use the ALTER TABLE RESET command:</li>
<ul>
<li><code>ALTER TABLE</code> <b>schema.table</b> <code>RESET (</code><b>storage_parameter</b><code>);</code></li>
<ul><li>For example: <code>ALTER TABLE edbstore.emp RESET (autovacuum_freeze_max_age);</code></li></ul>
</ul>
<li><b>Alter Table Reference: </b>http://www.postgresql.org/docs/current/static/sql-altertable.html</li>
<li><b>Autovacuum Settings Reference: </b>http://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html</li>
</ul>
</ul>
</ol>
<h3>perc_until_autovacuum_freeze_max_age_10kRows</h3>
<h4>Overview</h4>
<p>As described above, when a table reaches autovacuum_freeze_max_age PostgreSQL will force an autovacuum freeze on that table, even if the table would not otherwise be autovacuumed or autovacuum is disabled. This operation acts on the table's entire visibility map so for very large tables (tables with more than 10,000 rows) this can use a lot of system resources and may impact performance, especially during peak load. For each large table (more than 10,000 rows) we divide the age of the oldest transaction / the autovacuum_freeze_max_age, and for tables smaller than 10,000 rows we return 0. Note that the number of rows is important, not the amount of data, because the visibility map size contains only high level row information such as ID and XID. Monitoring and addressing this is allows the DBA to manually perform a vacuum freeze operation on the table(s) during a slow period to prevent it from occurring during peak load.</p>
<p>Note that in our measurement of autovacuum_freeze_max_age we cap it at 2 billion (because PostgreSQL implicitly caps it at that value) so the value reported by the monitor may be different (2 billion) than that listed in the config of the config lists a value greater than 2 billion.</p>
<h4>Default Frequency/Thresholds</h4>
<ul>
<li><b>Frequency: </b>twice a day to start with. If a customer database has a high volume of transactions maybe go more frequent</li>
<li><b>Warning: </b>none</li>
<li><b>Critical: </b>.95 (95%)</li>
</ul>
<h4>Alert Response</h4>
<p><b>Warning</b></p>
<ul>
<li>None</li>
</ul>
<p><b>Critical</b></p>
<ul>
<li>Run vacuum freeze at next slow period</li>
<ul>
<li>On the problem table(s)</li>
<li>If other tables are also past autovacuum_freeze_table_age perform vacuum freeze on those tables or the full database</li>
<li>Syntax for vaccum freeze:</li>
<ul>
<li><code>VACUUM (FREEZE, VERBOSE)</code> <b>schema.table_name</b> <code>;</code></li>
<li><b>Reference: </b>http://www.postgresql.org/docs/current/static/sql-vacuum.html</li>
</ul>
</ul>
</ul>
<p><b>Trending</b></p>
<ul>
<li>If this is occurring frequently review vacuum settings and tune it to be more aggressive. It can be tuned for the entire cluster and for individual tables</li>
<ul>
<li>Note: freeze settings can be set per database, but that is more difficult to train first level support on</li>
<li>See the 'autovacuum' section of guide "PostgreSQL Server Tuning" for instructions</li>
<li>To modify individual tables:</li>
<ul>
<ul>
<li><code>ALTER TABLE</code> <b>schema.table</b> <code>SET (</code><b>autovacuum_freeze_max_age</b><code> = </code><b>value</b><code>);</code></li>
<ul><li>For example: <code>ALTER TABLE edbstore.emp SET (autovacuum_freeze_max_age=180000000);</code></li></ul>
</ul>
<li>After altering the table execute the query to check the specific settings again to confirm they changed to the desired values</li>
<li>If any table needs to be changed back to the server default setting for a storage_parameter use the ALTER TABLE RESET command:</li>
<ul>
<li><code>ALTER TABLE</code> <b>schema.table</b> <code>RESET (</code><b>storage_parameter</b><code>);</code></li>
<ul><li>For example: <code>ALTER TABLE edbstore.emp RESET (autovacuum_freeze_max_age);</code></li></ul>
</ul>
<li><b>Alter Table Reference: </b>http://www.postgresql.org/docs/current/static/sql-altertable.html</li>
<li><b>Autovacuum Settings Reference: </b>http://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html</li>
</ul>
</ul>
</ul>
<h3>perc_until_wraparound_server_freeze</h3>
<h4>Overview</h4>
<p>As described above, the actual age that a wraparound occurs is ((2^32)/2)-1. When a PostgreSQL database comes within 1 million of this age (2^32/2-1-1000000) the database will go into the safety shutdown mode" and no longer accept commands, including the vacuum commands, and your only recovery option is to stop the server and use a single-user backend (where shutdown mode is not enforced) to execute VACUUM. For each table we divide the age of the oldest transaction / (2^32/2-1-1000000). This should never occur because once autovacuum_freeze_max_age is reached a vacuum freeze should be triggered automatically (even if the table would not otherwise be autovacuumed or autovacuum is disabled) but we monitor this anyway because preventing a wraparound freeze is so critical.</p>
<h4>Default Frequency/Thresholds</h4>
<ul>
<li><b>Frequency: </b>twice a day to start with. If a customer database has a high volume of transactions maybe go more frequent</li>
<li><b>Warning: </b>.6 (60%)</li>
<ul><li>Must be set higher if autovacuum_freeze_max_age exceeds 1200000000 because autovacuum freeze should not be forced until that value is reached</li></ul>
<li><b>Critical: </b>.65 (65%)</li>
<ul><li>Must be set higher if autovacuum_freeze_max_age exceeds 1200000000 because autovacuum freeze should not be forced until that value is reached</li></ul>
</ul>
<h4>Alert Response</h4>
<p><b>Warning</b></p>
<ol>
<li>Manually run a database wide vacuum freeze at next slow period</li>
<li>Syntax for database vaccum freeze:</li>
<ul>
<li><code>VACUUM FREEZE VERBOSE;</code></li>
<li><b>Reference: </b>http://www.postgresql.org/docs/current/static/sql-vacuum.html</li>
</ul>
<li>Check the logs for any error in the vacuum freeze operation (both the one you just ran and previously)</li>
<li>Review vacuum settings and make sure it is running and make it more aggressive on the cluster, at least for the freeze settings</li>
<ul>
<li>Note: freeze settings can be set per database, but that is more difficult to train first level support on</li>
<li>See the 'autovacuum' section of guide "PostgreSQL Server Tuning" for instructions</li>
</ul>
</ol>
<p><b>Critical</b></p>
<ol>
<li>Force a database wide vacuum as soon as can get scheduled</li>
<li>Syntax for database vaccum freeze:</li>
<ul>
<li><code>VACUUM FREEZE VERBOSE;</code></li>
<li><b>Reference: </b>http://www.postgresql.org/docs/current/static/sql-vacuum.html</li>
</ul>
<li>Check the logs for any error in the vacuum freeze operation (both the one you just ran and previously)</li>
<li>Check the autovacuum_freeze_max_age settings (both in the config and for the individual tables) and decrease if it is too high</li>
<ul>
<li>Check database/server settings:</li>
<ul>
<li><b>autovacuum_freeze_max_age: </b><code>SELECT current_setting('autovacuum_freeze_max_age');</code></li>
<li><b>vacuum_freeze_table_age: </b><code>SELECT current_setting('vacuum_freeze_table_age');</code></li>
</ul>
<li>To modify the server settings in the postgresql.conf and reload the config</li>
<li>Check individual table settings:</li>
<pre>
SELECT current_catalog AS database,
storage_settings.nspname AS schema,
storage_settings.relname AS table,
least(autovacuum_freeze_table_age,(0.95*(least(autovacuum_freeze_max_age, 2000000000)))) AS autovacuum_freeze_table_age,
uses_default_autovacuum_freeze_table_age AS uses_default,
CASE
WHEN autovacuum_freeze_table_age>(0.95*(least(autovacuum_freeze_max_age, 2000000000))) THEN TRUE
ELSE FALSE
END AS exceeds_max,
least(autovacuum_freeze_max_age, 2000000000) AS autovacuum_freeze_max_age,
uses_default_autovacuum_freeze_max_age AS uses_default,
CASE
WHEN autovacuum_freeze_max_age>2000000000 THEN TRUE
ELSE FALSE
END AS exceeds_max
FROM
(SELECT oid,
nspname,
relname,
CASE
WHEN relopts LIKE '%autovacuum_freeze_table_age%' THEN CAST (regexp_replace(relopts, '.*autovacuum_freeze_table_age=([0-9.]+).*', E'\\1') AS real)
ELSE CAST (current_setting('vacuum_freeze_table_age') AS real)
END AS autovacuum_freeze_table_age,
CASE
WHEN relopts LIKE '%autovacuum_freeze_table_age%' THEN FALSE
ELSE TRUE
END AS uses_default_autovacuum_freeze_table_age,
CASE
WHEN relopts LIKE '%autovacuum_freeze_max_age%' THEN CAST (regexp_replace(relopts, '.*autovacuum_freeze_max_age=([0-9.]+).*', E'\\1') AS real)
ELSE CAST (current_setting('autovacuum_freeze_max_age') AS real)
END AS autovacuum_freeze_max_age,
CASE
WHEN relopts LIKE '%autovacuum_freeze_max_age%' THEN FALSE
ELSE TRUE
END AS uses_default_autovacuum_freeze_max_age
FROM
(SELECT pg_class.oid,
relname,
nspname,
array_to_string(reloptions, '') AS relopts
FROM pg_class
INNER JOIN pg_namespace ns ON relnamespace = ns.oid) table_opts) storage_settings;
</pre>
<li>To modify individual tables:</li>
<ul>
<ul>
<li><code>ALTER TABLE</code> <b>schema.table</b> <code>SET (</code><b>autovacuum_freeze_max_age</b><code> = </code><b>value</b><code>);</code></li>
<ul><li>For example: <code>ALTER TABLE edbstore.emp SET (autovacuum_freeze_max_age=180000000);</code></li></ul>
</ul>
<li>After altering the table execute the query to check the specific settings again to confirm they changed to the desired values</li>
<li>If any table needs to be changed back to the server default setting for a storage_parameter use the ALTER TABLE RESET command:</li>
<ul>
<li><code>ALTER TABLE</code> <b>schema.table</b> <code>RESET (</code><b>storage_parameter</b><code>);</code></li>
<ul><li>For example: <code>ALTER TABLE edbstore.emp RESET (autovacuum_freeze_max_age);</code></li></ul>
</ul>
<li><b>Alter Table Reference: </b>http://www.postgresql.org/docs/current/static/sql-altertable.html</li>
<li><b>Autovacuum Settings Reference: </b>http://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html</li>
</ul>
</ul>
<li>Review vacuum settings and make sure it is running and make it more aggressive on the cluster, at least for the freeze settings</li>
<ul>
<li>Note: freeze settings can be set per database, but that is more difficult to train first level support on</li>
<li>See the 'autovacuum' section of guide "PostgreSQL Server Tuning" for instructions</li>
</ul>
</ol>
<h3>SQL Scope</h3>
<p>Must be run for each database.</p>
<h3>Base Query</h3>
<h4>SQL</h4>
<pre>
SELECT current_catalog AS database,
storage_settings.nspname AS schema,
storage_settings.relname AS table,
(pg_relation_size(pg_class.oid)) AS table_bytes,
pg_class.relpages AS pages_on_disk,
n_live_tup+n_dead_tup AS disk_row_count,
age(relfrozenxid) AS relfrozenxid_age,
least(autovacuum_freeze_table_age,(0.95*(least(autovacuum_freeze_max_age, 2000000000)))) AS autovacuum_freeze_table_age,
CAST (age(relfrozenxid) AS real) / CAST (autovacuum_freeze_table_age AS real) AS perc_until_vacuum_freeze_table_age ,
CASE
WHEN age(relfrozenxid) > autovacuum_freeze_table_age THEN TRUE
ELSE FALSE
END AS next_autovacuum_will_be_a_freeze,
(least(autovacuum_freeze_max_age, 2000000000)) AS autovacuum_freeze_max_age,
CAST (age(relfrozenxid) AS real) / CAST ((least(autovacuum_freeze_max_age, 2000000000)) AS real) AS perc_until_freeze_max_age,
CASE
WHEN n_live_tup+n_dead_tup > 10000 THEN CAST(age(relfrozenxid) AS real) / CAST((least(autovacuum_freeze_max_age, 2000000000)) AS real)
ELSE 0
END AS perc_until_freeze_max_age_10kRows,
trunc(((2^32)/2)-1-1000000) AS wraparound_server_freeze_age,
CAST (age(relfrozenxid) AS real) / CAST(trunc(((2^32)/2)-1-1000000) AS real) AS perc_until_wraparound_server_freeze
FROM pg_stat_user_tables
INNER JOIN pg_class ON pg_stat_user_tables.relid = pg_class.oid
INNER JOIN
(SELECT oid,
relname,
nspname,
CASE
WHEN relopts LIKE '%autovacuum_freeze_table_age%' THEN CAST (regexp_replace(relopts, '.*autovacuum_freeze_table_age=([0-9.]+).*', E'\\1') AS real)
ELSE CAST (current_setting('vacuum_freeze_table_age') AS real)
END AS autovacuum_freeze_table_age,
CASE
WHEN relopts LIKE '%autovacuum_freeze_max_age%' THEN CAST (regexp_replace(relopts, '.*autovacuum_freeze_max_age=([0-9.]+).*', E'\\1') AS real)
ELSE CAST (current_setting('autovacuum_freeze_max_age') AS real)
END AS autovacuum_freeze_max_age
FROM
(SELECT pg_class.oid,
relname,
nspname,
array_to_string(reloptions, '') AS relopts
FROM pg_class
INNER JOIN pg_namespace ns ON relnamespace = ns.oid) table_opts) storage_settings ON (pg_class.oid = storage_settings.oid)
ORDER BY storage_settings.relname;
</pre>
<h4>SQL Example Result</h4>
<pre>
database | schema | table | table_bytes | pages_on_disk | disk_row_count | relfrozenxid_age | autovacuum_freeze_table_age | perc_until_vacuum_freeze_table_age | next_autovacuum_will_be_a_freeze | autovacuum_freeze_max_age | perc_until_freeze_max_age | perc_until_freeze_max_age_10krows | wraparound_server_freeze_age | perc_until_wraparound_server_freeze
----------+----------+------------+-------------+---------------+----------------+------------------+-----------------------------+------------------------------------+----------------------------------+---------------------------+---------------------------+-----------------------------------+------------------------------+-------------------------------------
pgp | edbstore | categories | 8192 | 1 | 16 | 712 | 150000000 | 4.74667e-06 | f | 2e+08 | 3.56e-06 | 0 | 2146483647 | 3.31705e-07
pgp | edbstore | cust_hist | 2678784 | 327 | 60350 | 707 | 150000000 | 4.71333e-06 | f | 2e+08 | 3.535e-06 | 3.535e-06 | 2146483647 | 3.29376e-07
pgp | edbstore | customers | 3997696 | 488 | 20000 | 705 | 150000000 | 4.7e-06 | f | 2e+08 | 3.525e-06 | 3.525e-06 | 2146483647 | 3.28444e-07
pgp | edbstore | dept | 8192 | 1 | 4 | 700 | 150000000 | 4.66667e-06 | f | 2e+08 | 3.5e-06 | 0 | 2146483647 | 3.26115e-07
pgp | edbstore | emp | 8192 | 1 | 14 | 698 | 150000000 | 4.65333e-06 | f | 2e+08 | 3.49e-06 | 0 | 2146483647 | 3.25183e-07
pgp | edbstore | inventory | 450560 | 55 | 10000 | 696 | 150000000 | 4.64e-06 | f | 2e+08 | 3.48e-06 | 0 | 2146483647 | 3.24251e-07
pgp | edbstore | job_grd | 8192 | 1 | 4 | 694 | 150000000 | 4.62667e-06 | f | 2e+08 | 3.47e-06 | 0 | 2146483647 | 3.2332e-07
pgp | edbstore | jobhist | 8192 | 1 | 17 | 692 | 150000000 | 4.61333e-06 | f | 2e+08 | 3.46e-06 | 0 | 2146483647 | 3.22388e-07
pgp | edbstore | locations | 0 | 0 | 0 | 690 | 150000000 | 4.6e-06 | f | 2e+08 | 3.45e-06 | 0 | 2146483647 | 3.21456e-07
pgp | edbstore | orderlines | 3153920 | 385 | 60350 | 686 | 150000000 | 4.57333e-06 | f | 2e+08 | 3.43e-06 | 3.43e-06 | 2146483647 | 3.19592e-07
pgp | edbstore | orders | 819200 | 100 | 12000 | 684 | 150000000 | 4.56e-06 | f | 2e+08 | 3.42e-06 | 3.42e-06 | 2146483647 | 3.18661e-07
pgp | edbstore | products | 827392 | 101 | 10000 | 679 | 150000000 | 4.52667e-06 | f | 2e+08 | 3.395e-06 | 0 | 2146483647 | 3.16331e-07
pgp | edbstore | reorder | 0 | 0 | 0 | 674 | 150000000 | 4.49333e-06 | f | 2e+08 | 3.37e-06 | 0 | 2146483647 | 3.14002e-07
(13 rows)
</pre>
<h3>Dashboard
Query</h3>
<h4>SQL</h4>
<pre>
SELECT base.database,
base.schema,
base.table,
base.perc_until_freeze_max_age_10krows,
'autovacuum_freeze_max_age: ' || base.autovacuum_freeze_max_age || '\n' || 'relfrozenxid_age: ' || base.relfrozenxid_age AS perc_until_freeze_max_age_10krows_details,
base.perc_until_freeze_max_age,
'autovacuum_freeze_max_age: ' || base.autovacuum_freeze_max_age || '\n' || 'relfrozenxid_age: ' || base.relfrozenxid_age AS perc_until_freeze_max_age_details,
base.perc_until_wraparound_server_freeze,
'wraparound_server_freeze_age: ' || base.wraparound_server_freeze_age || '\n' || 'relfrozenxid_age: ' || base.relfrozenxid_age AS perc_until_wraparound_server_freeze_details
FROM
(SELECT current_catalog AS database,
storage_settings.nspname AS schema,
storage_settings.relname AS table,
(pg_relation_size(pg_class.oid)) AS table_bytes,
pg_class.relpages AS pages_on_disk,
n_live_tup+n_dead_tup AS disk_row_count,
age(relfrozenxid) AS relfrozenxid_age,
least(autovacuum_freeze_table_age,(0.95*(least(autovacuum_freeze_max_age, 2000000000)))) AS autovacuum_freeze_table_age,
100.00 * (CAST (age(relfrozenxid) AS real) / CAST (autovacuum_freeze_table_age AS real)) AS perc_until_vacuum_freeze_table_age ,
CASE
WHEN age(relfrozenxid) > autovacuum_freeze_table_age THEN TRUE
ELSE FALSE
END AS next_autovacuum_will_be_a_freeze,
least(autovacuum_freeze_max_age, 2000000000) AS autovacuum_freeze_max_age,
100.00 * (CAST (age(relfrozenxid) AS real) / CAST ((least(autovacuum_freeze_max_age, 2000000000)) AS real)) AS perc_until_freeze_max_age,
CASE
WHEN n_live_tup+n_dead_tup > 10000 THEN 100.00 * (CAST(age(relfrozenxid) AS real) / CAST((least(autovacuum_freeze_max_age, 2000000000)) AS real))
ELSE 0
END AS perc_until_freeze_max_age_10kRows,
trunc(((2^32)/2)-1-1000000) AS wraparound_server_freeze_age,
100.00 * (CAST (age(relfrozenxid) AS real) / CAST(trunc(((2^32)/2)-1-1000000) AS real)) AS perc_until_wraparound_server_freeze
FROM pg_stat_user_tables
INNER JOIN pg_class ON pg_stat_user_tables.relid = pg_class.oid
INNER JOIN
(SELECT oid,
relname,
nspname,
CASE
WHEN relopts LIKE '%autovacuum_freeze_table_age%' THEN CAST (regexp_replace(relopts, '.*autovacuum_freeze_table_age=([0-9.]+).*', E'\\1') AS real)
ELSE CAST (current_setting('vacuum_freeze_table_age') AS real)
END AS autovacuum_freeze_table_age,
CASE
WHEN relopts LIKE '%autovacuum_freeze_max_age%' THEN CAST (regexp_replace(relopts, '.*autovacuum_freeze_max_age=([0-9.]+).*', E'\\1') AS real)
ELSE CAST (current_setting('autovacuum_freeze_max_age') AS real)
END AS autovacuum_freeze_max_age
FROM
(SELECT pg_class.oid,
relname,
nspname,
array_to_string(reloptions, '') AS relopts
FROM pg_class
INNER JOIN pg_namespace ns ON relnamespace = ns.oid) table_opts) storage_settings ON (pg_class.oid = storage_settings.oid)
ORDER BY storage_settings.relname) base;
</pre>
<h4>SQL Example Results</h4>
<pre>
database | schema | table | perc_until_freeze_max_age_10krows | perc_until_freeze_max_age_10krows_details | perc_until_freeze_max_age | perc_until_freeze_max_age_details | perc_until_wraparound_server_freeze | perc_until_wraparound_server_freeze_details
----------+----------+------------+-----------------------------------+---------------------------------------------------------+---------------------------+---------------------------------------------------------+-------------------------------------+-----------------------------------------------------------------
pgp | edbstore | categories | 0 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 712 | 3.56e-04 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 712 | 3.31705e-05 | wraparound_server_freeze_age: 2146483647\nrelfrozenxid_age: 712
pgp | edbstore | cust_hist | 3.535e-06 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 707 | 3.535e-04 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 707 | 3.29376e-05 | wraparound_server_freeze_age: 2146483647\nrelfrozenxid_age: 707
pgp | edbstore | customers | 3.525e-06 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 705 | 3.525e-04 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 705 | 3.28444e-05 | wraparound_server_freeze_age: 2146483647\nrelfrozenxid_age: 705
pgp | edbstore | dept | 0 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 700 | 3.5e-04 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 700 | 3.26115e-05 | wraparound_server_freeze_age: 2146483647\nrelfrozenxid_age: 700
pgp | edbstore | emp | 0 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 698 | 3.49e-04 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 698 | 3.25183e-05 | wraparound_server_freeze_age: 2146483647\nrelfrozenxid_age: 698
pgp | edbstore | inventory | 0 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 696 | 3.48e-04 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 696 | 3.24251e-05 | wraparound_server_freeze_age: 2146483647\nrelfrozenxid_age: 696
pgp | edbstore | job_grd | 0 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 694 | 3.47e-04 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 694 | 3.2332e-05 | wraparound_server_freeze_age: 2146483647\nrelfrozenxid_age: 694
pgp | edbstore | jobhist | 0 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 692 | 3.46e-04 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 692 | 3.22388e-05 | wraparound_server_freeze_age: 2146483647\nrelfrozenxid_age: 692
pgp | edbstore | locations | 0 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 690 | 3.45e-04 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 690 | 3.21456e-05 | wraparound_server_freeze_age: 2146483647\nrelfrozenxid_age: 690
pgp | edbstore | orderlines | 3.43e-06 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 686 | 3.43e-04 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 686 | 3.19592e-05 | wraparound_server_freeze_age: 2146483647\nrelfrozenxid_age: 686
pgp | edbstore | orders | 3.42e-06 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 684 | 3.42e-04 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 684 | 3.18661e-05 | wraparound_server_freeze_age: 2146483647\nrelfrozenxid_age: 684
pgp | edbstore | products | 0 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 679 | 3.395e-04 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 679 | 3.16331e-05 | wraparound_server_freeze_age: 2146483647\nrelfrozenxid_age: 679
pgp | edbstore | reorder | 0 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 674 | 3.37e-04 | autovacuum_freeze_max_age: 2e+08\nrelfrozenxid_age: 674 | 3.14002e-05 | wraparound_server_freeze_age: 2146483647\nrelfrozenxid_age: 674
(13 rows)
</pre>
<h2>user_table_bloat</h2>
<h3>Overview</h3>
<p>The PostgreSQL autovacuum demon performs several critical functions</p>
<ul>
<li>It removes "dead" updated/deleted rows which are no longer needed for MVCC (and thus useless). Rows that are no longer needed for MVCC but remain in the table are referred to as "bloat"</li>
<ul>
<li>All the dead rows must be scanned during queries even though they are no longer needed, so removing them decreases the processing impact of having them remain there</li>
<li>All dead rows remain on disk, so removing them allows disk space to be recovered or reused</li>
</ul>
<li>It updates data statistics used by the PostgreSQL query planner</li>
<li>It updates the visibility map, which speeds up index-only scans</li>
<li>It freezes old transaction ids to avoid transaction ID wraparound or multixact ID wraparound (as described and monitored by perc_over_autovacuum_freeze_max_age/perc_until_autovacuum_freeze_max_age/perc_until_xid_wraparound)</li>
</ul>
<p>To measure the performance of autovacuum to ensure that it is running and tuned to be aggressive enough we calculate the "bloat" for each table. If autovacuum is maintaining healthy levels of bloat it is safe to conclude that it is also maintaining statistics and the visibility map as well. To estimate table bloat for each table we divide the (live rows + dead rows)/(live rows)-1. Autovacuum will not vacuum a table until there are at least 'autovacuum_vacuum_threshold' dead rows as specified in postgresql.conf (the default is 50), so we put a threshold that the table must be at least autovacuum_vacuum_threshold*10 in size which will make us have no false positives above 0.1 (10%) bloat.</p>
<p>References:</p>
<ul>
<li>http://www.postgresql.org/docs/devel/static/routine-vacuuming.html</li>
<li>http://www.PostgreSQL.org/docs/devel/static/runtime-config-autovacuum.html</li>
</ul>
<h3>Default Frequency/Thresholds</h3>
<ul>
<li><b>Frequency: </b>once a day, at the end of slow period (~7 PM)</li>
<li><b>Warning: </b>.15 (15%)</li>
<li><b>Critical: </b>.2 (20%)</li>
</ul>
<h3>Alert Response</h3>
<p><b>Warning</b></p>
<ol>
<li>Run vacuum against the table(s) at next slow period</li>
<ul>
<li>Syntax for vaccum:</li>
<ul>
<li><code>VACUUM (VERBOSE)</code> <b>schema.table_name</b> <code>;</code></li>
<li><b>Reference: </b>http://www.postgresql.org/docs/current/static/sql-vacuum.html</li>
</ul>
</ul>
<li>If there are other tables nearing warning level run a vacuum against them as well</li>
</ol>
<p><b>Critical</b></p>
<ol>
<li>Run vacuum against the table(s) immediately</li>
<ul>
<li>Syntax for vaccum:</li>
<ul>
<li><code>VACUUM (VERBOSE)</code> <b>schema.table_name</b> <code>;</code></li>
<li><b>Reference: </b>http://www.postgresql.org/docs/current/static/sql-vacuum.html</li>
</ul>
</ul>
<li>If there are many tables reaching warning/critical tune the server autovacuum settings to make autovacuum more aggressive. </li>
<ul><li>See the 'autovacuum' section of guide "PostgreSQL Server Tuning" for instructions</li></ul>
<li>If there is one or two tables reaching warning/critical set specific autovacuum settings on the table(s) to make it more aggressive</li>
<ul>
<li>See the 'autovacuum' section of guide "PostgreSQL Server Tuning" for instructions</li>
<li>The following query will show you table specific settings (if there are any):</li>
<pre>
SELECT pg_class.oid,
nspname AS schema,
relname AS table,
array_to_string(reloptions, '') AS custom_settings
FROM pg_class
INNER JOIN pg_namespace ON (pg_namespace.oid=pg_class.relnamespace)
WHERE nspname='[SCHEMA NAME]'
AND relname='[TABLE NAME]';
</pre>
<ul>
<li>Replace <code>[SCHEMA NAME]</code> with the name of the schema the table is in</li>
<li>Replace <code>[TABLE NAME]</code> with the name of the table</li>
</ul>
<li>Syntax for changing settings for an individual table</li>
<ul>
<li><code>ALTER TABLE</code> <b>schema.table</b> <code>SET (</code><b>storage_parameter</b><code> = </code><b>value</b><code>);</code></li>
<ul><li>For example: <code>ALTER TABLE edbstore.emp SET (autovacuum_vacuum_scale_factor=.5);</code></li></ul>
</ul>
<li>After altering the table execute the query to check the specific settings again to confirm they changed to the desired values</li>
<li>If any table needs to be changed back to the server default setting for a storage_parameter use the ALTER TABLE RESET command:</li>
<ul>
<li><code>ALTER TABLE</code> <b>schema.table</b> <code>RESET (</code><b>storage_parameter</b><code>);</code></li>
<ul><li>For example: <code>ALTER TABLE edbstore.emp RESET (autovacuum_vacuum_scale_factor);</code></li></ul>
</ul>
<li><b>Alter Table Reference: </b>http://www.postgresql.org/docs/current/static/sql-altertable.html</li>
<li><b>Autovacuum Settings Reference: </b>http://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html</li>
</ul>
</ol>
<p><b>Trending</b></p>
<ol>
<li>If warning is triggered frequently for many tables tune the server-wide autovacuum settings in the config to make autovacuum more aggressive and ensure no table has individual storage parameters less aggressive than those values</li>
<ul><li>See the 'autovacuum' section of guide "PostgreSQL Server Tuning" for instructions</li></ul>
<li>If warning triggered frequently on on or a few tables set specific autovacuum settings on the table(s) (ALTER TABLE SET ( storage_parameter = value [, ... ] )) to make it more aggressive</li>
<ul>
<li>See the 'autovacuum' section of guide "PostgreSQL Server Tuning" for instructions</li>
<li>The following query will show you table specific settings (if there are any):</li>
<pre>
SELECT pg_class.oid,
nspname AS schema,
relname AS table,
array_to_string(reloptions, '') AS custom_settings
FROM pg_class
INNER JOIN pg_namespace ON (pg_namespace.oid=pg_class.relnamespace)
WHERE nspname='[SCHEMA NAME]'
AND relname='[TABLE NAME]';
</pre>
<ul>
<li>Replace <code>[SCHEMA NAME]</code> with the name of the schema the table is in</li>
<li>Replace <code>[TABLE NAME]</code> with the name of the table</li>
</ul>
<li>Syntax for changing settings for an individual table</li>
<ul>
<li><code>ALTER TABLE</code> <b>schema.table</b> <code>SET (</code><b>storage_parameter</b><code> = </code><b>value</b><code>);</code></li>
<ul><li>For example: <code>ALTER TABLE edbstore.emp SET (autovacuum_vacuum_scale_factor=.5);</code></li></ul>
</ul>
<li>After altering the table execute the query to check the specific settings again to confirm they changed to the desired values</li>
<li>If any table needs to be changed back to the server default setting for a storage_parameter use the ALTER TABLE RESET command:</li>
<ul>
<li><code>ALTER TABLE</code> <b>schema.table</b> <code>RESET (</code><b>storage_parameter</b><code>);</code></li>
<ul><li>For example: <code>ALTER TABLE edbstore.emp RESET (autovacuum_vacuum_scale_factor);</code></li></ul>
</ul>
<li><b>Alter Table Reference: </b>http://www.postgresql.org/docs/current/static/sql-altertable.html</li>
<li><b>Autovacuum Settings Reference: </b>http://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html</li>
</ul>
</ol>
<p><b>Health Check: </b></p>
<ul>
<li>If we are regularly monitoring the server and the above Warning/Critical alert responses are followed the level of bloat will never become overwhelming. However, a server that is not monitored or administrated by us may have an extremely high level of bloat or autovacuum might be off altogether. In these cases you may need to take an outage or wait for the next scheduled outage and run a vacuum of all tables (full database vacuum)</li>
</ul>
<h3>SQL Scope</h3>
<p>Must be run once per database.</p>
<h3>Base Query</h3>
<h4>SQL</h4>
<pre>
/*
if you remove the WHERE clause you will need to add logic to the bloat
calculation to avoid division by 0 error that would occur when n_live_tup=0.
For example:
CASE WHEN n_live_tup>0 THEN ((n_live_tup::float+n_dead_tup)/n_live_tup)-1
ELSE 0 END AS bloat
*/
SELECT current_catalog AS database,
schemaname AS schema,
relname AS table,
(pg_relation_size(relid)) AS table_bytes,
n_live_tup,
n_dead_tup,
((CAST(n_live_tup AS real) + n_dead_tup)/n_live_tup)-1 AS bloat
FROM pg_stat_user_tables
WHERE n_live_tup>(CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)*10+1);
</pre>
<h4>SQL Example Results</h4>
<pre>
database | schema | table | table_bytes | n_live_tup | n_dead_tup | bloat
----------+----------+------------+-------------+------------+------------+------------------
pgp | edbstore | orders | 819200 | 12000 | 0 | 0
pgp | edbstore | products | 827392 | 10000 | 0 | 0
pgp | edbstore | customers | 3997696 | 20000 | 0 | 0
pgp | edbstore | inventory | 540672 | 10000 | 2102 | 0.2102
pgp | edbstore | cust_hist | 2678784 | 60350 | 0 | 0
pgp | edbstore | orderlines | 3153920 | 60350 | 0 | 0
pgp |edbstore | emp | 450560 | 14000 | 5000 | 0.35714285714286
(7 rows)
</pre>
<h4>SQL Example After Vacuum</h4>
<pre>
database | schema | table | table_bytes | n_live_tup | n_dead_tup | bloat
----------+----------+------------+-------------+------------+------------+-------
pgp | edbstore | orders | 819200 | 12000 | 0 | 0
pgp | edbstore | products | 827392 | 10000 | 0 | 0
pgp | edbstore | customers | 3997696 | 20000 | 0 | 0
pgp | edbstore | inventory | 540672 | 10000 | 0 | 0
pgp | edbstore | cust_hist | 2678784 | 60350 | 0 | 0
pgp | edbstore | orderlines | 3153920 | 60350 | 0 | 0
pgp |edbstore | emp | 450560 | 14000 | 0 | 0
(7 rows)
</pre>
<h3>Dashboard
Query</h3>
<h4>SQL</h4>
<pre>
SELECT base.database,
base.schema,
base.table,
base.bloat,
'table_bytes: ' || base.table_bytes || '\n' || 'n_live_tup: ' || base.n_live_tup || '\n' || 'n_dead_tup: ' || base.n_dead_tup AS details
FROM
(SELECT current_catalog AS database,
schemaname AS schema,
relname AS table,
(pg_relation_size(relid)) AS table_bytes,
n_live_tup,
n_dead_tup,
100.00 * (((CAST(n_live_tup AS real) + n_dead_tup)/n_live_tup)-1) AS bloat
FROM pg_stat_user_tables
WHERE n_live_tup>(CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)*10+1)) base;
</pre>
<h4>SQL Example Results</h4>
<pre>
database | schema | table | bloat | details
----------+----------+------------+------------------+-----------------------------------------------------------
pgp | edbstore | orders | 0 | table_bytes: 819200\nn_live_tup: 12000\nn_dead_tup: 0
pgp | edbstore | products | 0 | table_bytes: 827392\nn_live_tup: 10000\nn_dead_tup: 0
pgp | edbstore | customers | 0 | table_bytes: 3997696\nn_live_tup: 20000\nn_dead_tup: 0
pgp | edbstore | inventory | 0 | table_bytes: 450560\nn_live_tup: 10000\nn_dead_tup: 0
pgp | edbstore | cust_hist | 6.30206957287538 | table_bytes: 2678784\nn_live_tup: 56775\nn_dead_tup: 3578
pgp | edbstore | orderlines | 0 | table_bytes: 3153920\nn_live_tup: 60350\nn_dead_tup: 0
(6 rows)
</pre>
<h4>SQL Example After Vacuum</h4>
<pre>
database | schema | table | bloat | details
----------+----------+------------+-------+--------------------------------------------------------
pgp | edbstore | orders | 0 | table_bytes: 819200\nn_live_tup: 12000\nn_dead_tup: 0
pgp | edbstore | products | 0 | table_bytes: 827392\nn_live_tup: 10000\nn_dead_tup: 0
pgp | edbstore | customers | 0 | table_bytes: 3997696\nn_live_tup: 20000\nn_dead_tup: 0
pgp | edbstore | inventory | 0 | table_bytes: 450560\nn_live_tup: 10000\nn_dead_tup: 0
pgp | edbstore | cust_hist | 0 | table_bytes: 2678784\nn_live_tup: 56789\nn_dead_tup: 0
pgp | edbstore | orderlines | 0 | table_bytes: 3153920\nn_live_tup: 60350\nn_dead_tup: 0
(6 rows)
</pre>
<h1>General</h1>
<h2>blocked_transactions</h2>
<h3>Overview</h3>
<p>This monitor reports the number of sessions that are waiting on locks held by other sessions. It gives you a general sense of how much lockup the database is experiencing but is not directly actionable. This would normally be catagorized as "Information Only" but some customers like to receive an email notification when a certain number of blocked sessions is reached.</p>
<h3>Default Frequency/Thresholds</h3>
<ul>
<li><b>Frequency: </b>as needed to support warning/critical logic</li>
<li><b>Warning: </b>as specified by customer at on-boarding</li>
<li><b>Critical: </b>none</li>
</ul>
<h3>Alert Response</h3>
<p><b>Warning</b></p>
<ul>
<li>As specified by customer at on-boarding either do nothing or send an email notifications to customer</li>
</ul>
<p><b>Critical</b></p>
<ul>
<li>None</li>
</ul>
<h3>SQL Scope</h3>
<p>Run once per cluster.</p>
<h3>Base Query</h3>
<h4>SQL Compatibility</h4>
<ul>
<li>In 9.1 pg_stat_activity.pid must be renamed to pg_stat_activity.procpid</li>
<li>In 9.1 pg_stat_activity.state must be removed</li>
</ul>
<h4>SQL</h4>
<pre>
SELECT pg_stat_activity.datname AS database,
pg_stat_activity.pid AS waiting_pid,
pg_stat_activity.usename AS waiting_username,
pg_stat_activity.backend_start AS waiting_login,
left(pg_stat_activity.application_name, 200) AS waiting_application_name,
pg_stat_activity.client_addr AS waiting_client_addr,
left(pg_stat_activity.client_hostname, 200) AS waiting_client_hostname,
pg_stat_activity.client_port AS waiting_client_port,
pg_stat_activity.query_start AS waiting_query_start,
pg_stat_activity.state AS waiting_transaction_state,
now()-pg_stat_activity.query_start AS waiting_query_runtime,
ROUND(EXTRACT(EPOCH
FROM (now())) - EXTRACT(EPOCH
FROM (pg_stat_activity.query_start))) AS waiting_query_runtime_seconds,
ROUND(EXTRACT(EPOCH
FROM (now())) - EXTRACT(EPOCH
FROM (pg_stat_activity.query_start)))/60 AS waiting_query_runtime_minutes
FROM pg_stat_activity
WHERE pg_stat_activity.waiting = TRUE;
</pre>
<h4>SQL Example Results</h4>
<pre>
database | waiting_pid | waiting_username | waiting_login | waiting_application_name | waiting_client_addr | waiting_client_hostname | waiting_client_port | waiting_query_start | waiting_transaction_state | waiting_query_runtime | waiting_query_runtime_seconds | waiting_query_runtime_minutes
----------+-------------+------------------+-------------------------------+--------------------------+---------------------+-------------------------+---------------------+-------------------------------+---------------------------+-----------------------+-------------------------------+-------------------------------
pgp | 8731 | pgp | 2015-06-05 10:03:07.528725-07 | psql | ::1 | | 41154 | 2015-06-09 15:49:12.167473-07 | active | 00:01:58.758353 | 119 | 1.98333333333333
pgp | 9031 | pgp | 2015-06-05 11:06:15.305611-07 | psql | ::1 | | 41158 | 2015-06-09 15:50:41.503815-07 | active | 00:00:29.422011 | 29 | 0.483333333333333
(2 rows)
</pre>
<h4>SQL Example Results Pretty</h4>
<pre>
-[ RECORD 1 ]-----------------+------------------------------
database | pgp
waiting_pid | 8731
waiting_username | pgp
waiting_login | 2015-06-05 10:03:07.528725-07
waiting_application_name | psql
waiting_client_addr | ::1
waiting_client_hostname |
waiting_client_port | 41154
waiting_query_start | 2015-06-09 15:49:12.167473-07
waiting_transaction_state | active
waiting_query_runtime | 00:02:27.438982
waiting_query_runtime_seconds | 147
waiting_query_runtime_minutes | 2.45
-[ RECORD 2 ]-----------------+------------------------------
database | pgp
waiting_pid | 9031
waiting_username | pgp
waiting_login | 2015-06-05 11:06:15.305611-07
waiting_application_name | psql
waiting_client_addr | ::1
waiting_client_hostname |
waiting_client_port | 41158
waiting_query_start | 2015-06-09 15:50:41.503815-07
waiting_transaction_state | active
waiting_query_runtime | 00:00:58.10264
waiting_query_runtime_seconds | 58
waiting_query_runtime_minutes | 0.966666666666667
</pre>
<h3>Dashboard
Query</h3>
<h4>SQL Compatibility</h4>
<ul>
<li>In 9.1 pg_stat_activity.pid must be renamed to pg_stat_activity.procpid</li>
<li>In 9.1 pg_stat_activity.state must be removed</li>
<ul><li>pg_stat_activity.state is not used anyway</li></ul>
</ul>
<h4>SQL</h4>
<pre>
SELECT database,
count_blocked,
average_runtime_minutes,
max_runtime_minutes,
average_runtime_seconds,
max_runtime_seconds,
'count_blocked: ' || count_blocked || '\n' || 'average_runtime_minutes: ' || average_runtime_minutes || '\n' ||
'max_runtime_minutes: ' || max_runtime_minutes || '\n' || 'average_runtime_seconds: ' || average_runtime_seconds || '\n' ||
'max_runtime_seconds: ' || max_runtime_seconds AS details
FROM
(SELECT database,
count(base.waiting_pid) AS count_blocked,
avg(base.waiting_query_runtime_minutes) AS average_runtime_minutes,
max(base.waiting_query_runtime_minutes) AS max_runtime_minutes,
avg(base.waiting_query_runtime_seconds) AS average_runtime_seconds,
max(base.waiting_query_runtime_seconds) AS max_runtime_seconds
FROM
(SELECT pg_stat_activity.datname AS database,
pg_stat_activity.pid AS waiting_pid,
pg_stat_activity.usename AS waiting_username,
pg_stat_activity.backend_start AS waiting_login,
left(pg_stat_activity.application_name, 200) AS waiting_application_name,
pg_stat_activity.client_addr AS waiting_client_addr,
left(pg_stat_activity.client_hostname, 200) AS waiting_client_hostname,
pg_stat_activity.client_port AS waiting_client_port,
pg_stat_activity.query_start AS waiting_query_start,
pg_stat_activity.state AS waiting_transaction_state,
now()-pg_stat_activity.query_start AS waiting_query_runtime,
ROUND(EXTRACT(EPOCH
FROM (now())) - EXTRACT(EPOCH
FROM (pg_stat_activity.query_start))) AS waiting_query_runtime_seconds,
ROUND(EXTRACT(EPOCH
FROM (now())) - EXTRACT(EPOCH
FROM (pg_stat_activity.query_start)))/60 AS waiting_query_runtime_minutes
FROM pg_stat_activity
WHERE pg_stat_activity.waiting = TRUE) base
GROUP BY database) top
UNION ALL
SELECT 'PLACEHOLDER' AS database,
0 AS count_blocked,
0 AS average_runtime_minutes,
0 AS max_runtime_minutes,
0 AS average_runtime_seconds,
0 AS max_runtime_seconds,
NULL AS description;
</pre>
<h4>SQL Example Results</h4>
<pre>
database | count_blocked | average_runtime_minutes | max_runtime_minutes | average_runtime_seconds | max_runtime_seconds | details
-------------+---------------+-------------------------+---------------------+-------------------------+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------
postgres | 2 | 0.258333333333333 | 0.433333333333333 | 15.5 | 26 | count_blocked: 2\naverage_runtime_minutes: 0.258333333333333\nmax_runtime_minutes: 0.433333333333333\naverage_runtime_seconds: 15.5\nmax_runtime_seconds: 26
PLACEHOLDER | 0 | 0 | 0 | 0 | 0 |
(2 rows)
</pre>
<h2>blocking_idle_in_transaction/idle_in_transaction</h2>
<h3>Overview</h3>
<p>As of now sessions in PostgreSQL can be in one of the following five states (note that in PostgreSQL a "session"/"connection" is frequently referred to as a "backend"): </p>
<ul>
<li><b>active: </b>the backend is executing a query</li>
<li><b>idle: </b>the backend is waiting for a new client command</li>
<li><b>idle in transaction: </b>the backend is in a transaction, but is not currently executing a query</li>
<li><b>idle in transaction (aborted): </b>this state is similar to idle in transaction, except one of the statements in the transaction caused an error</li>
<li><b>fastpath function call: </b>the backend is executing a fast-path function</li>
<li><b>disabled: </b>this state is reported if track_activities is disabled in this backend</li>
<ul><li>Note: this is not an actual state of a transaction. The transaction is actually in one of the other states but they are not being tracked by the view we are querying</li></ul>
</ul>
<p>Aside from using system resources sessions in most of these states do not cause any issue for PostgreSQL, but"idle in transaction" can cause potentially severe problems:</p>
<ul>
<li>The session that is idle in transaction can be holding a lock on a resource needed by another transaction, blocking the other transaction from running to completion</li>
<ul><li>This acts very similar to a deadlock, but the system does not automatically resolve the issue (deadlocks automatically rollback and restart)</li></ul>
<li>In MVCC, rows are not considered "dead" and eligiable to be vacuumed until there is no transaction older than it. So a transaction that has started but left idle prevents all deleted or old versions of updated rows to be retained until that transaction ends. This can cause a considerable amount of bloat in the database which leads to potentially serious performance problems</li>
</ul>
<p>The only way to resolve these issues is for the transaction that is idle to be resumed by the user, committed/rolledback by the user, or forcefully rolled back by a superuser</p>
<h3>blocking_idle_in_transaction</h3>
<h4>Overview</h4>
<p>As described above, transactions that are in state 'idle in transaction' can hold a lock on a resource needed by another transaction, blocking the other transaction from running to completion. This acts very similar to a deadlock, but the system does not automatically resolve the issue (deadlocks automatically rollback and restart). The only way to resolve these issues is for the transaction that is idle to be resumed by the user, committed/rolledback by the user, or forcefully rolled back by a superuser. In this monitor we identify all the sessions that are in state "idle in transaction", measure how long each has been in state "idle in transaction", and how many other transactions each is blocking (0 if they are not blocking other transactions), and the maximum amount of time that it has been blocking another transaction. This gives us the ability to identify when this problem arises either ask the user to resume, commit, or rollback their transaction OR forcefull roll it back. Note that it is also possible for a automated cron job to be run periodically which identifies such transactions (via the below query) and forcefully rolls them back.</p>
<h4>Default Frequency/Thresholds</h4>
<ul>
<li><b>Frequency: </b>as needed to support warning/critical logic</li>
<li><b>Warning: </b>as specified by customer at onboarding, recommend blocking any transaction for longer than 30 minutes</li>
<li>Note: Dashboard
can only support using one parameter as a threshold, so use either the amount of time it has been blocking another transaction (which will be 0 if no transaction is blocked) rather than some combination of time idle in transaction and number of transactions blocked. We could theoretically use a combination of the number of transactions that are being blocked and modify the the frequency of the check, but usually we would want to make sure no transactions are being blocked so being warned if any transaction is blocked</li>
<li><b>Critical: </b>as specified by customer at onboarding</li>
</ul>
<h4>Alert Response</h4>
<p><b>Warning</b></p>
<ul>
<li>As specified by customer at onboarding either notify customer, kill the transaction, or notify customer asking if transaction should be killed</li>