# A Single Heatmap {#a-single-heatmap}
A single heatmap is the most common way to visualize data. Although the
highlight of the **ComplexHeatmap** package is that it can visualize a list of
heatmaps in parallel, the single heatmap is the basic unit of a heatmap list,
so it is still very important to configure it nicely.

First, let's generate a random matrix with three groups of columns and three
groups of rows:
```{r data}
set.seed(123)
nr1 = 4; nr2 = 8; nr3 = 6; nr = nr1 + nr2 + nr3
nc1 = 6; nc2 = 8; nc3 = 10; nc = nc1 + nc2 + nc3
mat = cbind(rbind(matrix(rnorm(nr1*nc1, mean = 1,   sd = 0.5), nrow = nr1),
                  matrix(rnorm(nr2*nc1, mean = 0,   sd = 0.5), nrow = nr2),
                  matrix(rnorm(nr3*nc1, mean = 0,   sd = 0.5), nrow = nr3)),
            rbind(matrix(rnorm(nr1*nc2, mean = 0,   sd = 0.5), nrow = nr1),
                  matrix(rnorm(nr2*nc2, mean = 1,   sd = 0.5), nrow = nr2),
                  matrix(rnorm(nr3*nc2, mean = 0,   sd = 0.5), nrow = nr3)),
            rbind(matrix(rnorm(nr1*nc3, mean = 0.5, sd = 0.5), nrow = nr1),
                  matrix(rnorm(nr2*nc3, mean = 0.5, sd = 0.5), nrow = nr2),
                  matrix(rnorm(nr3*nc3, mean = 1,   sd = 0.5), nrow = nr3))
)
mat = mat[sample(nr, nr), sample(nc, nc)] # random shuffle rows and columns
rownames(mat) = paste0("row", seq_len(nr))
colnames(mat) = paste0("column", seq_len(nc))
```
The following command uses the minimal arguments of the `Heatmap()` function,
which simply visualizes the matrix as a heatmap with default settings. As in
other heatmap tools, it draws the dendrograms, the row/column names and the
heatmap legend. The default color scheme is "blue-white-red", mapped to low,
middle and high values of the matrix (the exact rules are given at the end of
Section \@ref(colors)). Since no name is assigned here, the legend title is an
internally generated name containing an index number.
```{r default}
Heatmap(mat)
```
By default, the legend title is taken from the "name" of the heatmap. Each
heatmap has a name that acts as a unique identifier, which becomes important
when you have a list of heatmaps. In later chapters, you will see that the
heatmap name is used to set the "main heatmap" and to decorate heatmaps. If no
name is assigned, an internal name of the form `matrix_%d` is used. In the
following examples in this chapter, we name the heatmap `mat` (you will see
the legend title change in the next plot).

If you call `Heatmap()` inside a function or a `for`/`if`/`while` block, the
heatmap is not displayed after `Heatmap()` is executed. In this case, you need
to call `draw()` explicitly, as follows. We explain this point in more detail
in Section \@ref(plot-the-heatmap).
```{r, eval = FALSE}
ht = Heatmap(mat)
draw(ht)
```
## Colors {#colors}
For heatmap visualization, colors are the major representation of the data
matrix. In most cases, the heatmap visualizes a matrix with continuous numeric
values, and users should provide a color mapping function. A color mapping
function accepts a vector of values and returns a vector of corresponding
colors. **Users should always use the `circlize::colorRamp2()` function to
generate the color mapping function** when using `Heatmap()`. The two
arguments of `colorRamp2()` are a vector of break values and a vector of
corresponding colors. `colorRamp2()` linearly interpolates colors in every
interval in LAB color space. Using `colorRamp2()` also helps to generate a
legend with proper tick marks.

In the following example, values between -2 and 2 are linearly interpolated to
get the corresponding colors, values larger than 2 are all mapped to red, and
values less than -2 are all mapped to green.
```{r}
library(circlize)
col_fun = colorRamp2(c(-2, 0, 2), c("green", "white", "red"))
col_fun(seq(-3, 3))
Heatmap(mat, name = "mat", col = col_fun)
```
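
To make the break/color semantics concrete, here is a minimal base-R sketch of
a piecewise-linear color mapping with clamping. `make_col_fun()` is a
hypothetical helper written only for illustration; it is not how **circlize**
implements `colorRamp2()`. It relies only on `col2rgb()`, `convertColor()` and
`rgb()` from **grDevices** and `approx()` from **stats**.

```r
# Hypothetical helper (not circlize's implementation): piecewise-linear
# color mapping between breaks, with clamping outside the break range.
make_col_fun = function(breaks, colors, space = c("Lab", "sRGB")) {
    space = match.arg(space)
    stopifnot(length(breaks) == length(colors), !is.unsorted(breaks))
    # colors as an n x 3 matrix of sRGB values in [0, 1]
    coords = t(grDevices::col2rgb(colors)) / 255
    if (space == "Lab") {
        coords = grDevices::convertColor(coords, from = "sRGB", to = "Lab")
    }
    function(x) {
        # approx(rule = 2) interpolates linearly between neighboring breaks
        # and returns the end values for x outside the break range
        interp = sapply(1:3, function(ch) {
            stats::approx(breaks, coords[, ch], xout = x, rule = 2)$y
        })
        interp = matrix(interp, ncol = 3)
        if (space == "Lab") {
            interp = grDevices::convertColor(interp, from = "Lab", to = "sRGB")
        }
        interp[] = pmin(pmax(interp, 0), 1)  # guard against rounding errors
        grDevices::rgb(interp[, 1], interp[, 2], interp[, 3])
    }
}

col_fun2 = make_col_fun(c(-2, 0, 2), c("green", "white", "red"))
col_fun2(seq(-3, 3))  # -3 gets the same color as -2, and 3 the same as 2
```

Because `approx(rule = 2)` returns the end values outside the break range, the
clamping described above comes for free. Passing `space = "sRGB"` instead of
the default `"Lab"` gives a different middle color for the same breaks, which
is the kind of difference the color space comparison later in this section
shows.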
As you can see, the color mapping function maps negative values to green and
positive values to red exactly as specified, even when the distribution of
negative and positive values is not symmetric around zero. This color mapping
function is also not affected by outliers. In the following plot, the
clustering is heavily affected by the outlier, but the color mapping is not.
```{r}
mat2 = mat
mat2[1, 1] = 100000
Heatmap(mat2, name = "mat", col = col_fun)
```
More importantly, `colorRamp2()` makes colors in multiple heatmaps comparable
if they are set with the same color mapping function. In the following three
heatmaps, the same color always corresponds to the same value.
```{r, eval = FALSE}
Heatmap(mat, name = "mat", col = col_fun, column_title = "mat")
Heatmap(mat/4, name = "mat", col = col_fun, column_title = "mat/4")
Heatmap(abs(mat), name = "mat", col = col_fun, column_title = "abs(mat)")
```
```{r, echo = FALSE, fig.width = 10, fig.height = 10/3}
pushViewport(viewport(x = 0, width = 1/3, just = "left"))
draw(Heatmap(mat, name = "mat", col = col_fun, column_title = "mat"), newpage = FALSE)
popViewport()
pushViewport(viewport(x = 1/3, width = 1/3, just = "left"))
draw(Heatmap(mat/4, name = "mat", col = col_fun, column_title = "mat/4"), newpage = FALSE)
popViewport()
pushViewport(viewport(x = 2/3, width = 1/3, just = "left"))
draw(Heatmap(abs(mat), name = "mat", col = col_fun, column_title = "abs(mat)"), newpage = FALSE)
popViewport()
```
If the matrix is continuous, you can also simply provide a vector of colors,
and the colors will be linearly interpolated. But remember that this method is
not robust to outliers, because the mapping starts from the minimal value in
the matrix and ends at the maximal value. The following color mapping setting
is identical to `colorRamp2(seq(min(mat), max(mat), length = 10), rev(rainbow(10)))`.
```{r}
Heatmap(mat, name = "mat", col = rev(rainbow(10)))
```
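
The outlier problem is easy to see from the breaks alone. Assuming the colors
are spread evenly from the minimum to the maximum (as stated above), the
hypothetical helper `implied_breaks()` below computes those breaks; a single
extreme value then pushes all other values into the first color interval.

```r
# Hypothetical helper: evenly spaced breaks from min to max, which is what
# a plain vector of n colors implies according to the text above.
implied_breaks = function(m, n_colors) {
    seq(min(m, na.rm = TRUE), max(m, na.rm = TRUE), length.out = n_colors)
}

set.seed(123)
m = matrix(rnorm(100), nrow = 10)
b_clean = implied_breaks(m, 10)

m_out = m
m_out[1, 1] = 1000  # a single outlier
b_out = implied_breaks(m_out, 10)

# share of the remaining values that fall between the first two colors
mean(m[-1] < b_clean[2])
mean(m_out[-1] < b_out[2])  # all of them
```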
If the matrix contains discrete values (either numeric or character), colors
should be specified as a named vector so that discrete values can be mapped to
colors. If the color vector has no names, the order of colors corresponds to
the order of `unique(mat)`. Note that the legend is now generated from the
color mapping vector.

The following sets colors for a discrete numeric matrix (you don't need to
convert it to a character matrix).
```{r}
discrete_mat = matrix(sample(1:4, 100, replace = TRUE), 10, 10)
colors = structure(1:4, names = c("1", "2", "3", "4")) # black, red, green, blue
Heatmap(discrete_mat, name = "mat", col = colors)
```
Or a character matrix:
```{r discrete_character_matrix}
discrete_mat = matrix(sample(letters[1:4], 100, replace = TRUE), 10, 10)
colors = structure(1:4, names = letters[1:4])
Heatmap(discrete_mat, name = "mat", col = colors)
```
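
A named color vector behaves like a lookup table from values to colors. The
base-R sketch below shows only that lookup idea, with a small made-up matrix;
it is not ComplexHeatmap's internal code.

```r
# A named color vector as a lookup table (illustration only)
discrete_mat = matrix(c(1, 2, 2, 4, 3, 1), nrow = 2)
colors = c("1" = "black", "2" = "red", "3" = "green", "4" = "blue")

# numeric values are matched against the names after conversion to character
cell_colors = matrix(colors[as.character(discrete_mat)], nrow = nrow(discrete_mat))
cell_colors
```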
As you can see in the two examples above, for a numeric matrix (whether the
color mapping is continuous or discrete), clustering is applied on both
dimensions by default, while for a character matrix, clustering is turned off
(but you can still cluster a character matrix if you provide a proper distance
metric for two character vectors; see the example in Section
\@ref(distance-methods)).

`NA` is allowed in the matrix. You can control the color of `NA` with the
`na_col` argument (grey by default). **A matrix that contains `NA` can be
clustered by `Heatmap()`.** Note that the `NA` value is not shown in the
legend.
```{r}
mat_with_na = mat
na_index = sample(c(TRUE, FALSE), nrow(mat)*ncol(mat), replace = TRUE, prob = c(1, 9))
mat_with_na[na_index] = NA
Heatmap(mat_with_na, name = "mat", na_col = "black")
```
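
As one illustration of why clustering with `NA` is possible, base R's `dist()`
already tolerates missing values: it skips positions where either vector is
`NA` and scales the Euclidean sum of squares up by the ratio of all columns to
the columns used. This sketch only demonstrates `dist()` and `hclust()` on a
made-up matrix; it is not a description of how `Heatmap()` handles `NA`
internally.

```r
m = rbind(r1 = c(1, NA, 3),
          r2 = c(1, 2, 5),
          r3 = c(4, 4, 4))
d = dist(m)  # Euclidean by default
as.matrix(d)["r1", "r2"]  # sqrt((0^2 + 2^2) * 3/2) = sqrt(6)

# the distances contain no NA, so hclust() can use them directly
hc = hclust(d)
hc$order
```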
Color space is important for interpolating colors. By default, colors are
linearly interpolated in [LAB color
space](https://en.wikipedia.org/wiki/Lab_color_space), but you can select the
color space in the `colorRamp2()` function. Compare the following two plots.
Can you see the difference?
```{r, eval = FALSE}
f1 = colorRamp2(seq(min(mat), max(mat), length = 3), c("blue", "#EEEEEE", "red"))
f2 = colorRamp2(seq(min(mat), max(mat), length = 3), c("blue", "#EEEEEE", "red"),
space = "RGB")
Heatmap(mat, name = "mat1", col = f1, column_title = "LAB color space")
Heatmap(mat, name = "mat2", col = f2, column_title = "RGB color space")
```
```{r, fig.width = 10, echo = FALSE}
f1 = colorRamp2(seq(min(mat), max(mat), length = 3), c("blue", "#EEEEEE", "red"))
f2 = colorRamp2(seq(min(mat), max(mat), length = 3), c("blue", "#EEEEEE", "red"),
space = "RGB")
pushViewport(viewport(x = 0, width = 0.5, just = "left"))
draw(Heatmap(mat, name = "mat1", col = f1, column_title = "LAB color space"), newpage = FALSE)
upViewport()
pushViewport(viewport(x = 0.5, width = 0.5, just = "left"))
draw(Heatmap(mat, name = "mat2", col = f2, column_title = "RGB color space"), newpage = FALSE)
upViewport()
```
In the following plots, the values change evenly along the folded lines, so
you can see how colors change in different color spaces (top plots:
green-black-red, bottom plots: blue-white-red). The plots are made with the
[**HilbertCurve**
package](https://bioconductor.org/packages/release/bioc/html/HilbertCurve.html).
```{r, fig.width = 14, fig.height = 14/5, echo = FALSE, message = FALSE}
space = c("RGB", "LAB", "XYZ", "sRGB", "LUV")
pushViewport(viewport(layout = grid.layout(nr = 1, nc = length(space))))
for(i in seq_along(space)) {
    pushViewport(viewport(layout.pos.row = 1, layout.pos.col = i))
    hc = HilbertCurve::HilbertCurve(1, 100, level = 4, newpage = FALSE, title = space[i])
    ir = IRanges::IRanges(start = 1:99, end = 2:100)
    f = colorRamp2(c(-1, 0, 1), c("green", "black", "red"), space = space[i])
    col = f(seq(-1, 1, length = 100))
    HilbertCurve::hc_points(hc, ir, np = 3, gp = gpar(col = col, fill = col))
    upViewport()
}
upViewport()
grid.newpage()
pushViewport(viewport(layout = grid.layout(nr = 1, nc = length(space))))
for(i in seq_along(space)) {
    pushViewport(viewport(layout.pos.row = 1, layout.pos.col = i))
    hc = HilbertCurve::HilbertCurve(1, 100, level = 4, newpage = FALSE, title = space[i])
    ir = IRanges::IRanges(start = 1:99, end = 2:100)
    f = colorRamp2(c(-1, 0, 1), c("blue", "white", "red"), space = space[i])
    col = f(seq(-1, 1, length = 100))
    HilbertCurve::hc_points(hc, ir, np = 3, gp = gpar(col = col, fill = col))
    upViewport()
}
upViewport()
```
Last but not least, the borders of the heatmap can be set by the `border` and
`rect_gp` arguments. `border` controls the global border of the heatmap body,
and `rect_gp` controls the borders of the grid cells in the heatmap. The value
of `border` can be logical (`TRUE` corresponds to black) or a color string
(e.g. `"red"`).

`rect_gp` is a `gpar` object, which means it can only be created with
`grid::gpar()`. Since the fill color is already controlled by the heatmap
color mapping, you should only set line parameters in `gpar()`, such as `col`
and `lwd`, to control the borders of the heatmap grid cells.
```{r}
Heatmap(mat, name = "mat", border = TRUE)
Heatmap(mat, name = "mat", rect_gp = gpar(col = "white", lwd = 2))
```
If `col` is not set, the default color mapping of `Heatmap()` is designed to
be as convenient and meaningful as possible. The rules for the default color
mapping (implemented in `ComplexHeatmap:::default_col()`) are:

- If the values are characters, the colors are generated by
  `circlize::rand_color()`.
- If the values are from a heatmap annotation and are numeric, colors are
  mapped between white and one random color by linear interpolation between
  the minimum and the maximum.
- If the values are from the matrix corresponding to the heatmap body (let's
  denote it as $M$):
    * If the fraction of positive values in $M$ is between 25% and 75%, colors
      are mapped to blue, white and red by linear interpolation at $-q$, 0 and
      $q$, where $q$ is the maximum of $|M|$ if the number of unique values is
      less than 100, and the 99^th^ percentile of $|M|$ otherwise. This color
      mapping is centered at zero.
    * Otherwise, colors are mapped to blue, white and red by linear
      interpolation at $q_1$, $(q_1 + q_2)/2$ and $q_2$, where $q_1$ and $q_2$
      are the minimum and the maximum if the number of unique values in $M$ is
      less than 100, and the 1^st^ and the 99^th^ percentiles of $M$
      otherwise.
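
The rules for the heatmap body can be sketched in a few lines of base R.
`default_breaks()` is a hypothetical helper that returns the three break values
for blue, white and red. Treating the 25% and 75% bounds as inclusive and
dropping `NA` values are our own assumptions; the real `default_col()` may
differ in such details.

```r
# Hypothetical helper following the rules listed above for the heatmap body.
# Assumptions: NA values are ignored and the 25%/75% bounds are inclusive.
default_breaks = function(M) {
    v = M[!is.na(M)]
    few_unique = length(unique(v)) < 100
    frac_pos = mean(v > 0)
    if (frac_pos >= 0.25 && frac_pos <= 0.75) {
        # zero-centered mapping: blue at -q, white at 0, red at q
        q = if (few_unique) max(abs(v)) else unname(quantile(abs(v), 0.99))
        c(-q, 0, q)
    } else {
        # mapping between low and high values, white at the midpoint
        q1 = if (few_unique) min(v) else unname(quantile(v, 0.01))
        q2 = if (few_unique) max(v) else unname(quantile(v, 0.99))
        c(q1, (q1 + q2) / 2, q2)
    }
}

default_breaks(matrix(c(-2, -1, 1, 3), 2))  # 50% positive: -3, 0, 3
default_breaks(matrix(1:10, 2))            # all positive: 1, 5.5, 10
```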

`rect_gp` allows a non-standard parameter `type`. If it is set to `"none"`,
the clustering is still applied but nothing is drawn on the heatmap body.
Customized graphics on the heatmap body can be added via a self-defined
`cell_fun` or `layer_fun` (see Section \@ref(customize-the-heatmap-body)).
```{r}
Heatmap(mat, name = "mat", rect_gp = gpar(type = "none"))
```
## Titles {#heatmap-titles}
The title of the heatmap tells what the plot is about. In the
**ComplexHeatmap** package, you can set heatmap titles for the rows and/or the
columns. Note that a column title, for example, can only be placed either at
the top or at the bottom of the heatmap, not both at the same time.

The graphic parameters can be set by `row_title_gp` and `column_title_gp`,
respectively. Remember to use `gpar()` to specify graphic parameters.
```{r row_column_title}
Heatmap(mat, name = "mat", column_title = "I am a column title",
row_title = "I am a row title")
Heatmap(mat, name = "mat", column_title = "I am a column title at the bottom",
column_title_side = "bottom")
Heatmap(mat, name = "mat", column_title = "I am a big column title",
column_title_gp = gpar(fontsize = 20, fontface = "bold"))
```
Rotations for titles can be set by `row_title_rot` and `column_title_rot`, but
only horizontal and vertical rotations are allowed.
```{r title_rotation}
Heatmap(mat, name = "mat", row_title = "row title", row_title_rot = 0)
```
Row or column titles can be templates, which are used when rows or columns are
split in the heatmap (because there will then be multiple row/column titles).
This functionality is introduced in Section \@ref(heatmap-split). A quick
example:
```{r, eval = FALSE}
# code only for demonstration
# row title would be cluster_1 and cluster_2
Heatmap(mat, name = "mat", row_km = 2, row_title = "cluster_%s")
```
You can set the `fill` parameter in `row_title_gp` and `column_title_gp` to
set the background color of the titles. Since `col` in e.g. `row_title_gp`
controls the color of the text, `border` is used to control the color of the
background border.
```{r}
Heatmap(mat, name = "mat", column_title = "I am a column title",
column_title_gp = gpar(fill = "red", col = "white", border = "red"))
```
If the graphic elements are text, they can also be set as mathematical
formulas (R expressions).
```{r}
Heatmap(mat, name = "mat",
column_title = expression(hat(beta) == (X^t * X)^{-1} * X^t * y))
```
## Clustering {#clustering}
Clustering is perhaps the key component of heatmap visualization. The
**ComplexHeatmap** package supports hierarchical clustering with great
flexibility. You can specify the clustering either by:

- a pre-defined distance method (e.g. `"euclidean"` or `"pearson"`),
- a distance function,
- an object that already contains a clustering (an `hclust` or `dendrogram`
  object, or an object that can be coerced to the `dendrogram` class),
- a clustering function.

It is also possible to render the dendrograms with different colors and styles
for different branches to better reveal the structure of the dendrogram (e.g.
by `dendextend::color_branches()`).

First, there are general settings for clustering, e.g. whether to apply
clustering or show dendrograms, and the side and the height (or width) of the
dendrograms.
```{r cluster_basic}
Heatmap(mat, name = "mat", cluster_rows = FALSE) # turn off row clustering
Heatmap(mat, name = "mat", show_column_dend = FALSE) # hide column dendrogram
Heatmap(mat, name = "mat", row_dend_side = "right", column_dend_side = "bottom")
Heatmap(mat, name = "mat", column_dend_height = unit(4, "cm"),
row_dend_width = unit(4, "cm"))
```
### Distance methods {#distance-methods}
Hierarchical clustering is performed in two steps: calculating the distance
matrix and applying clustering. There are three ways to specify the distance
metric for clustering:

- a pre-defined option. The valid values are the methods supported by the
  `dist()` function, plus `"pearson"`, `"spearman"` and `"kendall"`. The
  correlation distance is defined as `1 - cor(x, y, method)`. All these
  built-in distance methods allow `NA` values.
- a self-defined function which calculates the distance from a matrix. The
  function should take only one argument. Please note that for clustering on
  columns, the matrix is transposed automatically.
- a self-defined function which calculates the distance from two vectors. The
  function should take only two arguments. Note this might be slow because it
  is implemented with two nested `for` loops.
```{r cluster_distance}
Heatmap(mat, name = "mat", clustering_distance_rows = "pearson",
column_title = "pre-defined distance method (1 - pearson)")
Heatmap(mat, name = "mat", clustering_distance_rows = function(m) dist(m),
column_title = "a function that calculates distance matrix")
Heatmap(mat, name = "mat", clustering_distance_rows = function(x, y) 1 - cor(x, y),
column_title = "a function that calculates pairwise distance")
```
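
The pre-defined `"pearson"` option and a two-argument function describe the
same distance, just computed differently. The base-R sketch below (random data)
computes `1 - cor(x, y)` between rows once from the whole transposed matrix and
once with two nested loops over a pairwise function, and checks that the
results agree. The loops call the function for every pair of rows, which is
why this form can be slow on large matrices.

```r
set.seed(42)
m = matrix(rnorm(30), nrow = 6)

# (1) from the whole matrix at once: cor() works on columns, so transpose
d_matrix = as.dist(1 - cor(t(m)))

# (2) from a two-argument function, filled in with two nested loops
pair_dist = function(x, y) 1 - cor(x, y)
n = nrow(m)
d_loop = matrix(0, n, n)
for (i in seq_len(n)) {
    for (j in seq_len(n)) {
        d_loop[i, j] = pair_dist(m[i, ], m[j, ])
    }
}
d_loop = as.dist(d_loop)

all.equal(as.vector(d_matrix), as.vector(d_loop))  # TRUE
```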
Based on these features, we can apply clustering that is robust to outliers
by using a pairwise distance function. Note that here we set the color mapping
function because we don't want the outliers to affect the colors.
```{r cluster_distance_advanced, eval = 1:10}
mat_with_outliers = mat
for(i in 1:10) mat_with_outliers[i, i] = 1000
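robust_dist = function(x, y) {
    # keep positions where both x and y lie strictly inside their own
    # 10%-90% quantile range, then compute the Euclidean distance
    qx = quantile(x, c(0.1, 0.9))
    qy = quantile(y, c(0.1, 0.9))
    l = x > qx[1] & x < qx[2] & y > qy[1] & y < qy[2]
    x = x[l]
    y = y[l]
    sqrt(sum((x - y)^2))
}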
```
We can compare the two heatmaps with or without the robust distance method:
```{r, eval = FALSE}
Heatmap(mat_with_outliers, name = "mat",
col = colorRamp2(c(-2, 0, 2), c("green", "white", "red")),
column_title = "dist")
Heatmap(mat_with_outliers, name = "mat",
col = colorRamp2(c(-2, 0, 2), c("green", "white", "red")),
clustering_distance_rows = robust_dist,
clustering_distance_columns = robust_dist,
column_title = "robust_dist")
```
```{r, echo = FALSE, fig.width = 10, fig.height = 5}
pushViewport(viewport(x = 0, width = 0.5, just = "left"))
ht1 = Heatmap(mat_with_outliers, name = "mat",
col = colorRamp2(c(-2, 0, 2), c("green", "white", "red")),
column_title = "dist")
draw(ht1, newpage = FALSE)
popViewport()
pushViewport(viewport(x = 0.5, width = 0.5, just = "left"))
ht2 = Heatmap(mat_with_outliers, name = "mat",
col = colorRamp2(c(-2, 0, 2), c("green", "white", "red")),
clustering_distance_rows = robust_dist,
clustering_distance_columns = robust_dist,
column_title = "robust_dist")
draw(ht2, newpage = FALSE)
popViewport()
```
If proper distance methods exist (like the methods in the [**stringdist**
package](https://cran.r-project.org/web/packages/stringdist/)), you can also
cluster a character matrix. The `cell_fun` argument is introduced in Section
\@ref(customize-the-heatmap-body).
```{r cluster_character_matrix}
mat_letters = matrix(sample(letters[1:4], 100, replace = TRUE), 10)
# Euclidean distance between the ASCII codes of the letters
dist_letters = function(x, y) {
    x = strtoi(charToRaw(paste(x, collapse = "")), base = 16)
    y = strtoi(charToRaw(paste(y, collapse = "")), base = 16)
    sqrt(sum((x - y)^2))
}
Heatmap(mat_letters, name = "letters", col = structure(2:5, names = letters[1:4]),
    clustering_distance_rows = dist_letters, clustering_distance_columns = dist_letters,
    cell_fun = function(j, i, x, y, w, h, col) { # add text to each grid cell
        grid.text(mat_letters[i, j], x, y)
    })
```
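
The "ASCII" in `dist_letters()` refers to character codes:
`strtoi(charToRaw(...), base = 16)` reads each byte's hexadecimal
representation back as an integer. The sketch below checks this against base
R's `utf8ToInt()` and defines `dist_ascii()`, a hypothetical shorter
equivalent of `dist_letters()` that is only valid for ASCII letters.

```r
# ASCII codes of "a", "b", "d" via the approach used in dist_letters()
codes_raw = strtoi(charToRaw(paste(c("a", "b", "d"), collapse = "")), base = 16)
codes_int = utf8ToInt("abd")
codes_raw  # 97 98 100

# hypothetical shorter variant of dist_letters(), for ASCII letters only
dist_ascii = function(x, y) {
    cx = utf8ToInt(paste(x, collapse = ""))
    cy = utf8ToInt(paste(y, collapse = ""))
    sqrt(sum((cx - cy)^2))
}
dist_ascii(c("a", "b"), c("a", "d"))  # sqrt(0^2 + 2^2) = 2
```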
### Clustering methods {#clustering-methods}
The method used to perform hierarchical clustering can be specified by
`clustering_method_rows` and `clustering_method_columns`. The possible methods
are those supported by the `hclust()` function.
```{r cluster_method}
Heatmap(mat, name = "mat", clustering_method_rows = "single")
```
If you already have a clustering object or a function which directly returns a
clustering object, you can skip the distance settings and set `cluster_rows`
or `cluster_columns` to the clustering object or clustering function. If it is
a clustering function, its only argument should be the matrix, and it should
return an `hclust` or `dendrogram` object or an object that can be coerced to
the `dendrogram` class.

In the following example, we perform clustering with methods from the
**cluster** package, either with a pre-calculated clustering object or with a
clustering function:
```{r cluster_object}
library(cluster)
Heatmap(mat, name = "mat", cluster_rows = diana(mat),
cluster_columns = agnes(t(mat)), column_title = "clustering objects")
# if cluster_columns is set as a function, you don't need to transpose the matrix
Heatmap(mat, name = "mat", cluster_rows = diana,
cluster_columns = agnes, column_title = "clustering functions")
```
The last command is the same as:
```{r, eval = FALSE}
# code only for demonstration
Heatmap(mat, name = "mat", cluster_rows = function(m) as.dendrogram(diana(m)),
    cluster_columns = function(m) as.dendrogram(agnes(m)),
    column_title = "clustering functions")
```
Please note, when `cluster_rows` is set as a function, the argument `m` is the
input `mat` itself, while for `cluster_columns`, `m` is the transpose of
`mat`.
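
The relationship between clustering functions, clustering objects and the
transposition for columns can be checked with base R alone. In this sketch
(random data, `hclust()` with default settings), a one-argument clustering
function applied to `m` and to `t(m)` gives the same leaf orders as
pre-computed `hclust` objects.

```r
set.seed(7)
m = matrix(rnorm(40), nrow = 8)

# a clustering function: one argument (a matrix), returns a dendrogram
clust_fun = function(x) as.dendrogram(hclust(dist(x)))

row_obj = hclust(dist(m))     # rows are the observations
col_obj = hclust(dist(t(m)))  # transpose so that columns become observations

row_order = order.dendrogram(clust_fun(m))
col_order = order.dendrogram(clust_fun(t(m)))
```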
`fastcluster::hclust()` implements a faster version of `hclust()`. You can
wrap it in a clustering function and pass it to `cluster_rows` and
`cluster_columns` to use the faster version of `hclust()`.
```{r, eval = FALSE}
# code only for demonstration
fh = function(x) fastcluster::hclust(dist(x))
Heatmap(mat, name = "mat", cluster_rows = fh, cluster_columns = fh)
```
To make it more convenient to use the faster version of `hclust()` (assuming
you have many heatmaps to construct), it can be set as a global option. The
usage of `ht_opt` is introduced in Section
\@ref(change-parameters-globally).
```{r, eval = FALSE}
# code only for demonstration
ht_opt$fast_hclust = TRUE
# now fastcluster::hclust is used in all heatmaps
```
In one specific scenario, you might already have a subgroup classification
for the matrix rows or columns and only want to cluster features within the
same subgroup. You can either split the heatmap by the subgroup variable (see
Section \@ref(heatmap-split)), or use the `cluster_within_group()` clustering
function to generate a special dendrogram.
```{r}
group = kmeans(t(mat), centers = 3)$cluster
Heatmap(mat, name = "mat", cluster_columns = cluster_within_group(mat, group))
```
In the above example, columns in the same group are still clustered, but the
corresponding part of the dendrogram is collapsed into a flat line. The column
dendrogram shows the hierarchy of the groups.
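
To illustrate the idea of clustering within groups, here is a hypothetical
base-R helper `order_within_group()`. It is not `cluster_within_group()`: it
only returns a column order (no dendrogram), and ordering the groups by
clustering their mean profiles is our own assumption.

```r
# Hypothetical helper: columns are clustered within each group, groups stay
# contiguous, and the groups themselves are ordered by their mean profiles.
order_within_group = function(m, group) {
    groups = unique(group)
    # one mean profile (column) per group
    means = sapply(groups, function(g) rowMeans(m[, group == g, drop = FALSE]))
    group_order = if (length(groups) > 1) groups[hclust(dist(t(means)))$order] else groups
    unlist(lapply(group_order, function(g) {
        idx = which(group == g)
        if (length(idx) < 2) return(idx)
        idx[hclust(dist(t(m[, idx, drop = FALSE])))$order]
    }))
}

set.seed(5)
m = matrix(rnorm(60), nrow = 6)
group = rep(c("a", "b", "c"), times = c(3, 3, 4))
o = order_within_group(m, group)
group[o]  # columns of each group stay together
```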
### Render dendrograms {#render-dendrograms}
If you want to render the dendrogram, you normally need to generate a
`dendrogram` object and render it first, then pass it to the `cluster_rows`
or `cluster_columns` argument.

You can render your `dendrogram` object with the **dendextend** package to
make a more customized visualization of the dendrogram. Note that
**ComplexHeatmap** only supports rendering of the dendrogram lines.
```{r cluster_dendextend}
library(dendextend)
row_dend = as.dendrogram(hclust(dist(mat)))
row_dend = color_branches(row_dend, k = 2) # `color_branches()` returns a dendrogram object
Heatmap(mat, name = "mat", cluster_rows = row_dend)
```
`row_dend_gp` and `column_dend_gp` control the global graphic settings for
the dendrograms. Note that graphic settings in e.g. `row_dend` are overwritten
by `row_dend_gp`.
```{r}
Heatmap(mat, name = "mat", cluster_rows = row_dend, row_dend_gp = gpar(col = "red"))
```
### Reorder dendrograms {#reorder-dendrograms}
In the `Heatmap()` function, dendrograms are reordered to make features with
larger differences more separated from each other (please refer to the
documentation of `reorder.dendrogram()`). Here the difference (also called the
weight) is measured by the row means for a row dendrogram or by the column
means for a column dendrogram. When set to logical values, `row_dend_reorder`
and `column_dend_reorder` control whether dendrogram reordering is applied.
When set to numeric vectors, they are used as the weights for the reordering
(sent to the `wts` argument of `reorder.dendrogram()`). The reordering can be
turned off by setting e.g. `row_dend_reorder = FALSE`.

By default, dendrogram reordering is turned on if
`cluster_rows`/`cluster_columns` is set to a logical value or a clustering
function. It is turned off if `cluster_rows`/`cluster_columns` is set to a
clustering object.
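
Weighted dendrogram reordering can be tried directly with `stats::reorder()` on
a `dendrogram`. In this base-R sketch with random data, row means serve as
weights; `agglo.FUN = mean` is our own choice, since this section does not
state which aggregation `Heatmap()` uses. At every node, the branches end up
sorted by increasing aggregated weight.

```r
set.seed(3)
m = matrix(rnorm(50), nrow = 10)
dend = as.dendrogram(hclust(dist(m)))
w = rowMeans(m)  # row means as weights for a row dendrogram

# reorder.dendrogram(): children of each node are sorted by their weights
dend2 = reorder(dend, wts = w, agglo.FUN = mean)

# the aggregated weight is stored in the "value" attribute of each node
attr(dend2[[1]], "value") <= attr(dend2[[2]], "value")  # TRUE
order.dendrogram(dend2)
```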
Compare the following two heatmaps:
```{r, eval = FALSE}
m2 = matrix(1:100, nr = 10, byrow = TRUE)
Heatmap(m2, name = "mat", row_dend_reorder = FALSE, column_title = "no reordering")
Heatmap(m2, name = "mat", row_dend_reorder = TRUE, column_title = "applied reordering")
```
```{r cluster_dendsort, fig.width = 10, fig.height = 5, echo = FALSE}
m2 = matrix(1:100, nr = 10, byrow = TRUE)
pushViewport(viewport(x = 0, width = 0.5, just = "left"))
draw(Heatmap(m2, name = "mat", row_dend_reorder = FALSE, column_title = "no reordering"),
newpage = FALSE)
upViewport()
pushViewport(viewport(x = 0.5, width = 0.5, just = "left"))
draw(Heatmap(m2, name = "mat", row_dend_reorder = TRUE, column_title = "applied reordering"),
newpage = FALSE)
upViewport()
```
There are many other methods for reordering dendrograms, e.g. the
**dendsort** package. All these methods return a reordered dendrogram, so we
can first generate the row or column dendrogram from the data matrix, reorder
it with some method, and assign it back to `cluster_rows` or
`cluster_columns`.

Compare the following two reorderings. Can you tell which is better?
```{r, eval = FALSE}
Heatmap(mat, name = "mat", column_title = "default reordering")
library(dendsort)
dend = dendsort(hclust(dist(mat)))
Heatmap(mat, name = "mat", cluster_rows = dend, column_title = "reorder by dendsort")
```
```{r, echo = FALSE, fig.width = 10, fig.height = 5}
pushViewport(viewport(x = 0, width = 0.5, just = "left"))
draw(Heatmap(mat, name = "mat", column_title = "default reordering"), newpage = FALSE)
popViewport()
library(dendsort)
dend = dendsort(hclust(dist(mat)))
pushViewport(viewport(x = 0.5, width = 0.5, just = "left"))
draw(Heatmap(mat, name = "mat", cluster_rows = dend,
column_title = "reordering by dendsort"), newpage = FALSE)
popViewport()
```
## Set row and column orders {#row-and_column_orders}
Clustering is used to adjust row orders and column orders of the heatmap, but
you can still set the order manually by `row_order` and `column_order`. If
e.g. `row_order` is set, row clustering is turned off by default.
```{r manual_order}
Heatmap(mat, name = "mat", row_order = order(as.numeric(gsub("row", "", rownames(mat)))),
column_order = order(as.numeric(gsub("column", "", colnames(mat)))))
```
The orders can also be character vectors if they are just permutations of the matrix row names or column names.
```{r}
Heatmap(mat, name = "mat", row_order = sort(rownames(mat)),
column_order = sort(colnames(mat)))
```
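
One caveat with character orders: `sort()` on names like `row1`, `row2`, ...,
`row10` is lexicographic, so `"row10"` comes before `"row2"`. The base-R sketch
below (with made-up names) contrasts that with the numeric order built with
`gsub()` as in the first example, and converts the character order to integer
indices with `match()`.

```r
rn = c("row3", "row1", "row10", "row2")

# character order: lexicographic, so "row10" comes before "row2"
chr_order = match(sort(rn), rn)
rn[chr_order]  # "row1" "row10" "row2" "row3"

# numeric order, as in the gsub() example
num_order = order(as.numeric(gsub("row", "", rn)))
rn[num_order]  # "row1" "row2" "row3" "row10"
```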
Note that `row_dend_reorder` and `row_order` are two different things.
`row_dend_reorder` is applied on the dendrogram. Rotating the two branches at
any node gives an identical dendrogram, so reordering the dendrogram by
automatically rotating the sub-dendrograms at every node can help to move
elements that differ more further apart from each other. In contrast,
`row_order` is simply applied on the matrix, and dendrograms should normally be
turned off.
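
The claim that rotating branches leaves the dendrogram unchanged can be checked
in base R. Here `rev()` flips the branches at every node: the leaf order is
reversed, but every internal node keeps the same height and the same set of
leaves. `node_summary()` is a hypothetical helper for this comparison, and the
data are random.

```r
set.seed(11)
m = matrix(rnorm(24), nrow = 6, dimnames = list(paste0("r", 1:6), NULL))
dend = as.dendrogram(hclust(dist(m)))
dend_rot = rev(dend)  # rotate the branches at every node

labels(dend)
labels(dend_rot)  # the same leaves in reverse order

# describe each internal node by its height and its (sorted) set of leaves
node_summary = function(d) {
    if (is.leaf(d)) return(character(0))
    this = paste(attr(d, "height"), paste(sort(labels(d)), collapse = ","))
    c(this, unlist(lapply(seq_along(d), function(i) node_summary(d[[i]]))))
}
identical(sort(node_summary(dend)), sort(node_summary(dend_rot)))  # TRUE
```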
## Seriation {#heatmap-seriation}
Seriation is an interesting technique for ordering a matrix (see this
interesting post: http://nicolas.kruchten.com/content/2018/02/seriation/). The
powerful [**seriation**
package](https://cran.r-project.org/web/packages/seriation/index.html)
implements many seriation methods. Since row and column orders are easy to
extract from the object returned by `seriate()`, the core function of the
**seriation** package, they can be directly assigned to `row_order` and
`column_order` to make the heatmap.

The first example applies `seriate()` directly to the matrix. Since the
`"BEA_TSP"` method only allows a non-negative matrix, we transform the matrix
to `max(mat) - mat`.
```{r}
library(seriation)
o = seriate(max(mat) - mat, method = "BEA_TSP")
Heatmap(max(mat) - mat, name = "mat",
row_order = get_order(o, 1), column_order = get_order(o, 2))
```
Alternatively, you can apply `seriate()` to a distance matrix. In this case,
the orders for rows and columns need to be calculated separately, because the
distance matrix needs to be calculated separately for rows and for columns.
```{r}
o1 = seriate(dist(mat), method = "TSP")
o2 = seriate(dist(t(mat)), method = "TSP")
Heatmap(mat, name = "mat", row_order = get_order(o1), column_order = get_order(o2))
```
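To see why two separate calls are needed, note that `dist(mat)` compares rows while `dist(t(mat))` compares columns, so the two distance objects describe different sets of items. The base-R sketch below illustrates this on a hypothetical `toy_mat`, using `hclust()` purely as a stand-in for `seriate()` to obtain one order per dimension.

```{r, eval = FALSE}
# toy matrix (hypothetical): 5 rows and 3 columns
toy_mat = matrix(c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9), nrow = 5)

row_dist = dist(toy_mat)      # distances between rows
col_dist = dist(t(toy_mat))   # distances between columns (transpose first)

# the two distance objects describe different sets of items
attr(row_dist, "Size")  # one item per row
attr(col_dist, "Size")  # one item per column

# hclust() stands in for seriate() here; each order is a permutation of the
# corresponding dimension and could be passed to row_order / column_order
row_ord = hclust(row_dist)$order
col_ord = hclust(col_dist)$order
```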
Some seriation methods also return hierarchical clustering information.
Let's try:
```{r}
o1 = seriate(dist(mat), method = "GW")
o2 = seriate(dist(t(mat)), method = "GW")
```
The elements of `o1` and `o2` are actually `hclust` objects:
```{r}
class(o1[[1]])
```
The orders obtained from `hclust$order` and from `get_order()` are the same.
```{r}
o1[[1]]$order
# should be the same as the previous one
get_order(o1)
```
And we can add the dendrograms to the heatmap.
```{r}
Heatmap(mat, name = "mat", cluster_rows = as.dendrogram(o1[[1]]),
cluster_columns = as.dendrogram(o2[[1]]))
```
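Passing the dendrogram works because converting an `hclust` object with `as.dendrogram()` keeps its leaf order, so the dendrogram and `$order` carry consistent ordering information. A quick base-R check on a hypothetical toy matrix:

```{r, eval = FALSE}
# toy matrix (hypothetical)
toy_mat = matrix(c(2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5), nrow = 6)
hc = hclust(dist(toy_mat))

# converting to a dendrogram keeps the leaf order of the hclust object
dend = as.dendrogram(hc)
all(order.dendrogram(dend) == hc$order)
# [1] TRUE
```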
For more uses of the `seriate()` function, please refer to the documentation of the [**seriation**
package](https://cran.r-project.org/web/packages/seriation/index.html).
## Dimension names {#dimension-names}
The row names and column names are drawn on the right and bottom sides of the
heatmap by default. Side, visibility and graphic parameters for dimension
names can be set as follows:
```{r dimension_name}
Heatmap(mat, name = "mat", row_names_side = "left", row_dend_side = "right",
column_names_side = "top", column_dend_side = "bottom")
Heatmap(mat, name = "mat", show_row_names = FALSE)
Heatmap(mat, name = "mat", row_names_gp = gpar(fontsize = 20))
Heatmap(mat, name = "mat", row_names_gp = gpar(col = c(rep("red", 10), rep("blue", 8))))
Heatmap(mat, name = "mat", row_names_centered = TRUE, column_names_centered = TRUE)
```
The rotation of column names can be set by `column_names_rot`:
```{r, eval = FALSE}
Heatmap(mat, name = "mat", column_names_rot = 45)
Heatmap(mat, name = "mat", column_names_rot = 45, column_names_side = "top",
column_dend_side = "bottom")
```
```{r, echo = FALSE, fig.width = 10, fig.height = 5}
pushViewport(viewport(x = 0, width = 0.5, just = "left"))
draw(Heatmap(mat, name = "mat", column_names_rot = 45), newpage = FALSE)
upViewport()
pushViewport(viewport(x = 0.5, width = 0.5, just = "left"))
draw(Heatmap(mat, name = "mat", column_names_rot = 45, column_names_side = "top",
column_dend_side = "bottom"), newpage = FALSE)
upViewport()
```
If you have row names or column names that are too long,
`row_names_max_width` or `column_names_max_height` can be used to set the
maximal space for them. The default maximal space for both row names and
column names is 6 cm. In the following code, `max_text_width()` is a helper
function that quickly calculates the maximal width of a vector of text.
```{r, eval = FALSE}
mat2 = mat
rownames(mat2)[1] = paste(c(letters, LETTERS), collapse = "")
Heatmap(mat2, name = "mat")
Heatmap(mat2, name = "mat",
row_names_max_width = max_text_width(
rownames(mat2),
gp = gpar(fontsize = 12)
))
```
```{r, echo = FALSE, fig.width = 7, fig.height = 8}
mat2 = mat
rownames(mat2)[1] = paste(c(letters, LETTERS), collapse = "")
pushViewport(viewport(y = 1, height = 0.5, just = "top"))
draw(Heatmap(mat2, name = "mat", row_title = "default row_names_max_width"), newpage = FALSE)
upViewport()
pushViewport(viewport(y = 0.5, height = 0.5, just = "top"))
draw(Heatmap(mat2, name = "mat", row_title = "row_names_max_width as length of a*",
row_names_max_width = max_text_width(rownames(mat2), gp = gpar(fontsize = 12))), newpage = FALSE)
upViewport()
```
Instead of directly using the row/column names from the matrix, you can also
provide another character vector which corresponds to the rows or columns and
set it by `row_labels` or `column_labels`. This is useful because you can
directly provide new labels for the heatmap without changing the dimension
names of the matrix.
A typical scenario where `row_labels` and `column_labels` are useful is gene
expression analysis, where we might use Ensembl IDs as gene IDs, i.e. as the
row names of the gene expression matrix. However, Ensembl IDs are designed for
indexing the Ensembl database rather than for human reading. We would prefer
to show gene symbols on the heatmap, which are easier to read. To do this, we
only need to assign the corresponding gene symbols to `row_labels` without
modifying the original matrix.
The second advantage is that `row_labels` and `column_labels` allow duplicated
labels, while duplicated row names or column names are not allowed in the
matrix.
The following simple example uses letters as row labels and column
labels:
```{r}
# use a named vector to make sure the correspondence between
# row names and row labels is correct
row_labels = structure(paste0(letters[1:24], 1:24), names = paste0("row", 1:24))
column_labels = structure(paste0(LETTERS[1:24], 1:24), names = paste0("column", 1:24))
row_labels
Heatmap(mat, name = "mat", row_labels = row_labels[rownames(mat)],
column_labels = column_labels[colnames(mat)])
```
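The named vector matters when the mapping from names to labels is not stored in the same order as the matrix rows. In this base-R sketch, `id_to_label` and `toy_mat` are hypothetical placeholders, not real gene identifiers. Indexing the mapping by `rownames()` realigns the labels with the rows, and the result is what would be passed to `row_labels`.

```{r, eval = FALSE}
# hypothetical ID-to-label mapping, deliberately stored in a different
# order from the matrix rows
id_to_label = c(id3 = "gene_c", id1 = "gene_a", id2 = "gene_b")

# toy matrix whose row names are the IDs (hypothetical)
toy_mat = matrix(1:6, nrow = 3, dimnames = list(c("id1", "id2", "id3"), NULL))

# indexing by rownames() returns the labels in the row order of the matrix
labs = id_to_label[rownames(toy_mat)]
unname(labs)
# [1] "gene_a" "gene_b" "gene_c"
```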
The third advantage is that mathematical expressions can be used as row labels
in the heatmap.
```{r}
Heatmap(mat, name = "mat", row_labels = expression(alpha, beta, gamma, delta, epsilon,
zeta, eta, theta, iota, kappa, lambda, mu, nu, xi, omicron, pi, rho, sigma))
```
`anno_text()` (Section \@ref(text-annotation)) can be used to add more customized
labels for heatmap rows and columns.
## Heatmap split {#heatmap-split}
One major advantage of the **ComplexHeatmap** package is that it supports
splitting the heatmap by rows and columns to better group the features and
further highlight the patterns.
The following arguments control the splitting: `row_km`, `row_split`,
`column_km`, `column_split`. In the following text, we call the sub-clusters
generated by splitting "_slices_".
### Split by k-means clustering {#split-by-kmeans-clustering}
`row_km` and `column_km` apply k-means partitioning.
```{r k_means}
Heatmap(mat, name = "mat", row_km = 2)
Heatmap(mat, name = "mat", column_km = 3)
```
Row splitting and column splitting can be performed simultaneously.
```{r}
Heatmap(mat, name = "mat", row_km = 2, column_km = 3)
```
You might notice the dashed lines in the row and column dendrograms; they are
explained in the last paragraph of Section \@ref(split-by-categorical-variables).
`Heatmap()` internally calls `kmeans()` with random start points, so in some
cases repeated runs generate different clusters. To alleviate this problem,
`row_km_repeats` and `column_km_repeats` can be set to a number larger than 1
to run `kmeans()` multiple times, and a final consensus k-means clustering is
used. Please note that the final number of clusters from consensus k-means
might be smaller than the number set in `row_km` and `column_km`.
```{r, eval = FALSE}
Heatmap(mat, name = "mat",
row_km = 2, row_km_repeats = 100,
column_km = 3, column_km_repeats = 100)
```
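To illustrate the idea of combining repeated k-means runs, the sketch below builds a simple co-clustering (consensus) matrix, i.e. the fraction of runs in which two rows land in the same cluster, and then cuts a hierarchical clustering of `1 - co`. This is one common consensus strategy and is not necessarily what `Heatmap()` does internally; `toy_mat`, `n_rep` and `co` are hypothetical names.

```{r, eval = FALSE}
set.seed(123)
# toy data (hypothetical): two groups of 10 rows with 2 columns each
toy_mat = rbind(matrix(rnorm(20, mean = 0), nrow = 10),
                matrix(rnorm(20, mean = 3), nrow = 10))
k = 2
n_rep = 50

# co[i, j] counts how often rows i and j fall into the same k-means cluster
co = matrix(0, nrow(toy_mat), nrow(toy_mat))
for (i in seq_len(n_rep)) {
    cl = kmeans(toy_mat, centers = k)$cluster
    co = co + outer(cl, cl, "==")
}
co = co / n_rep  # convert counts into fractions in [0, 1]

# rows that are usually co-clustered are close in 1 - co
consensus = cutree(hclust(as.dist(1 - co), method = "average"), k = k)
table(consensus)
```

Because `cutree()` is asked for exactly `k` groups here, this sketch always returns `k` clusters, unlike the consensus step described above, which may end up with fewer.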
### Split by categorical variables {#split-by-categorical-variables}
More generally, `row_split` or `column_split` can be set to a categorical
vector or a data frame where different combinations of levels split the
rows/columns in the heatmap. How to control the order of the slices is
introduced in Section \@ref(order-of-slices).
```{r split}
# split by a vector
Heatmap(mat, name = "mat",
row_split = rep(c("A", "B"), 9), column_split = rep(c("C", "D"), 12))
# split by a data frame
Heatmap(mat, name = "mat",
row_split = data.frame(rep(c("A", "B"), 9), rep(c("C", "D"), each = 9)))
# split on both dimensions
Heatmap(mat, name = "mat", row_split = factor(rep(c("A", "B"), 9)),
column_split = factor(rep(c("C", "D"), 12)))
```
Actually, k-means clustering just generates a vector of cluster classes and
appends it to `row_split` or `column_split`. `row_km`/`column_km` can be
mixed with `row_split` and `column_split`.
```{r}
Heatmap(mat, name = "mat", row_split = rep(c("A", "B"), 9), row_km = 2)
```
which is the same as:
```{r, eval = FALSE}
# code only for demonstration
cl = kmeans(mat, centers = 2)$cluster
# classes from k-means are always put as the first column in `row_split`
Heatmap(mat, name = "mat", row_split = data.frame(cl, rep(c("A", "B"), 9)))
```
If you are not happy with the default k-means partition, it is easy to use
other partition methods by just assigning the partition vector to
`row_split`/`column_split`.
```{r pam}
pa = cluster::pam(mat, k = 3)
Heatmap(mat, name = "mat", row_split = paste0("pam", pa$clustering))
```
If `row_order` or `column_order` is set, the rows/columns within each slice
are still ordered according to it.
```{r split_row_order}
# remember when `row_order` is set, row clustering is turned off
Heatmap(mat, name = "mat", row_order = 18:1, row_km = 2)
```
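Conceptually, keeping `row_order` inside slices means that the rows of each slice appear in the same relative order as in the global order. The base-R sketch below models that behavior with a hypothetical slice vector in place of the k-means classes; it describes the idea, not ComplexHeatmap's internal code.

```{r, eval = FALSE}
ord = 18:1                       # the global order, as in the example above
slice = rep(c("A", "B"), 9)      # hypothetical slice label of each of the 18 rows

# walk through the rows in `ord` and group them by slice;
# split() keeps the original sequence inside each group
per_slice = split(ord, slice[ord])
per_slice
```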
A character matrix can only be split by the `row_split`/`column_split` argument.
```{r split_discrete_matrix}
# split by the first column in `discrete_mat`
Heatmap(discrete_mat, name = "mat", col = 1:4, row_split = discrete_mat[, 1])
```
If `row_km`/`column_km` is set, or `row_split`/`column_split` is set as a
vector or a data frame, hierarchical clustering is first applied to each slice
(provided clustering is turned on), which generates `k` dendrograms; then a
parent dendrogram is generated based on the mean values of each slice.
**The height of the parent dendrogram is adjusted by adding the maximal height
of the dendrograms in all children slices, and the parent dendrogram is added
on top of the children dendrograms to form a single global dendrogram.** This
is why you see dashed lines in the dendrograms in the previous heatmaps. They
are used to mark the parent dendrogram and the children dendrograms, alerting
users that the two are calculated in different ways. These dashed lines can be
removed by setting `show_parent_dend_line = FALSE` in `Heatmap()`, or by
setting the global option `ht_opt$show_parent_dend_line = FALSE`.
```{r}
Heatmap(mat, name = "mat", row_km = 2, column_km = 3, show_parent_dend_line = FALSE)
```
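The construction of the global dendrogram described above can be sketched in base R: cluster each slice, cluster the slice means, and lift the parent dendrogram by the tallest child height. This is a simplified model of the described procedure rather than ComplexHeatmap's actual implementation; the toy matrix and slice labels are hypothetical, and the final step of attaching the children under the parent's leaves is omitted.

```{r, eval = FALSE}
# toy matrix (hypothetical): 6 rows and 2 columns, 3 slices of 2 rows each
toy_mat = matrix(c(1, 2, 8, 9, 4, 5,
                   1, 2, 9, 8, 5, 4), nrow = 6)
slice = rep(c("s1", "s2", "s3"), each = 2)

# 1. cluster each slice separately and record the tallest child dendrogram
idx_by_slice = split(seq_len(nrow(toy_mat)), slice)
child_heights = sapply(idx_by_slice, function(idx) {
    max(hclust(dist(toy_mat[idx, , drop = FALSE]))$height)
})

# 2. mean profile of each slice (rowsum() and table() both sort the groups)
slice_means = rowsum(toy_mat, slice) / as.vector(table(slice))

# 3. the parent dendrogram clusters the slice means
parent = hclust(dist(slice_means))

# 4. lift the parent above the tallest child so it sits on top of them
parent$height = parent$height + max(child_heights)
```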
### Split by dendrogram {#spilt-by-dendrogram}
A second scenario is that users still want to keep the global dendrogram,
**which is generated from the complete matrix**, instead of splitting the
matrix in the first place. In this case, `row_split`/`column_split` can be
set to a single number, which applies `cutree()` on the row/column
dendrogram. This works when `cluster_rows`/`cluster_columns` is set to `TRUE`
or is assigned with an `hclust`/`dendrogram` object.
In this case, the dendrogram is the same as the original one, except that the
positions of the dendrogram leaves are slightly adjusted by the gaps between
slices. (There are no dashed lines, because here the dendrogram is calculated
as a complete one and there is no parent dendrogram or children dendrograms.)
```{r split_dendrogram}
Heatmap(mat, name = "mat", row_split = 2, column_split = 3)
# color_branches() is from the dendextend package
library(dendextend)
dend = hclust(dist(mat))
dend = color_branches(dend, k = 2)
Heatmap(mat, name = "mat", cluster_rows = dend, row_split = 2)
```
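Because `cutree()` cuts the dendrogram into complete subtrees, the rows of each slice are already adjacent in the leaf order, which is consistent with no parent dendrogram being needed here. A base-R check on a hypothetical toy matrix:

```{r, eval = FALSE}
toy_mat = matrix(c(1, 2, 8, 9, 4, 5, 1, 2, 9, 8, 5, 4), nrow = 6)  # hypothetical
hc = hclust(dist(toy_mat))
cl = cutree(hc, k = 2)

# along the leaf order, each cluster occupies one contiguous run,
# so the number of runs equals the number of clusters
runs = rle(as.vector(cl[hc$order]))
length(runs$lengths)
# [1] 2
```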
If you want to combine the splitting from `cutree()` with other categorical
variables, you need to generate the classes from `cutree()` first, combine
them with the other variables into a data frame, and then pass it to the
`row_split` argument.
```{r, eval = FALSE}
# code only for demonstration
split = data.frame(cutree(hclust(dist(mat)), k = 2), rep(c("A", "B"), 9))
Heatmap(mat, name = "mat", row_split = split)
```
### Order of slices {#order-of-slices}
When `row_split`/`column_split` is set as categorical variables (a vector or a
data frame) or `row_km`/`column_km` is set, by default an additional
clustering is applied to the means of the slices to show the hierarchy at the
slice level. In this scenario, you cannot precisely control the order of
slices, because it is determined by the clustering of slices.
Nevertheless, you can set `cluster_row_slices` or `cluster_column_slices` to
`FALSE` to turn off the clustering on slices, and now you can precisely
control the order of slices.
When there is no slice clustering, the order of the slices can be controlled
by the `levels` of each variable in `row_split`/`column_split` (in this case,
each variable should be a factor). If all variables are characters, the
default order is `unique(row_split)` or `unique(column_split)`. Compare the
following heatmaps:
```{r}
Heatmap(mat, name = "mat",
row_split = rep(LETTERS[1:3], 6),
column_split = rep(letters[1:6], 4))
# the clustering is similar to the previous heatmap, but branches at some nodes of the dendrogram are flipped
Heatmap(mat, name = "mat",
row_split = factor(rep(LETTERS[1:3], 6), levels = LETTERS[3:1]),
column_split = factor(rep(letters[1:6], 4), levels = letters[6:1]))