-
Notifications
You must be signed in to change notification settings - Fork 2
/
introduction.html
205 lines (179 loc) · 19.2 KB
/
introduction.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="author" content="Li Wang" />
<meta name="date" content="2019-04-15" />
<title>Introduction to R package DeClust</title>
<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
code > span.dt { color: #902000; } /* DataType */
code > span.dv { color: #40a070; } /* DecVal */
code > span.bn { color: #40a070; } /* BaseN */
code > span.fl { color: #40a070; } /* Float */
code > span.ch { color: #4070a0; } /* Char */
code > span.st { color: #4070a0; } /* String */
code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
code > span.ot { color: #007020; } /* Other */
code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
code > span.fu { color: #06287e; } /* Function */
code > span.er { color: #ff0000; font-weight: bold; } /* Error */
code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
code > span.cn { color: #880000; } /* Constant */
code > span.sc { color: #4070a0; } /* SpecialChar */
code > span.vs { color: #4070a0; } /* VerbatimString */
code > span.ss { color: #bb6688; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { color: #19177c; } /* Variable */
code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code > span.op { color: #666666; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #bc7a00; } /* Preprocessor */
code > span.at { color: #7d9029; } /* Attribute */
code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
</style>
<link href="data:text/css;charset=utf-8,body%20%7B%0Abackground%2Dcolor%3A%20%23fff%3B%0Amargin%3A%201em%20auto%3B%0Amax%2Dwidth%3A%20700px%3B%0Aoverflow%3A%20visible%3B%0Apadding%2Dleft%3A%202em%3B%0Apadding%2Dright%3A%202em%3B%0Afont%2Dfamily%3A%20%22Open%20Sans%22%2C%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0Afont%2Dsize%3A%2014px%3B%0Aline%2Dheight%3A%201%2E35%3B%0A%7D%0A%23header%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0A%23TOC%20%7B%0Aclear%3A%20both%3B%0Amargin%3A%200%200%2010px%2010px%3B%0Apadding%3A%204px%3B%0Awidth%3A%20400px%3B%0Aborder%3A%201px%20solid%20%23CCCCCC%3B%0Aborder%2Dradius%3A%205px%3B%0Abackground%2Dcolor%3A%20%23f6f6f6%3B%0Afont%2Dsize%3A%2013px%3B%0Aline%2Dheight%3A%201%2E3%3B%0A%7D%0A%23TOC%20%2Etoctitle%20%7B%0Afont%2Dweight%3A%20bold%3B%0Afont%2Dsize%3A%2015px%3B%0Amargin%2Dleft%3A%205px%3B%0A%7D%0A%23TOC%20ul%20%7B%0Apadding%2Dleft%3A%2040px%3B%0Amargin%2Dleft%3A%20%2D1%2E5em%3B%0Amargin%2Dtop%3A%205px%3B%0Amargin%2Dbottom%3A%205px%3B%0A%7D%0A%23TOC%20ul%20ul%20%7B%0Amargin%2Dleft%3A%20%2D2em%3B%0A%7D%0A%23TOC%20li%20%7B%0Aline%2Dheight%3A%2016px%3B%0A%7D%0Atable%20%7B%0Amargin%3A%201em%20auto%3B%0Aborder%2Dwidth%3A%201px%3B%0Aborder%2Dcolor%3A%20%23DDDDDD%3B%0Aborder%2Dstyle%3A%20outset%3B%0Aborder%2Dcollapse%3A%20collapse%3B%0A%7D%0Atable%20th%20%7B%0Aborder%2Dwidth%3A%202px%3B%0Apadding%3A%205px%3B%0Aborder%2Dstyle%3A%20inset%3B%0A%7D%0Atable%20td%20%7B%0Aborder%2Dwidth%3A%201px%3B%0Aborder%2Dstyle%3A%20inset%3B%0Aline%2Dheight%3A%2018px%3B%0Apadding%3A%205px%205px%3B%0A%7D%0Atable%2C%20table%20th%2C%20table%20td%20%7B%0Aborder%2Dleft%2Dstyle%3A%20none%3B%0Aborder%2Dright%2Dstyle%3A%20none%3B%0A%7D%0Atable%20thead%2C%20table%20tr%2Eeven%20%7B%0Abackground%2Dcolor%3A%20%23f7f7f7%3B%0A%7D%0Ap%20%7B%0Amargin%3A%200%2E5em%200%3B%0A%7D%0Ablockquote%20%7B%0Abackground%2Dcolor%3A%20%23f6f6f6%3B%0Apadding%3A%200%2E25em%200%2E75em%3B%0A%7D%0Ahr%20%7B%0Aborder%2Dstyle%3A%20solid%3B%0Aborder%3A%20none%3B%0Aborder%2Dtop%3A%201px%20solid%20%23777%3B%0Amargin%3A%2028px%200%3B%0A%7D%0Adl%20%7B%0Amargin%2Dleft%3A%200%3B%0A%7D%0Adl%20dd%20%7B%0Amargin%2Dbottom%3A%2013px%3B%0Amargin%2Dleft%3A%2013px%3B%0A%7D%0Adl%20dt%20%7B%0Afont%2Dweight%3A%20bold%3B%0A%7D%0Aul%20%7B%0Amargin%2Dtop%3A%200%3B%0A%7D%0Aul%20li%20%7B%0Alist%2Dstyle%3A%20circle%20outside%3B%0A%7D%0Aul%20ul%20%7B%0Amargin%2Dbottom%3A%200%3B%0A%7D%0Apre%2C%20code%20%7B%0Abackground%2Dcolor%3A%20%23f7f7f7%3B%0Aborder%2Dradius%3A%203px%3B%0Acolor%3A%20%23333%3B%0Awhite%2Dspace%3A%20pre%2Dwrap%3B%20%0A%7D%0Apre%20%7B%0Aborder%2Dradius%3A%203px%3B%0Amargin%3A%205px%200px%2010px%200px%3B%0Apadding%3A%2010px%3B%0A%7D%0Apre%3Anot%28%5Bclass%5D%29%20%7B%0Abackground%2Dcolor%3A%20%23f7f7f7%3B%0A%7D%0Acode%20%7B%0Afont%2Dfamily%3A%20Consolas%2C%20Monaco%2C%20%27Courier%20New%27%2C%20monospace%3B%0Afont%2Dsize%3A%2085%25%3B%0A%7D%0Ap%20%3E%20code%2C%20li%20%3E%20code%20%7B%0Apadding%3A%202px%200px%3B%0A%7D%0Adiv%2Efigure%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0Aimg%20%7B%0Abackground%2Dcolor%3A%20%23FFFFFF%3B%0Apadding%3A%202px%3B%0Aborder%3A%201px%20solid%20%23DDDDDD%3B%0Aborder%2Dradius%3A%203px%3B%0Aborder%3A%201px%20solid%20%23CCCCCC%3B%0Amargin%3A%200%205px%3B%0A%7D%0Ah1%20%7B%0Amargin%2Dtop%3A%200%3B%0Afont%2Dsize%3A%2035px%3B%0Aline%2Dheight%3A%2040px%3B%0A%7D%0Ah2%20%7B%0Aborder%2Dbottom%3A%204px%20solid%20%23f7f7f7%3B%0Apadding%2Dtop%3A%2010px%3B%0Apadding%2Dbottom%3A%202px%3B%0Afont%2Dsize%3A%20145%25%3B%0A%7D%0Ah3%20%7B%0Aborder%2Dbottom%3A%202px%20solid%20%23f7f7f7%3B%0Apadding%2Dtop%3A%2010px%3B%0Afont%2Dsize%3A%20120%25%3B%0A%7D%0Ah4%20%7B%0Aborder%2Dbottom%3A%201px%20solid%20%23f7f7f7%3B%0Amargin%2Dleft%3A%208px%3B%0Afont%2Dsize%3A%20105%25%3B%0A%7D%0Ah5%2C%20h6%20%7B%0Aborder%2Dbottom%3A%201px%20solid%20%23ccc%3B%0Afont%2Dsize%3A%20105%25%3B%0A%7D%0Aa%20%7B%0Acolor%3A%20%230033dd%3B%0Atext%2Ddecoration%3A%20none%3B%0A%7D%0Aa%3Ahover%20%7B%0Acolor%3A%20%236666ff%3B%20%7D%0Aa%3Avisited%20%7B%0Acolor%3A%20%23800080%3B%20%7D%0Aa%3Avisited%3Ahover%20%7B%0Acolor%3A%20%23BB00BB%3B%20%7D%0Aa%5Bhref%5E%3D%22http%3A%22%5D%20%7B%0Atext%2Ddecoration%3A%20underline%3B%20%7D%0Aa%5Bhref%5E%3D%22https%3A%22%5D%20%7B%0Atext%2Ddecoration%3A%20underline%3B%20%7D%0A%0Acode%20%3E%20span%2Ekw%20%7B%20color%3A%20%23555%3B%20font%2Dweight%3A%20bold%3B%20%7D%20%0Acode%20%3E%20span%2Edt%20%7B%20color%3A%20%23902000%3B%20%7D%20%0Acode%20%3E%20span%2Edv%20%7B%20color%3A%20%2340a070%3B%20%7D%20%0Acode%20%3E%20span%2Ebn%20%7B%20color%3A%20%23d14%3B%20%7D%20%0Acode%20%3E%20span%2Efl%20%7B%20color%3A%20%23d14%3B%20%7D%20%0Acode%20%3E%20span%2Ech%20%7B%20color%3A%20%23d14%3B%20%7D%20%0Acode%20%3E%20span%2Est%20%7B%20color%3A%20%23d14%3B%20%7D%20%0Acode%20%3E%20span%2Eco%20%7B%20color%3A%20%23888888%3B%20font%2Dstyle%3A%20italic%3B%20%7D%20%0Acode%20%3E%20span%2Eot%20%7B%20color%3A%20%23007020%3B%20%7D%20%0Acode%20%3E%20span%2Eal%20%7B%20color%3A%20%23ff0000%3B%20font%2Dweight%3A%20bold%3B%20%7D%20%0Acode%20%3E%20span%2Efu%20%7B%20color%3A%20%23900%3B%20font%2Dweight%3A%20bold%3B%20%7D%20%20code%20%3E%20span%2Eer%20%7B%20color%3A%20%23a61717%3B%20background%2Dcolor%3A%20%23e3d2d2%3B%20%7D%20%0A" rel="stylesheet" type="text/css" />
</head>
<body>
<h1 class="title toc-ignore">Introduction to R package DeClust</h1>
<h4 class="author"><em>Li Wang</em></h4>
<h4 class="date"><em>2019-04-15</em></h4>
<div id="introduction-to-r-package-declust" class="section level1">
<h1>Introduction to R package <em>DeClust</em></h1>
<div id="input-data-of-declust" class="section level2">
<h2>Input data of <em>DeClust</em></h2>
<p>The only two required inputs for DeClust is the gene expression matrix for bulky tumor, and the number of cancer subtypes. It has to be noted that the gene expression matrix is in the form of gene by sample with the original expression value (before log-transformation and non-negative). The gene symbols need to be provided as rownames, since we used gene symbols to identify genes corresponding to the immune and stromal markers. We provided a test dataset of exprM as an example.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(<span class="st">"DeClust"</span>);
<span class="kw">data</span>(exprM)
exprM[<span class="dv">1</span><span class="op">:</span><span class="dv">10</span>,<span class="dv">1</span><span class="op">:</span><span class="dv">5</span>];</code></pre></div>
<pre><code>## sample1 sample2 sample3 sample4 sample5
## HDAC2 292.47754 179.68528 349.78428 398.77482 428.27253
## ANKS1A 315.05338 387.28148 264.55057 264.87132 249.93429
## RAF1 309.47248 299.73868 346.85527 353.70175 343.03136
## HSPA1L 71.74604 62.04993 57.43970 69.14479 76.55563
## GABRE 119.14868 147.67799 105.03538 120.91672 98.57263
## KCNN4 305.39429 503.35004 256.53022 282.58205 258.05154
## SLC25A3 1370.59568 1379.88828 1220.06082 1091.68869 1385.49021
## NOC4L 96.68730 126.60466 95.05389 92.02751 88.66475
## CTNNA1 520.47370 503.15162 617.79479 681.11550 528.33900
## EXOSC2 192.31710 192.91326 135.43959 150.80908 149.84253</code></pre>
</div>
<div id="run-declust-algorithms" class="section level2">
<h2>Run <em>DeClust</em> algorithms</h2>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">r<-<span class="kw">deClustFromMarkerlognormal</span>(exprM,<span class="dv">3</span>)</code></pre></div>
</div>
<div id="output-data-of-declust" class="section level2">
<h2>Output data of <em>DeClust</em></h2>
<p>DeClust outputs a named list with three components, i.e., <em>subtype</em>, <em>subtypeprofileM</em> and <em>subtypefractionM</em>. <em>subtype</em> is a vector with length equal to the sample size, and it stores the sample clustering results;<em>subtypeprofileM</em> is a gene by compartment matrix, and it stores the inferred expression profile for each cancer subtype, immune and stromal compartment;<em>subtypefractionM</em> is a sample by compartment matrix, and it stores the estimated fraction of each compartment for each sample. Below is an examplary output.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">data</span>(r)
<span class="kw">table</span>(r<span class="op">$</span>subtype)</code></pre></div>
<pre><code>##
## subtype1 subtype2 subtype3
## 8 39 55</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">r<span class="op">$</span>subtype[<span class="dv">1</span><span class="op">:</span><span class="dv">10</span>]</code></pre></div>
<pre><code>## sample1 sample2 sample3 sample4 sample5 sample6
## "subtype3" "subtype3" "subtype3" "subtype3" "subtype3" "subtype3"
## sample7 sample8 sample9 sample10
## "subtype3" "subtype3" "subtype3" "subtype3"</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">r<span class="op">$</span>subtypeprofileM[<span class="dv">1</span><span class="op">:</span><span class="dv">10</span>,]</code></pre></div>
<pre><code>## stromal immune subtype3 subtype2 subtype1
## HDAC2 637.37642 568.41448 59.03738 99.82161 64.42567
## ANKS1A 161.09002 132.08029 538.82444 388.74949 233.79552
## RAF1 261.39640 534.52575 293.74906 279.24345 413.06543
## HSPA1L 54.20879 115.25038 69.30198 57.88039 83.80951
## GABRE 72.11677 63.35843 221.60217 100.79493 47.41807
## KCNN4 1.00000 328.09306 630.88354 135.80617 150.83310
## SLC25A3 542.32720 1999.96614 1709.08147 1683.29913 1611.61913
## NOC4L 26.08065 79.26929 198.47241 185.53234 210.98552
## CTNNA1 948.96426 457.01396 481.41176 278.80346 473.01900
## EXOSC2 107.67249 86.67385 286.00645 327.86733 439.21560</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">r<span class="op">$</span>subtypefractionM[<span class="dv">1</span><span class="op">:</span><span class="dv">10</span>,]</code></pre></div>
<pre><code>## stromal immune subtype3 subtype2 subtype1
## sample1 0.29602310 0.21199658 0.4051635 0 0
## sample2 0.11363881 0.17576280 0.6322138 0 0
## sample3 0.28938492 0.29878423 0.3179217 0 0
## sample4 0.34609824 0.23304370 0.3541287 0 0
## sample5 0.27620739 0.34665815 0.2690447 0 0
## sample6 0.21723584 0.14924360 0.5665360 0 0
## sample7 0.31732221 0.19358030 0.4032309 0 0
## sample8 0.06013186 0.07075543 0.8073175 0 0
## sample9 0.12482130 0.13977744 0.6786954 0 0
## sample10 0.04521343 0.14642908 0.7280870 0 0</code></pre>
</div>
<div id="selection-of-best-subtype-number" class="section level2">
<h2>Selection of best subtype number</h2>
<p>In order to determine the best number of subtypes, we can run <em>deClust</em> with different subtype numbers,calculate the corresponding BIC and plot the BIC curve. The subtype number at the elbow point of BIC curve can be chosen as the final subtype number.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">BIC<-<span class="kw">sapply</span>(<span class="dv">2</span><span class="op">:</span><span class="dv">5</span>,<span class="cf">function</span>(subtypeN)<span class="kw">calculateBIC</span>(exprM,<span class="kw">deClustFromMarkerlognormal</span>(exprM,subtypeN)))
<span class="kw">plot</span>(<span class="dv">2</span><span class="op">:</span><span class="dv">5</span>, BIC,<span class="dt">xlab=</span><span class="st">"subtypeN"</span>,<span class="dt">type=</span><span class="st">"b"</span>)</code></pre></div>
</div>
<div id="precomputed-declust-results-of-tcga-datasets" class="section level2">
<h2>Precomputed DeClust results of TCGA datasets</h2>
<p><em>DeClust</em> package also contains the precomputed results for 13 TCGA datasets using the full version of DeClust. The results are stored in two R variables,i.e., <em>TCGAdeClustsubtype</em> and <em>TCGAdeClustprofileM</em>. <em>TCGAdeClustsubtype</em> contains the precomputed sample subtypes for 13 TCGA datasets. It is a named list of length 13, each item corresponding to one of the 13 TCGA dataset. Each item in the list is a vector of subtype annotation for all samples in that TCGA dataset.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">names</span>(TCGAdeClustsubtype)</code></pre></div>
<pre><code>## [1] "BLCA" "BRCA" "CESC" "COADREAD" "HNSC" "KIRC"
## [7] "KIRP" "LIHC" "LUAD" "LUSC" "OV" "THCA"
## [13] "UCEC"</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">TCGAdeClustsubtype[[<span class="st">"BLCA"</span>]][<span class="dv">1</span><span class="op">:</span><span class="dv">10</span>]</code></pre></div>
<pre><code>## TCGA-2F-A9KO TCGA-2F-A9KP
## "luminal-papillary" "luminal|luminal-infiltrated "
## TCGA-2F-A9KQ TCGA-2F-A9KR
## "luminal-papillary" "luminal-papillary"
## TCGA-2F-A9KT TCGA-2F-A9KW
## "basal-squamous_1" "luminal|luminal-infiltrated "
## TCGA-4Z-AA7M TCGA-4Z-AA7N
## "luminal|luminal-infiltrated " "luminal-papillary"
## TCGA-4Z-AA7O TCGA-4Z-AA7Q
## "luminal-papillary" "basal-squamous_1"</code></pre>
<p><em>TCGAdeClustprofileM</em> stores the precomputed expression profiles for 13 TCGA datasets by the full version of <em>DeClust</em>.After <em>DeClust</em> was run for each of the 13 TCGA datasets separately (see methods of the paper for details),the estimated profiles for cancer subtypes and immune/stromal compartments were assembled together into one single matrix for down-stream analysis. Only genes existed in more than half of the 13 datasets were kept. In order to make profiles more comparable between different datasets (e.g.,removing batch effects), quantile normalization was applied to the assemebled profiles. It is stored as a matrix of 10890 rows(genes) and 87 columns(different cancer subtypes,and tissue-specific immune and stromal compartments)</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">dim</span>(TCGAdeClustprofileM)</code></pre></div>
<pre><code>## [1] 10890 87</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">TCGAdeClustprofileM[<span class="dv">1</span><span class="op">:</span><span class="dv">10</span>,<span class="dv">1</span><span class="op">:</span><span class="dv">5</span>]</code></pre></div>
<pre><code>## BLCA_stromal BLCA_immune BLCA_luminal-papillary
## A1BG 1.66837146 1.7683864 1.3647588
## A1CF 0.09080154 0.3058121 0.2180411
## A2BP1 0.43337219 0.1469437 0.2639867
## A2LD1 1.62162591 1.7570908 1.6789369
## A2ML1 0.23361222 0.1469437 1.7274867
## A2M 2.42363216 2.2420670 2.1939402
## A4GALT 2.09549408 1.9953924 2.0195932
## A4GNT 0.35276758 0.6614475 0.4384091
## AACSL 0.09080154 0.8360497 0.5317063
## AADAC 1.06470444 0.1469437 1.4961192
## BLCA_luminal|luminal-infiltrated BLCA_basal-squamous_1
## A1BG 1.4234683 1.3868366
## A1CF 0.6252511 0.1853507
## A2BP1 0.3444466 0.2848240
## A2LD1 1.6773697 1.7327961
## A2ML1 1.9115048 1.9798606
## A2M 2.1242549 1.9969698
## A4GALT 1.8860725 2.0450674
## A4GNT 0.5591423 0.3492061
## AACSL 0.7961058 0.4956801
## AADAC 1.5258592 1.2642322</code></pre>
</div>
</div>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement("script");
script.type = "text/javascript";
script.src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
document.getElementsByTagName("head")[0].appendChild(script);
})();
</script>
</body>
</html>