-
Notifications
You must be signed in to change notification settings - Fork 0
/
tutorial.Rmd
376 lines (257 loc) · 13.1 KB
/
tutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
---
title: "Antigenic cartography using Racmacs"
output: html_document
date: "2024-02-23"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Introduction
1. Creating a map
2. Plotting a map
3. Quality assuring a map
I'm assuming you have a basic understanding of antigenic cartography. If not, you can learn more [here](https://acorg.github.io/Racmacs/articles/intro-to-antigenic-cartography.html)
## Making a map
```{r library}
library(ggplot2)
library(Racmacs)
options(RacOptimizer.num_cores = parallel::detectCores())
```
### Reading in data
First we need to read some data in. The way to read data in depends on the format.
```{r readin}
data1 <- read.csv("data/small-dataset.csv", row.names=1)
# replace "/" in sera names
colnames(data1) <- gsub(".", "/", colnames(data1), fixed=T)
```
```{r make-acmap}
map <- acmap(titer_table = data1)
```
### Making a map
You don't always get the same map as the starting coordinates are random - that's why we run many optimisations. Before each map-making, I'm setting a random seed so the answer is reproducible.
```{r seed}
ran_seed <- 5387
```
You can make the map directly from the table or from the acmap object we just created. These methods are effectively equivalent.
```{r make-map}
set.seed(ran_seed)
map_from_table <- make.acmap(titer_table = data1, number_of_dimensions = 2, number_of_optimizations = 100)
set.seed(ran_seed)
map_from_map <- optimizeMap(map = map, number_of_dimensions = 2, number_of_optimizations = 100)
```
If you have a sensible reason, you can fix the column base:
```{r fixed-col-base}
set.seed(ran_seed)
map_fixed_col_base <- optimizeMap(map = map, number_of_dimensions = 2, fixed_column_bases = rep(8, 21), number_of_optimizations = 100)
```
Or set a minimum column base:
```{r min-col-base}
set.seed(ran_seed)
map_min_col_base <- optimizeMap(map = map, number_of_dimensions = 2, minimum_column_basis = "1280", number_of_optimizations = 100)
```
You can also make a map in 3D. We will check later if this is a good idea.
```{r map3d}
set.seed(ran_seed)
map_3d <- optimizeMap(map = map, number_of_dimensions = 3, number_of_optimizations = 100)
```
### Saving your map
```{r save-map}
save.acmap(map_from_map, "out/map_from_map.ace")
```
### Merging tables
Remember when merging tables that names & IDs need to be identical
First, open the file with the table to be merged.
```{r loading-second-dataset}
data2 <- read.csv("data/small-dataset-2.csv", row.names=1)
# replace "/" in sera names
colnames(data2) <- gsub(".", "/", colnames(data2), fixed=T)
map2 <- acmap(titer_table= data2)
```
The simplest method is to merge table together. You can also give the function a list of maps to merge.
```{r merging-tables}
merge_table <- mergeMaps(map, map2, method="table")
merge_map <- optimizeMap(merge_table, 2, 100)
```
If you already have a map, you can merge your new data in either by using the original map as a starting point, or freezing the original map and then merging new data in (this can be useful when you are merging mutants in and want to keep the background map consistent). I won't demonstrate the frozen merge here as the two datasets completely overlap.
```{r other-merge}
merge_incremental <- mergeMaps(map_from_map, map2, method="incremental-merge", number_of_dimensions = 2, number_of_optimizations = 100)
```
There are some additional methods described [here](https://acorg.github.io/Racmacs/articles/merging-maps.html) that may be applicable to your research.
## Plotting a map
```{r plot}
plot(map_from_map, plot_stress=T)
ggplot(map_from_map, plot_stress=T)
view(map_from_map)
```
### Adding sequence data
Largely copied from [here](https://acorg.github.io/Racmacs/articles/adding-sequences.html) and there is also a [youtube tutorial](https://www.youtube.com/watch?v=0IzubUMvl10)
Sequence data is added as a matrix, so if you start with a fasta file you will need to do a little pre-processing (aligning and trimming your sequences & then changing from a list to a matrix).
```{r seq data}
ag_sequences <- read.csv(
file = system.file("extdata/h3map2004_sequences_ag.csv", package = "Racmacs"),
colClasses = "character",
row.names = 1
)
sr_sequences <- read.csv(
file = system.file("extdata/h3map2004_sequences_sr.csv", package = "Racmacs"),
colClasses = "character",
row.names = 1
)
agSequences(map_from_map) <- ag_sequences[agNames(map_from_map),]
srSequences(map_from_map) <- sr_sequences[srNames(map_from_map),]
```
### Colours, shapes and sizes
Color, shape, size, outline colour and outline width can be customised for antigens and sera. A few examples below. You can find more detail on styling maps [here](https://acorg.github.io/Racmacs/articles/customising-map-appearance.html)
```{r}
# color by year
yr <- as.numeric(paste0(c(rep("19", 36), rep("20", 13)), sapply(strsplit(agNames(map_from_map), split="/", fixed=T), "[[", 3)))
agFill(map_from_map) <- rainbow(12)[yr-1992]
view(map_from_map)
agFill(map_from_map) <- "grey60"
# color by genetics
agFill(map_from_map)[agSequences(map_from_map)[,156]=="K"] <- "forestgreen"
agFill(map_from_map)[agSequences(map_from_map)[,156]=="Q"] <- "skyblue"
agFill(map_from_map)[agSequences(map_from_map)[,156]=="H"] <- "gold"
agSize(map_from_map)[c("PM/2007/99", "SY/5/97", "NA/933/95", "WU/359/95")] <- 10
agShape(map_from_map)[c(36, 29, 11, 13)] <- "EGG"
srShape(map_from_map)[c("PM/2007/99", "SY/5A/97", "SY/5B/97", "SY/5HAY/97", "SY/5V/97", "NA/933/95","WU/359B/95")] <- "UGLYEGG"
view(map_from_map)
```
## Quality assurance
### Stress
The stress is the measure of how well the data fits the map (lower stress is better). It can be calculated for the whole map or per antigen & per serum
```{r stress}
mapStress(map_from_map)
hist(agStress(map_from_map))
hist(srStress(map_from_map))
hist(agStressPerTiter(map_from_map))
hist(srStressPerTiter(map_from_map))
ams <- allMapStresses(map_from_map)
head(ams)
plot(ams)
```
If the top values of the stress are not similar, then the optimisation to make the map has not converged and either more optimisations are required or there are other issues with your data.
We can also compare stress between different merging methods
```{r merges}
mapStress(merge_map)
mapStress(merge_incremental)
```
### Viewer features (connection lines, error lines, stress dots)
```{r viewer}
view(map_from_map)
```
From the "Coloring" tab you can colour by stress; higher stress points are ones where the antigenic map fits the data less well.
From the "Diagnostics" tab, you can select an antigen and turn on
* connection lines (button "C" or ctrl-c) which show which sera have been titrated against that antigen
* error lines (button "E" or ctrl-e) which show the error in positioning. Red is too close and blue is too far. If both the antigen and sera are moved to the end of their line then there would be no difference between the map and table distances for that measurement.
* titre values (no button, ctrl-t)
### Procrustes
We use procrustes to compare two maps with antigens & sera in common.
First, we need to get another map to compare with (in that I need to remoce the antigen & serum IDs from the 2004 map so that the algorithm matches by name).
```{r get-2004-map}
map_2004 <- read.acmap("data/seq-t9a-mod27.ace")
srIDs(map_2004) <- rep("", numSera(map_2004))
agIDs(map_2004) <- rep("", numAntigens(map_2004))
```
```{r procrustes}
map_from_map <- realignMap(map_from_map, map_2004)
pc <- procrustesMap(map_from_map, map_2004)
view(pc)
pc_data <- procrustesData(map_from_map, map_2004)
names(pc_data)
pc_data$total_rmsd
```
You can see another example of procrustes, including 3D procrustes, [here](https://acorg.github.io/Racmacs/articles/comparing-maps.html)
We can also perform a procrustes to the other optimisations and plot this.
```{r self-procrustes}
pc_rmsd <- pc_rmedsd <- rep(NA, numOptimizations(map_from_map))
for (i in 1:numOptimizations(map_from_map)){
pc_output <- procrustesData(map_from_map, map_from_map, 1, i)
pc_rmsd[i] <- pc_output$total_rmsd
pc_rmedsd[i] <- sqrt(median(c(pc_output$ag_dists, pc_output$sr_dists)^2))
}
plot(pc_rmsd)
plot(ams, pc_rmsd)
plot(pc_rmedsd)
plot(ams, pc_rmedsd)
plot(pc_rmsd, pc_rmedsd, asp=1)
abline(0,1)
```
Ideally there will be a low procrustes distance between the top optimisations. If maps with similar stress have high procrustes distances, then the map is metastable. The RMSD (root mean square deviation) is affected by outliers such as a small number of antigens or sera hemisphering so the RMedSD (root median square deviation) can be a more robust measure.
### Point uncertainty (blob)
Point uncertainty can be visualised using bootsrapping where data is randomly excluded and the map re-made. Here I've used the default parameters of 1000 repeats and 100 optimisations per repeat. I've commented out the code as it might take a while to run on your machine - you can simply load the completed map for plotting.
There's more detail on the bootstrapping methods [here](https://acorg.github.io/Racmacs/articles/assessing_map_uncertainty.html)
```{r blob}
# set.seed(ran_seed)
# bs_map <-bootstrapMap(map_from_map, method = "resample")
# save.acmap(bs_map, "out/bs_map.ace")
# blob95 <- bootstrapBlobs(bs_map, , conf.level=0.95)
# blob68 <- bootstrapBlobs(bs_map)
# blob10 <- bootstrapBlobs(bs_map, conf.level=0.1)
# save.acmap(blob95, "out/blob95_map.ace")
# save.acmap(blob68, "out/blob68_map.ace")
# save.acmap(blob10, "out/blob10_map.ace")
blob68 <- read.acmap("out/blob68_map.ace")
plot(blob68, plot_stress=T)
view(blob68)
blob10 <- read.acmap("out/blob10_map.ace")
plot(blob10, plot_stress=T)
view(blob10)
```
### Dealing with warnings
Firstly, we can exclude the antigens that are uncoordinated and giving the first warning.
```{r avoid-reliability-warning}
map_from_map_rm <- removeAntigens(map_from_map, c('EN/7/94', 'FI/339/95'))
set.seed(ran_seed)
map_from_map_rm <- optimizeMap(map = map_from_map_rm, number_of_dimensions = 2, number_of_optimizations = 100)
pc <- procrustesMap(map_from_map_rm, map_from_map_rm, 1, 2)
view(pc)
```
Then we can remove other antigens that are potentially poorly coordinated
```{r try-avoid-unstable-warning}
hist(rowSums(data1!="*"), xlab="Number of measured data points per antigen", breaks=1:20)
map_from_map_rm2 <- removeAntigens(map_from_map, which(rowSums(data1!="*")<4))
set.seed(ran_seed)
# map_from_map_rm2 <- optimizeMap(map = map_from_map_rm2, number_of_dimensions = 2, number_of_optimizations = 100)
map_from_map_rm2 <- removeSera(map_from_map_rm2, 'MA/G252/93')
set.seed(ran_seed)
map_from_map_rm2 <- optimizeMap(map = map_from_map_rm2, number_of_dimensions = 2, number_of_optimizations = 100)
pc <- procrustesMap(map_from_map_rm2, map_from_map_rm2, 1, 2)
view(pc)
```
We still have an unstable map. This is probably due to the limitations of the data and if this was a real experiment, we would consider making further measurments.
### Dimension testing
You can check how well different numbers of dimensions represent the underlying data. The stress will usually decrease when increasing the number of dimensions.
```{r dim-stress}
mapStress(map_from_map)
mapStress(map_3d)
```
By removing data, we can test how well this removed data is predicted by maps with differing numbers of dimensions (cross-validation). The default setting test 1-5 dimesions, removing 10% of the data, 1000 optimisation and 100 replicates per dimension. Here I've saved the output of the code as it takes a while to run.
```{r dim-test}
# dim_test <- dimensionTestMap(map_from_map)
# save(dim_test, file="out/dim_test.RData")
load("out/dim_test.RData")
dim_test
#plotting for detectable titres
ci <- 1.96*sqrt(dim_test$var_rmse_detectable/100)
plot(dim_test$dimensions, dim_test$mean_rmse_detectable, ylim=range(c(dim_test$mean_rmse_detectable+ci, dim_test$mean_rmse_detectable-ci)))
arrows(dim_test$dimensions, dim_test$mean_rmse_detectable+ci, dim_test$dimensions, dim_test$mean_rmse_detectable-ci, length=0.1, angle=90, code=3)
```
The minimum is at 2 dimensions, suggesting this is best for this dataset. Sometimes you may want to use fewer dimensions for interpretability if there is marginal benefit with higher dimensions.
## FAQs
### How to get distance between all antigens?
```{r dist}
dist_mat <- as.matrix(dist(agCoords(map_from_map)))
```
### How to handle repeated sera within the same table?
If they are sufficiently similar, they can be merged.
### How many sera should you have overlapping when merging tables?
Map cohesion calculates the vertex connectivity of the map (the minimum number of point that need to be removed to eliminate all paths from one point to another). If you have n overlapping sera & antigens between merged tables, then the vertex connectivity will be 2n (assuming that non of these titres are undetectable). A vertex connectivity of "number of map dimensions + 1" is required for map stability. Therefore, 3 overlapping antigens and sera (vertex connectivity = 6) should be sufficient if the map has 3 dimensions (minimum vertex connectivity = 4).
```{r map-cohesion}
# original map
mapCohesion(map_from_map)
# map with uncoordinated antigens removed
mapCohesion(map_from_map_rm)
# map with potentialy poorly coordinated antigens removed
mapCohesion(map_from_map_rm2)
```