---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
# fig.path = "README-"
fig.path = "tools/README-"
)
set.seed(0)
```
# CGGP
<!-- badges: start -->
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/CGGP)](https://cran.r-project.org/package=CGGP)
[![codecov](https://codecov.io/github/CollinErickson/CGGP/graph/badge.svg?token=FMnP9TEFBk)](https://codecov.io/github/CollinErickson/CGGP)
[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/last-month/CGGP?color=blue)](https://r-pkg.org/pkg/CGGP)
[![R-CMD-check](https://github.com/CollinErickson/CGGP/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/CollinErickson/CGGP/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
The goal of CGGP is to provide a sequential design of experiments algorithm that can efficiently use many points and interpolate exactly.
## Installation
You can install CGGP from GitHub with:
```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("CollinErickson/CGGP")
```
## Example
To create a CGGP object:
```{r example}
## basic example code
library(CGGP)
d <- 4
CG <- CGGPcreate(d=d,200)
print(CG)
```
A new `CGGP` object contains design points that should be evaluated next. They are available from either `CG$design` or `CG$design_unevaluated`.
```{r evaluatedesign}
f <- function(x) {x[1]^2*cos(x[3]) + 4*(0.5-x[2])^3*(1-x[1]/3) + x[1]*sin(2*2*pi*x[3]^2)}
Y <- apply(CG$design, 1, f)
```
Once you have evaluated the design points, you can fit the object with `CGGPfit`.
```{r, echo = FALSE}
set.seed(1)
```
```{r fit}
CG <- CGGPfit(CG, Y)
CG
```
If you want to use the model to make predictions at new input points,
you can use `CGGPpred`.
```{r pred}
xp <- matrix(runif(10*CG$d), ncol=CG$d)
CGGPpred(CG, xp)
```
To add new design points to the already existing design,
use `CGGPappend`.
It will use the data already collected to find the most
useful set of points to evaluate next.
```{r, echo = FALSE}
set.seed(1)
```
```{r append}
# To add 100 points
CG <- CGGPappend(CG, 100)
```
Now you will need to evaluate the newly added points, which are in
`CG$design_unevaluated`, and refit the model.
```{r, echo = FALSE}
set.seed(1)
```
```{r refit}
ynew <- apply(CG$design_unevaluated, 1, f)
CG <- CGGPfit(CG, Ynew=ynew)
```
### Plot functions
```{r, echo = FALSE}
set.seed(0)
```
There are a few functions that will help visualize the CGGP design.
#### `CGGPplotblocks`
`CGGPplotblocks` shows the block structure when projected onto
each pair of dimensions.
The plot is symmetric.
The facet labels can be a little confusing.
The first column has the label 1, and it looks like it is saying that
the x-axis for each plot in that column is for `X1`, but it is
actually the y-axis that is `X1` for each plot in that column.
```{r plotblocks}
CGGPplotblocks(CG)
```
#### `CGGPplotheat`
`CGGPplotheat` is similar to `CGGPplotblocks` and can be easier to read
since it is only a single plot.
The $(i,j)$ entry shows the largest value such that a selected block
has depth at least that large in both $X_i$ and $X_j$.
The diagonal entries, $(i, i)$, show the maximum depth for $X_i$.
A diagonal entry must be at least as large as any entry in its column or row.
This plot is also symmetric.
```{r heat}
CGGPplotheat(CG)
```
#### `CGGPplothist`
`CGGPplothist` shows histograms of the block depth in each direction.
Dimensions with more large values have been explored more.
These should be the more active dimensions.
```{r hist}
CGGPplothist(CG)
```
#### `CGGPplotcorr`
`CGGPplotcorr` gives an idea of what the correlation structure in each
dimension is.
The values plotted do not represent the actual data given to CGGP.
Each wiggly line represents a random Gaussian process drawn using the
correlation parameters for that dimension from the given CGGP model.
Dimensions that are more wiggly and have higher variance are the
more active dimensions.
Nearly flat lines mean that the corresponding input
dimension has a relatively small effect on the output.
```{r corrplot}
CGGPplotcorr(CG)
```
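The idea of drawing random Gaussian process paths from a set of correlation parameters can be sketched in base R. This is only an illustration: the squared-exponential correlation, the `lengthscale` parameter, and the helper `draw_gp` are assumptions made here, not CGGP's internals or its parameterization.

```r
set.seed(0)
x <- seq(0, 1, length.out = 50)

# Hypothetical helper (not part of CGGP): draw `n_draws` zero-mean
# Gaussian process paths on `x`, assuming a squared-exponential correlation.
draw_gp <- function(x, lengthscale, n_draws = 3) {
  K <- exp(-(outer(x, x, "-") / lengthscale)^2)
  K <- K + diag(1e-6, length(x))  # small jitter for numerical stability
  # Multiply the lower Cholesky factor by standard normal draws
  t(chol(K)) %*% matrix(rnorm(length(x) * n_draws), ncol = n_draws)
}

wiggly <- draw_gp(x, lengthscale = 0.1)  # short range: paths change quickly
smooth <- draw_gp(x, lengthscale = 2)    # long range: paths change slowly

matplot(x, cbind(wiggly, smooth), type = "l", lty = 1,
        col = rep(c("red", "blue"), each = 3), ylab = "GP draw")
```

Under these assumptions, the red paths (short length scale) wiggle much more than the blue ones, which is the kind of contrast between dimensions to look for in the `CGGPplotcorr` output.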
#### `CGGPplotvariogram`
`CGGPplotvariogram` shows something similar to the semi-variogram
for the correlation parameters found for each dimension.
In practice, it shows how the correlation function decays as
points get further apart.
It should always start at `y=1` for `x=0` and decrease in `y`
as `x` gets larger.
```{r vario}
CGGPplotvariogram(CG)
```
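As a rough illustration of a correlation that starts at 1 and decays with distance, here is a minimal base R sketch. The squared-exponential form and the `lengthscale` values are assumptions chosen for the example; the correlation function and parameters that CGGP actually fits may differ.

```r
# Assumed correlation function for illustration only (squared exponential);
# `lengthscale` is a hypothetical parameter, not a CGGP argument.
gauss_corr <- function(h, lengthscale) exp(-(h / lengthscale)^2)

h <- seq(0, 1, length.out = 6)             # distances between two points
short <- gauss_corr(h, lengthscale = 0.2)  # decays quickly with distance
long  <- gauss_corr(h, lengthscale = 2)    # decays slowly with distance

round(rbind(h, short, long), 3)
```

Both curves equal 1 at distance 0 and decrease from there. The one with the shorter length scale drops much faster.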
#### `CGGPplotslice`
`CGGPplotslice` shows the predicted model along each individual
dimension when the other input dimensions are held constant, i.e.,
a slice along a single dimension.
By default the slice is done holding all other inputs at 0.5, but this
can be changed with the argument `proj`.
The black dots are the data points that are in that slice.
If you change `proj` to have values not equal to 0.5, you probably
won't see any black dots.
The pink regions are the 95% prediction intervals.
This plot is the best for giving an idea of what the higher dimensional
function looks like.
You can see how the output changes as each input is varied.
```{r plotslice}
CGGPplotslice(CG)
```
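To make the idea of a slice concrete, the sketch below builds the input points for one slice by hand in base R. `make_slice` is a hypothetical helper written for this example and is not part of CGGP. It varies one input over [0, 1] and holds the others at the values in `proj`.

```r
# Hypothetical helper (not part of CGGP): build the inputs for a slice
# along dimension `dim_index`, holding every other input fixed at `proj`.
make_slice <- function(d, dim_index, proj = rep(0.5, d), n = 5) {
  # Every row starts as a copy of `proj`
  x <- matrix(proj, nrow = n, ncol = d, byrow = TRUE)
  # Only the sliced dimension varies
  x[, dim_index] <- seq(0, 1, length.out = n)
  x
}

slice_x2 <- make_slice(d = 4, dim_index = 2)
slice_x2
```

A matrix like `slice_x2` could then be passed as the new input points to `CGGPpred` to get the model's predictions along that slice.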
In the next plot, all the other dimensions are held constant
at 0.15 for each slice plot.
Away from the center line, the error bounds should generally
be wider since the slice is further from the data,
but we should see similar patterns
unless the function is highly nonlinear.
```{r plotslice2}
CGGPplotslice(CG, proj = rep(.15, CG$d))
```
#### `CGGPplottheta`
`CGGPplottheta` is useful for getting an idea of how the
samples for the correlation parameters (theta) vary compared
to the maximum a posteriori (MAP) estimate.
This may be helpful when using `UCB` or `TS` in `CGGPappend`
to get an idea of how much uncertainty there is in the
parameters.
Note that there are likely multiple parameters for each input dimension.
```{r plottheta}
CGGPplottheta(CG)
```
#### `CGGPplotsamplesneglogpost`
`CGGPplotsamplesneglogpost` shows the negative log posterior
for each of the different samples for theta.
The value for the MAP is shown as a blue line.
It should be at the far left edge if it is the true MAP.
```{r samplesneglogpost}
CGGPplotsamplesneglogpost(CG)
```