generated from jtr13/cctemplate
-
Notifications
You must be signed in to change notification settings - Fork 78
/
3d_visualization_in_r.Rmd
250 lines (183 loc) · 9.64 KB
/
3d_visualization_in_r.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
# 3D Visualization in R
Tianyu Han and Shijia Huang
```{r, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE, echo = TRUE)
#Installation Instructions
#install.packages('plotly')
#install.packages('HSAUR2')
#install.packages('scatterplot3d')
#install.packages('mlbench')
#install.packages('plot3D')
#If working on a mac machine, xquartz x11 might need to be installed from the web in order to use plot3D
library(scatterplot3d)
library('mlbench')
library(plot3D)
library(plotly)
data("household",package = "HSAUR2")
```
## Motivation
As an important part of data visualization, 3D plotting makes the data exploration part easier for users and allow a visual display of datasets.By plotting data points on three axes, 3D plots describe the relationship between these three variables and are useful to identify underlying patterns and interactions that are not shown on 2D graphs.
In this tutorial, we will introduce different packages for 3D plots, including package Scatterplot3D, package plot3D, and also plotly. By the end of the tutorial, one will be able to choose the most suitable package for his/her project.
What we can learn from this project is that there are always different ways to tackle the same problem using different R libraries and packages. It will require extensive research and trials in order to determine which one works best in the given scenario.
## Scatterplot3D
Scatterplot3d is an R package that displays multidimensional data in 3D space.There is only one function scatterplot3d() in this package.The usage of scatterplot3d() will be discussed with examples as below.
### Load Data
We use the preloaded dataset USArrests as an example to show what information can be draw from 3D plot using scatterplot3d.
```{r}
data("USArrests")
head(USArrests)
```
### Create matrix
For Scatterplot3d, the dataframe provided must be converted into a matrix. Here we select Assault, Urban Population, and Rape as our three axes.
```{r}
USArrestsMatrix <- as.matrix(USArrests)
x1 <- USArrestsMatrix[,2] ## Assault
y1 <- USArrestsMatrix[,3] ## Urban Population
z1 <- USArrestsMatrix[,4] ## Rape
```
### Generate 3d scatter plot
Creating a graph using scatterplot3d. "highlight" gives a color scale that enables users to understand the relative position of each data point. "pch" specifies a plotting shape, here we set pch = 16, which is a small dot.
```{r}
sp1 <- scatterplot3d(x1,y1,z1, highlight.3d = TRUE, pch = 16, angle = 45,
xlab = "Assault",
ylab = "Urban Population",
zlab = "Rape")
```
We can also remove the box (or grid) of the graph and change the color of the points. Note that when setting color, the "highlight.3d" argument should be specified as FALSE
```{r}
sp2 <- scatterplot3d(x1,y1,z1, pch = 16, angle = 45,highlight.3d = FALSE,
xlab = "Assault",
ylab = "Urban Population",
zlab = "Rape",
grid = TRUE,
box = FALSE,
color = c("pink"))
```
Adding labels to the graph, "cex" specifies the font size.
```{r}
sp3 <- scatterplot3d(x1,y1,z1, highlight.3d = TRUE, pch = 18, angle = 45,
xlab = "Assault",
ylab = "Urban Population",
zlab = "Rape")
text(sp3$xyz.convert(USArrests[2:4]),labels = rownames(USArrests), cex = 0.5, color = 'pink')
```
### 3D scatter plot with x-y plane position
Use the "Type = 'h'" to create vertical lines between each data point and the x-y plane.
```{r}
sp4 <- scatterplot3d(x1,y1,z1, highlight.3d = TRUE, pch = 18, angle = 45,
xlab = "Assault",
ylab = "Urban Population",
zlab = "Rape",
type = "h")
```
## 3D plot and PCA
In data science, 3D plot can also be used for machine learning steps. For example, by plotting principal components in a 3D space, we could efficiently observe the interaction between the important vectors of an input data.
We use the preloaded data "Glass" to perform the principal component analysis and 3D visualization of components.
### Load Package "mlbench" and use the Glass dataset.
```{r}
data(Glass)
head(Glass)
```
### Data Cleaning
Perform PCA on the dataset and convert the pca result into a dataframe. Here we plot three components of the PCA results.Specify three colors for them."shape" specifies three different shapes for each component.
```{r}
results <- prcomp(Glass[,2:4], scale = TRUE)
results$rotation <- -1*results$rotation
results$rotation
results$x <- -1*results$x
head(results$x)
pca.result <- results$x
pca.result <-data.frame(pca.result)
head(pca.result)
pca.result$Type <- (Glass$Type)
```
### Define color and shape parameter.
```{r}
## choose 6 colors for 6 glass types
colors <- c("#E69F00", "#56B4E9","#B2182B","#D1E5F0","#92C5DE","#2166AC")
colors <- colors[as.numeric(pca.result$Type)]
## choose 6 shapes for 6 glass types
shape<-10:15
shape<-shape[as.numeric(pca.result$Type)]
```
### Generate graph
Plot the result of our PCA analysis following the step in the previous part. Adjust angle for best visualization. Below is an example of how a 3D plot can help us see the contribution of each component in classifying types of glass.
```{r}
PCA3D <- scatterplot3d(pca.result[,1:3],
color=colors,
pch = shape,
cex.symbols = 3,
angle = 100)
legend("top", legend = levels(pca.result$Type),
col = c("#E69F00", "#56B4E9","#B2182B","#D1E5F0","#92C5DE","#2166AC"),
pch = c(10,11,12,13,14),
inset = -0.1, xpd = TRUE, horiz = TRUE)
```
## Other usage of the scatterplot3d function
Sometimes it is hard to imagine the relationship between two functions or graph, by plotting them on a 3D space, we could visualize the interaction on a dynamic environment.
Here is a simple example of how we could graph the interaction of cos and sin function.
```{r}
z <- seq(-15, 15, 0.05)
x <- cos(z)
y <- sin(z)
scatterplot3d(x, y, z, highlight.3d=TRUE, col.axis ="blue",col.grid ="lightblue", main="an example of cosine and sine interaction", pch=20)
```
## 3D Histogram
If we were to generate a histogram in 3d, we can use the plot 3D package. We first initiate the x-axis and the y-axis. Then, we need to create z as matrix that has the dimension |x| * |y|. We can then use hist3D function in the package to help us generate the 3D histogram that we need.
```{r}
x = c(1, 2)
y = c(1, 2)
z = c(1, 2, 2, 3)
mat1 <- matrix(z,nrow=2,ncol=2,byrow=TRUE)
hist3D(z=mat1, x = x, y= y)
```
## 3D scatter plot using plotly
### Demo Data
In order to better demonstrate the different features of plotly 3D Scatterplot, we selected a sample data which includes 40 observations on household expenditure for single men and women. There are 5 variables for each observation:
Housing: money(usd) spent on housing
Food: money(usd) spent on food
Goods: money(usd) spent on goods
Service: money(usd) spent on service
Gender: female or male
```{r}
household
```
### The classic 3D Scatterplot
```{r}
fig <- plot_ly(household, x = ~housing, y = ~food, z = ~goods + service)
fig <- fig %>% layout(scene = list(xaxis = list(title = 'housing'),
yaxis = list(title = 'food'),
zaxis = list(title = 'goods and services')))
fig
```
### Adding colors to 3D Scatterplot
In order differentiate the observations of opposite genders, we will need to add colors to our 3D scatter plot. It is done as followed:
```{r}
fig <- plot_ly(household, x = ~housing, y = ~food, z = ~goods + service,
color = ~gender, colors = c('#17becf', '#d62728'))
fig <- fig %>% add_markers()
fig <- fig %>% layout(scene = list(xaxis = list(title = 'housing'),
yaxis = list(title = 'food'),
zaxis = list(title = 'goods and services')))
fig
```
### Adding sizes to 3D Scatterplot
It is interesting to note that size is available as a fifth parameter if it helps us plot our findings. In our example, we used size to plot the overall expenditure of the household. It help us visualize the overall trend better.
```{r}
fig <- plot_ly(household, x = ~housing, y = ~food, z = ~goods + service,
color = ~gender, colors = c('#2ca02c', '#8c564b'), size = ~ housing + food + goods + service, sizes = c(500, 5000))
fig <- fig %>% add_markers()
fig <- fig %>% layout(scene = list(xaxis = list(title = 'housing'),
yaxis = list(title = 'food'),
zaxis = list(title = 'goods and services')))
fig
```
## Conclusion
In our tutorial, we have introduced scatterplot3d, plot3d, and also plotly. They each have their respective advantages. If you need a more interactive graph that allows zooming in and rotating, plotly would be your better choice. However, if you were to perform principal component analysis and better visualize your results, it would be easier to use scatterplot3d.
## Works Cited
Ligges, Uwe, and Martin Mächler. “Scatterplot3d - an R Package for Visualizing Multivariate Data.” Journal of Statistical Software, vol. 8, no. 11, Foundation for Open Access Statistic, 2003, https://doi.org/10.18637/jss.v008.i11.
http://www.sthda.com/english/wiki/scatterplot3d-3d-graphics-r-software-and-data-visualization
http://www.sthda.com/english/wiki/colors-in-r#:~:text=In%20R%2C%20colors%20can%20be,taken%20from%20the%20RColorBrewer%20package.
https://www.statology.org/principal-components-analysis-in-r/
https://plotly.com/r/3d-scatter-plots/
http://www.countbio.com/web_pages/left_object/R_for_biology/R_fundamentals/3D_histograms_R.html