-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path20_factor_scores.Rmd
167 lines (124 loc) · 4.94 KB
/
20_factor_scores.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
---
title: "Food Factors"
output: html_document
---
# Load and Format the Data
```{r}
library(MASS)
library(plyr)
library(corrplot)
library(RColorBrewer)
suppressWarnings(library(ggplot2))
library(tidyverse)
```
```{r}
dat <- read.csv("z_trial_data.csv")[,-1]
dat <- as_tibble(dat)
```
To make the visuals more interpretable, we will flip 4 of the measures. This is to make these four measures become positively related to the other measures.
* Sugar => Low-Sugar
* Disgusting => Not Disgusting
* Healthy => Unhealthy
* Vitamins => Low vitamins
```{r}
rdat <- dat %>% select(stimulus, choice.rating, starts_with("rating"))
# Rename
colnames(rdat) <- sub("rating[.]", "", colnames(rdat))
colnames(rdat) <- sub("[.]rating", "", colnames(rdat))
# Flip
rdat$sugar <- 10 - rdat$sugar
rdat$disgusting <- 10 - rdat$disgusting
rdat$healthy <- 10 - rdat$healthy
rdat$vitamins <- 10 - rdat$vitamins
## rename
cinds <- sapply(c("sugar", "disgusting", "healthy", "vitamins"), function(x) which(colnames(rdat) == x))
colnames(rdat)[cinds] <- c("low-sugar", "not-disgusting", "unhealthy", "low-vitamins")
# For use with analysis
rmat <- as.matrix(rdat[,-1])
rownames(rmat) <- rdat$stimulus
rmat.nochoice <- rmat[,-1]
# Shows the ratings/choice for each of the food items
head(rmat)
```
# Correlation Matrix
Get the average of each of the ratings/choice across participants. Then compute and visualize the correlations between each pair of ratings/choice.
```{r}
ave.mat <- rdat %>%
group_by(stimulus) %>%
summarise_all(mean) %>%
select(-stimulus)
M <- cor(ave.mat)
diag(M) <- 0
corrplot(M, method="circle", diag = F, order="hclust",
#is.corr=F, cl.lim=c(-0.85,0.85),
col=rev(colorRampPalette(brewer.pal(n=11, name="RdBu"))(256)))
```
Manually order based on what we might have expected to see
```{r}
new.ord <- c("familiar", "calories", "fat", "protein", "carbohydrates", "gluten", "low-vitamins", "sodium", "low-sugar", "sweetsaltysavory", "filling", "texture", "not-disgusting", "feel", "othertasty", "tasty", "unhealthy", "choice")
ave.mat2 <- ave.mat[,new.ord]
M <- cor(ave.mat2)
diag(M) <- 0
corrplot(M, method="circle", diag = F, #order="hclust",
#is.corr=F, cl.lim=c(-0.85,0.85),
col=rev(colorRampPalette(brewer.pal(n=11, name="RdBu"))(256)))
```
# Factor Analysis
We can see that there is a clear structure in the data above. Here, we try to extract that structure by doing a factor analysis.
For the factor analysis, I will remove the choice data to make it only about the ratings.
Please note in the section below, I go through a few different possibilities for the number of factors. I select 3 since that is the best with the CNG method and also because that number best explains choice behavior.
Determine the optimal number of factors (it's 3)
```{r}
nFactors::nCng(as.data.frame(rmat.nochoice), model="factors")
```
```{r}
cols2 <- colorRampPalette(c("#67001F", "#B2182B", "#D6604D",
"#F4A582", "#FDDBC7", "#FFFFFF", "#D1E5F0", "#92C5DE",
"#4393C3", "#2166AC", "#053061"))(200)
```
Plot 3 factors and there factor scores. I use hierarchical clustering to order the different ratings and make the plot more interpretable.
```{r}
fit <- factanal(rmat.nochoice, 3, scores="regression")
d <- dist(fit$loadings)
hc <- hclust(d, method="average")
dend <- as.dendrogram(hc)
loads <- fit$loadings[order.dendrogram(dend),]
loads <- rbind(choice=loads[rownames(loads) == "choice",], loads[rownames(loads) != "choice",])
# Make tasty first, then health, and finally sweet/savory
loads2 <- loads[,c(2,1,3)]
colnames(loads2) <- sprintf("Factor-%i", 1:3)
# Put the sweet/savory to the end
#loads2 <- rbind(loads2[-c(1:4),], loads2[4:1,])
loads2 <- rbind(loads2[-c(1:4),], loads2[1:4,])
# Just for reporting purposes
sort(loads2[,1], decreasing = T)
sort(loads2[,2], decreasing = T)
sort(loads2[,3], decreasing = T)
# Plot
corrplot::corrplot(loads2, diag=T, col=rev(cols2), cl.pos="n")
```
Do a different arrangement for the audience.
```{r}
rlabs <- c("tasty", "othertasty", "feel", "texture", "not-disgusting", "familiar",
"calories", "unhealthy", "fat", "carbohydrates", "low-vitamins", "gluten", "sodium",
"protein", "sweetsaltysavory", "filling", "low-sugar")
inds <- sapply(rlabs, function(x) which(rownames(loads2)==x))
corrplot::corrplot(loads2[inds,], diag=T, col=rev(cols2), cl.pos="n")
```
Gather the scores and re-label the columns accordingly.
```{r}
# scores and re-label columns
scores3 <- fit$scores
ind1 <- which.max(fit$loadings[rownames(fit$loadings) == "unhealthy",])
ind2 <- which.max(fit$loadings[rownames(fit$loadings) == "tasty",])
ind3 <- which.max(fit$loadings[rownames(fit$loadings) == "sweetsaltysavory",])
colnames(scores3) <- c("Food.UnHealth", "Food.Taste", "Food.SweetProtein")[c(ind1,ind2,ind3)]
```
Save the data out
```{r}
head(scores3)
```
```{r}
outdf <- data.frame(subjectId=dat$subjectId, stimulus=dat$stimulus, scores3)
write.csv(outdf, file="z_foodrating_factors.csv")
```