-
Notifications
You must be signed in to change notification settings - Fork 25
/
Copy pathlab-01_answer.Rmd
230 lines (143 loc) · 7.11 KB
/
lab-01_answer.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
---
title: "My answers"
author: "My name"
date: "2023-04-16"
output: html_document
---
## Motivation
Digital Social Networks - the connections between users - are the basis of all social media platforms.
To build an understanding of social media, one could reasonably argue we first need to understand the connections between users.
In this tutorial you will learn how to construct networks of Twitter users based on their retweet behaviour, who a user mentions in a tweet or who a user replies to.
Using an existing data set you will learn how to visualise a Twitter network based on replies to an original tweet.
## Learning Goals
By the end of this tutorial you will be able to:
* Construct an edge list that connects Twitter users based on mentions, replies and/or retweets
* Plot networks of Twitter users using `tidygraph` and `ggraph`
## Instructions to Students
These tutorials are **not graded**, but we encourage you to invest time and effort into working through them from start to finish.
Add your solutions to the `lab-01_answer.Rmd` file as you work through the exercises so that you have a record of the work you have done.
Obtain a copy of the answer file using Git.
To clone a copy of this repository to your own PC, use the following command:
Once you have your copy, open the answer document in RStudio as an RStudio project and work through the questions.
The goal of the tutorials is to explore how to "do" the technical side of social media analytics.
Use this as an opportunity to push your limits and develop new skills.
When you are uncertain or do not know what to do next - ask questions of your peers and the instructors on the classes Slack channel `#lab01-discussion`.
## Plotting Social Media Networks
You will need to use the following `R` libraries throughout this exercise:
In this exercise you will work with some existing Twitter data.
The data you will use is a collection of tweets that all have the hashtag `#rstats` in their text.
The data are collected from 2018, and can be downloaded from the internet:
```{r, rstat-dowload, cache=TRUE}
url <- "https://bit.ly/3r8Gu4M"
# where to save data
out_file <- "data/rstats_tweets.rds"
# download it!
download.file(url, destfile = out_file, mode = "wb")
```
The data that you downloaded are an `.rds` file, so you can load them with the `read_rds` function from the `readr` library:
Your goal will be to construct a network graph that visualizes the connections between Twitter users.
For this exercise you are interested in connections between Twitter users who reply to each others tweets.
That is, two users are connected if user A has replied to user B's tweet or vice versa.
Before you work through the guided exercise, we recommend that you take some time to look at the data and understand it's basic structure.
There are a lot of column names, and you will want to understand what is in them.
Now, let's begin the analysis:
1. Create a new data set that only includes tweets that contain replies:
```{r}
# Write your answer here
```
2. Further reduce the size of the data by dropping the columns you will not need.
Your new data set should only include the columns named `screen_name` and `reply_to_screen_name`.
Rename these columns to `from` and `to`.
```{r}
# Write your answer here
```
3. When a user on Twitter writes a long series of tweets about the same topic, they often connect multiple tweets together by replying to their own previous tweet to chain their posts together.
Remove these replies from the data.
```{r}
# Write your answer here
```
4. Now, you are going to trim down the size of the edge list.
You will do this mainly so that your computer won't freeze when it comes time to plot the network.^[
Plotting large networks can be challenging on your computer's RAM which will lead to it freezing.
We're trying to stop this behaviour.
]
Proceed in two steps:
(a) Create a data set that counts the number of times a user replies to anyone in the data.
Keep only users who have replied more than 50 times.
```{r}
# Write your answer here
```
(b) Update your the edge list so that only users who have engaged in at least 50 replies are included.
```{r}
# Write your answer here
```
5. Convert your `data.frame` containing all the edges to a tidygraph object.
```{r}
# Write your answer here
```
6. Plot the network.
Use the layout `kk` in your solution.
```{r}
# Write your answer here
```
7. Explore different layouts and find the one you think works best visually.
You could explore the choice "stress" or any of the following:
"dh", "drl", "fr", "gem", "graphopt", "lgl", "mds", "sugiyama", "bipartite", "star" or "tree".
8. The plot you produced weighted the edges by the frequency in which two nodes had replied to each other (it does this implicitly because the same edge occurred many times in the edge list).
Prevent this from happening by adjusting the edgelist to only contain distinct entries and replotting the graph.
You will have to re-run parts of you code to produce the new graph.
```{r}
# Write your answer here
```
9. You can add color to the plot that you just created.
Color (and re-size) the nodes based on their influence as measured by `centrality_authority`.
```{r}
# Write your answer here
```
10. Save the last plot you created as 'rstats-replies.pdf'.
```{r}
# Write your answer here
```
## Bonus Exercise on Plotting
The graphs you have created so far use fairly standard algorithms to plot the data.
An alternative that sometimes produces nicer looking plots uses an algorithm called `backbone`.
This exercise utilizes `backbone` to plot the data.
You will need to install an additional package:
```{r, install-backbone, eval = FALSE}
install.packages("graphlayouts")
```
And then load it into your `R` session:
```{r, load-backbone, eval = FALSE}
library(graphlayouts)
```
1. The `backbone` layout takes a `tidygraph` object and creates a layout. Run the following code to do this (don't worry too much about what this is doing in the background):
```{r, backbone-layout, eval = FALSE}
# assumes your network graph is stored as
# a tidygraph object labelled "tg"
bb <- graphlayouts::layout_as_backbone(tg, keep = 0.4)
E(tg)$col <- FALSE
E(tg)$col[bb$backbone] <- TRUE
```
```{r}
# Write your answer here
```
2. Use the `backbone` layout to create a graph as by extending the following code chunk:
```{r, backbone-plot, eval = FALSE}
# don't change the first line unless your graph
# has a different name than "tg"
ggraph(tg, layout = "manual", x = bb$xy[, 1], y = bb$xy[, 2]) +
geom_node_point(YOURCODE) +
geom_edge_link(YOURCODE) +
YOURCODE
```
```{r}
# Write your answer here
```
3. Try different values of the `keep` argument in (1) and plot the figure again. Do you notice any differences as you do?
```{r}
# Write your answer here
```
## License
This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/).
## Suggested Citation
Deer, Lachlan. 2023. Social Media and Web Analytics: Lab 1 - Visualizing Social Media Networks. Tilburg University. url = "https://github.com/tisem-digital-marketing/smwa-lab-01"