-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path01-demo.Rmd
254 lines (186 loc) · 13.2 KB
/
01-demo.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
---
title: "R Notebook"
output:
html_document:
df_print: paged
---
# Preamble
The aim of this file is to show you some basic functionality of an Rmd, and some of the kinds of analysis you might do. It's not intended to be exhaustive, it's just to stimulate ideas.
# Some introductory functions
This is an export of an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you access the raw notebook, you can execute code within the notebook on your computer and the results appear beneath the code.
If you have downloaded the code, you can try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Ctrl+Shift+Enter*.
```{r gitsetup, eval=FALSE, message=FALSE, warning=FALSE, include=FALSE}
####This block won't be visible on the Rmd (it suppresses messages and warnings, and neither the code nor the output are visible.). To run it, change eval=FALSE to TRUE. This doesn't really need to be here, but you should run gitignore when you setup new projects.
#library(gitignore)
#gi_fetch_templates("R", append_gitignore = TRUE) #you can use this function to get common gitignore templates, particularly useful if you're using other languages. By default when r creates a git repo some of these are in it anyway.
####To this you might want to add other things like api keys, maybe data, etc. There are a range of ways to do this, for the purpose of this demo it's useful for me to create a script in a private folder that is ignored, and which creates the api keys in my environment. To demonstrate this, we'll (1) add that folder to the gitignore, (2) create it, (3) add a file to it, you'll need to change this one (or open up the file and edit it) to add your own keys. You can then (4) easily use 'source' to include that file to run it
#write_lines("/private/",".gitignore",sep = "\n", append = TRUE) #step 1
#dir.create("private") #step 2
#file.create("private/keys.R") #step 3a
#write_lines(c("",""),"private/keys.R",sep = "\n", append = TRUE) #step 3b
#source("private/keys.R") #step 4
```
## First, we're going to load libraries
Normally you'd hide the code chunk below, but in this instance we'll keep it there.
```{r load-libraries, echo=TRUE, message=FALSE, warning=FALSE}
library(tidyverse) #loads ggplot2, stringr,
#Installing tidyverse includes (but won't load as part of it) readxl, jsonlite, lubridate, httr, rvest, xml2, googlesheets4,
library(readxl)
library(jsonlite)
library(lubridate)
library(httr)
#these are also useful (in part to demonstrate features)
library(psych)
library(doBy)
library(reshape2)
library(lattice)
library(scales)
library(tidytext)
#SimComp lets me create some random datasets with defined properties
library(SimComp)
#The ones below are specifically for this data
library(dslabs)
library(trackeR)
library(trackeRapp)
#library(maptools)
#library(revgeo)
#So I can embed webpages
#install.packages("webshot")
#webshot::install_phantomjs()
library(webshot)
library(htmlwidgets)
library(widgetframe)
```
Add a new chunk by clicking the *Insert Chunk* button on the toolbar or by pressing *Ctrl+Alt+I*.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the *Preview* button or press *Ctrl+Shift+K* to preview the HTML file).
The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike *Knit*, *Preview* does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.
# Examples of analysing data in R
These are some example vignettes - worked examples - of how to run a set of analyses in R. You might find some of them useful. You can also explore the examples https://quantifiedself.com/show-and-tell/ and various other sources!
Remember, there is no expectation that you code in DSI 36100. There are many many freely available tools that will help you run the kind of basic exploratory statistical analyses you need for AT2 (e.g. spreadsheet tools), and visualise data.
There are also free tools that will analyse text and images (although possibly not in batch, unless you use an app to collect the images), again without coding. The key thing for this assignment is thinking creatively about how to analyse and make sense of the data and the justification for that, not demonstrating advanced coding skills.
Most of the examples below are from MDSI students (a couple from me, and a few externals).
## Useful other tools or snippets of code
You might also find these few posts useful:
* An Introduction to Accessing RESTful APIs Using R, by Werner Schott, March 31, 2019: In the past, working in R has meant importing data into the application from numerous sources. I have found this very manual and not very reproducible in the development of applications. One of the weaknesses I have identified and would like to address is to understand how to access online data resources using the appropriate packages in R. The scope of this exercise is not to master the use of one package or several packages, but rather understand how certain R packages fit into the overall process, and what that process is. http://rpubs.com/plantagenet/481658
* Use R-Studio on any PC (with portable apps), Joshua McCarthy, 29 March 2019, Someone out there has kindly adapted R to the Portable Apps framework! This means we can use R and R-Studio anywhere we like without needing to install R and R-Studio locally, quite handy for work, University computers and VM’s. We can also keep all our libraries and packages in one location making them easier to manage when moving between workstations. http://rpubs.com/jsmccid/rportable
* https://github.com/thomasp85/gganimate - this is very cool, create animated gif visualisations http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html - nice examples of a bunch of ggplot charts
* Create sonified data (using audio to represent the shape of data, as we use visualisation) http://playitbyr.org/
* https://cran.r-project.org/web/packages/googleway/vignettes/googleway-vignette.html - connect to a variety of google map based services
* Want to visualise your web history? Here’s one way to do that https://www.webhistorian.org/education/
* Some time ago I saw a blog post on visualising your twitter history using streamgraphs in R. As my tweets moved from mostly psychology/philosophy tweeting around my teaching, through to my current mish-mash of learning analytics stuff, I thought it’d be interesting to play with this. So, here’s the code reproduced, plus my streamgraph. http://sjgknight.com/finding-knowledge/2016/01/visualising-twitter-history/
Remember, not all of these approaches are _good_ or _sensible_ approaches, consider how you use standard features like tables and reporting numbers, and don't go overboard with complex visualisations.
## gganimate
```{r ggan_example, out.width = "100%"}
if(!file.exists("vignettes/gganimate.html")){
file.copy(RShowDoc("gganimate", package = "gganimate"), "vignettes/", overwrite = FALSE)
#file.copy(vignette("gganimate"), "vignettes/", overwrite = FALSE)
}
knitr::include_url("vignettes/gganimate.html")
```
## Garmin
```{r garmin, out.width = "100%"}
#uses the library trackeR. See vignette at https://cran.r-project.org/web/packages/trackeR/vignettes/TourDetrackeR.html - read for detail
#First we're going to load some dummy data.
#to do this with your own data, you'll need to use the function read_container
#TO DO THAT YOU SHOULD UNCOMMENT THE FOLLOWING TWO LINES
#BUT FOR THE PURPOSES OF THIS DEMO I'M JUST GOING TO IMPORT THE PACKAGE DATA
#filepath <- "private/your_Garmin_export.TCX.gz"
#runDF <- read_container(filepath, type = "tcx", timezone = "GMT") #check other options
##################################################
##################################################
#knitr::include_url("http://cran.r-project.org/web/packages/trackeR/vignettes/TourDetrackeR.html")
#Because I actually have this file locally as part of the package I'm going to save a copy into a new directory, and then display that version
if(!file.exists("vignettes/TourDetrackeR")){
file.copy(RShowDoc("TourDetrackeR", package = "trackeR"), "vignettes/", overwrite = FALSE)
}
knitr::include_url("vignettes/TourDetrackeR.html")
```
## Walking data (and covid impacts)
```{r walking}
knitr::include_url("https://methodmatters.github.io/impact-covid-19-pandemic-2020-steps/")
```
## Your location
```{r location}
#This one is from this markdown https://rpubs.com/Geoff_W/481405 by Geoff, an alumnus of the MDSI
if(!file.exists("vignettes/location.html")){
download.file("https://rstudio-pubs-static.s3.amazonaws.com/481405_ff86c5e272d84cc6a47c3d4b3c0fb0c3.html", "vignettes/location.html")
}
knitr::include_url("vignettes/location.html")
```
## Text mining demo
Show some functions basic text stuff.
```{r text mining}
knitr::include_url("https://rafalab.github.io/dsbook/text-mining.html")
if(!file.exists("vignettes/tidytext")){
file.copy(RShowDoc("tidytext", package = "tidytext"), "vignettes/", overwrite = FALSE)
}
knitr::include_url("vignettes/tidytext.html")
#knitr::include_url("https://cran.r-project.org/web/packages/tidytext/vignettes/tidytext.html")
```
## Sentiment on text
This is a nicer sentiment analysis package
```{r sentiment}
#knitr::include_url("https://github.com/trinker/sentimentr/blob/master/README.md")
if(!file.exists("vignettes/sentimentr.html")){
download.file("https://github.com/trinker/sentimentr/blob/master/README.md", "vignettes/sentimentr.html")
}
knitr::include_url("vignettes/sentimentr.html")
```
## Named entity recognition
Can we identify things (people, places, etc.) in text?
```{r NER}
#I dn't have storng views on whether this package is the best (the vignette could be expanded a bit...), but it is a tidy approach to NLP and has a nice entity extraction (see entity_type and entity)
knitr::include_url("https://statsmaths.github.io/cleanNLP/state-of-union.html")
#another simple option is entity
#knitr::include_url("https://github.com/trinker/entity/blob/master/README.md")
if(!file.exists("vignettes/NER.html")){
download.file("https://github.com/trinker/entity/blob/master/README.md", "vignettes/NER.html")
}
knitr::include_url("vignettes/NER.html")
#https://towardsdatascience.com/quick-guide-to-entity-recognition-and-geocoding-with-r-c0a915932895
#Notice they also geocode their data using https://cran.r-project.org/web/packages/tidygeocoder/vignettes/tidygeocoder.html
```
## Image processing Images
Here's a silly example of some functions for dealing with image data, including image recognition
```{r images}
#knitr::include_url("https://rpubs.com/sjgknight/food")
knitr::include_url("vignettes/clarifai.html")
```
## EDA and Descriptive Statistics
Show some...
```{r EDA}
```
## QS_Ledger (a set of Python modules)
The repo at https://github.com/markwk/qs_ledger is a set of Python scripts to get data as below. I've included this as a `submodule` in git, in the vignettes directory. At time of writing this included:
* Apple Health: fitness and health tracking, data analysis and dashboard from iPhone or Apple Watch (includes example of Elastic Search integration and Kibana Health Dashboard).
* AutoSleep: iOS sleep tracking data analysis of sleep per night and rolling averages.
* Fitbit: fitness and health tracking and analysis of Steps, Sleep, and Heart Rate from a Fitbit wearable.
* GoodReads: book reading tracking and data analysis for GoodReads.
* Google Calendar: past events, meetings and times for Google Calendar.
* Google Sheets: get data from any Google Sheet which can be useful for pulling data from IFTTT integrations that add data.
* Habitica: habit and task tracking with Habitica's gamified approach to task management.
* Instapaper: articles read and highlighted passages from Instapaper.
* Kindle Highlights: Parser and Highlight Extract from Kindle clippings, along with a sample data analysis and tool to export highlights to separate markdown files.
* Last.fm: music tracking and analysis of music listening history from Last.fm.
* Oura: oura ring activity, sleep and wellness data.
* RescueTime: track computer usage and analysis of computer activities and time with RescueTime.
* Pocket: articles read and read count from Pocket.
* Strava: activities downloader (runs, cycling, swimming, etc.) and analysis from Strava.
* Todoist: task tracking and analysis of todo's and tasks completed history from Todoist app.
* Toggl: time tracking and analysis of manual timelog entries from Toggl.
* WordCounter: extract wordcounter app history and visualize recent periods of word counts.
## Social media analysis
There are lots of tools for analysing social media data, including ones that do not require any coding skill.
For facebook data two easy approaches are:
* you can extract ad data https://github.com/RitwikGA/FacebookReportingTool
* analyse your 'likes' for personality https://applymagicsauce.com/demo
This post [describes working with your Facebook download data in R](http://www.deeplytrivial.com/2018/06/working-with-your-facebook-data-in-r.html?m=1)
## Other fun examples
### search history
```{r search}
if(!file.exists("vignettes/search.html")){
download.file("https://rstudio-pubs-static.s3.amazonaws.com/359000_436aec1332894fe8b4651e1aab587d12.html", "vignettes/search.html")
}
knitr::include_url("vignettes/search.html")
```
...your example here!