-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathmarkdown-in-r.Rmd
76 lines (53 loc) · 2.87 KB
/
markdown-in-r.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
---
title: "Digital Tutorial 5: Text Analysis with R Markdown"
author: Christopher Ohge
output: html_document
toc: true
toc_float: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
chooseCRANmirror(graphics=FALSE, ind=1)
```
If you want to analyze the data of your edition, you then have to run more transformations of your data in order to optimize analysis.
### Enter R Markdown
What you're reading is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
### Why use R?
You don't have to use R. You could Python; you could use Perl. You could also just use ... Markdown (which I generally prefer).
Think about what you will want to do in your project.
Describe the problem.
Create a program will analyze the text.
Model > Transform / Analyze > Visualise.
RMarkdown will show you what is possible for digital textual editing that goes beyond simply transcribing and annotating texts. With R you can build in other unique features, such as text analysis and visualizations. Here is an example. Download the "Cranch-notebook-1839.xml" file from our GitHub repo (in the "texts" subdirectory). That is an already-published TEI-XML file of a digital diplomatic edition. But what if I wanted to visualize the most frequent words in the edition? R can quickly handle that, and produce a visualization.
```{r}
```
# R Basics
Begin by downloading R Studio (<https://www.rstudio.com/products/rstudio/>). Once it is up and running, start with a simple mathematical calculation.
```{r}
134600 + 24078 / 5
```
For us, though, we are primarily concerned with text. The most basic data structure is the vector.
Now, let's create a small collection of words called a vector. Any object that contains data is called a data structure and numeric
vectors are the simplest type of data structure in R. In fact, even a single number or word is considered a vector of length one.
The easiest way to create a vector is with the c() function, which stands for 'concatenate' or 'combine'.
```{r}
c("Call", "me", "Ishmael")
```
In R it is very important to store your data in variables, as the code will often refer back to previous functions. Let's create a variable called "md.open" for the previous vector.
```{r}
md.open <- c("Call", "me", "Ishmael")
# To print the results of the variable, simply type it out:
md.open
```
With this variable, R can do quite a bit to analyze its contents and features
```{r}
# To find the length of the vector (i.e., word count):
length(md.open)
# To calculate the character length
nchar(md.open, type = "chars")
# To create a numbered list of each element in the vector:
strsplit(md.open, "characters", "\n")
# To create a table of the words and their frequencies:
table(md.open)
```
These are only a small sample of available text analysis functions.