Here is my work! #21

Open: wants to merge 2 commits into base: master
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
.Rproj.user
.Rhistory
.RData
.Ruserdata
49 changes: 27 additions & 22 deletions Class 7 Instructions.Rmd
@@ -9,16 +9,16 @@ date: "February 13, 2016"

##Install packages for manipulating data
We will use two packages: tidyr and dplyr
```{r}
```{r, eval=FALSE}
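#Install packages (install.packages() takes the package names as one character vector)
install.packages(c("tidyr", "dplyr"))
#Load packages (library() attaches one package per call)
library(tidyr); library(dplyr)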
```

##Upload wide format instructor data (instructor_activity_wide.csv)
```{r}
data_wide <- read.table("~/Documents/NYU/EDCT2550/Assignments/Assignment 3/instructor_activity_wide.csv", sep = ",", header = TRUE)
```{r, eval=FALSE}
data_wide <- read.table("instructor_activity_wide.csv", sep = ",", header = TRUE)

#Now view the data you have uploaded and notice its structure: each variable is a date and each row is a type of measure.
View(data_wide)
@@ -37,7 +37,7 @@ The gather command requires the following input arguments:
- value: Name of new value column
- ...: Names of source columns that contain values

```{r}
```{r, eval=FALSE}
data_long <- gather(data_wide, date, variables)
#Rename the variables so we don't get confused about what is what!
names(data_long) <- c("variables", "date", "measure")
@@ -52,61 +52,63 @@ The spread function requires the following input:
- key: Name of column containing the new column names
- value: Name of column containing values

```{r}
```{r, eval=FALSE}
instructor_data <- spread(data_long, variables, measure)
```
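To make the gather/spread round trip concrete, here is a minimal sketch on a made-up data frame. The column names (`variables`, `d1`, `d2`) and the values are hypothetical, not the class data, and the `-variables` argument is an assumption about which column should be kept as an identifier rather than gathered.

```r
library(tidyr)

# Hypothetical wide data: one row per measure, one column per date.
toy_wide <- data.frame(
  variables = c("steps", "sleep"),
  d1 = c(100, 7),
  d2 = c(200, 8),
  stringsAsFactors = FALSE
)

# gather(): the old column names go into "date", the cell values into "measure".
# -variables excludes that column from gathering so it stays as an identifier.
toy_long <- gather(toy_wide, key = "date", value = "measure", -variables)

# spread(): the entries of "variables" become column names, filled from "measure".
toy_tidy <- spread(toy_long, key = variables, value = measure)
```

After these two steps each row of `toy_tidy` is a date and each measure has its own column, which is the shape the later dplyr steps expect.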

##Now we have a workable instructor data set! The next step is to create a workable student data set. Upload the data "student_activity.csv". View your file once you have uploaded it, and then draw on a piece of paper the structure that you want before you attempt to code it. Write the code you use in the chunk below. (Hint: you can do it in one step.)

```{r}

```{r, eval=FALSE}
student_data <- read.table("student_activity.csv", sep = ",", header = TRUE)
student_data <- spread(student_data, variable, measure)
```


##Now that you have a workable student data set, subset it to create a data set that only includes data from the second class.

To do this we will use the dplyr package. (We will call its functions with the prefix dplyr:: because some dplyr function names also exist in other packages, where they perform different operations.)

Notice that the way we subset is with a logical rule, in this case date == 20160204. In R, when we want to say that something "equals" something else we need to use a double equals sign "==". (A single equals sign means the same as <-).

```{r}
```{r, eval=FALSE}
student_data_2 <- dplyr::filter(student_data, date == 20160204)
```
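A small base R sketch of the logical rule, using a made-up data frame (the values are hypothetical, not the class data). It shows that `==` returns one TRUE or FALSE per row, and that subsetting keeps the rows where the rule is TRUE:

```r
# Hypothetical student data with two class dates.
toy <- data.frame(
  date = c(20160128, 20160204, 20160204),
  motivation = c(3, 4, 5)
)

# "==" compares every row to the value and returns a logical vector.
is_second_class <- toy$date == 20160204

# Keep only the rows where the rule is TRUE.
toy_second <- toy[is_second_class, ]
```

`dplyr::filter()` applies the same kind of rule; it just lets you write the column name without the `toy$` prefix.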

Now subset the student_activity data frame to create a data frame that only includes students who have sat at table 4. Write your code in the following chunk:

```{r}

```{r, eval=FALSE}
student_data_3 <- dplyr::filter(student_data, table == 4)
```

##Make a new variable

It is useful to be able to make new variables for analysis. We can either append a new variable to our data frame or we can replace some variables with a new variable. Below we will use the "mutate" function to create a new variable "total_sleep" from the light and deep sleep variables in the instructor data.

```{r}
```{r, eval=FALSE}
instructor_data <- dplyr::mutate(instructor_data, total_sleep = s_deep + s_light)
```

Now, referring to the cheat sheet, create a data frame called "instructor_sleep" that contains ONLY the total_sleep variable. Write your code in the following code chunk:

```{r}

```{r, eval=FALSE}
instructor_sleep <- dplyr::select(instructor_data, total_sleep)
```

Now, we can combine several commands together to create a new variable that contains a grouping. The following code creates a weekly grouping variable called "week" in the instructor data set:

```{r}
```{r, eval=FALSE}
instructor_data <- dplyr::mutate(instructor_data, week = dplyr::ntile(date, 3))
```
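To see what ntile() does in the chunk above, here is a minimal sketch on made-up dates (the vector is hypothetical, not the class data). Note that ntile() ranks the values and splits them into buckets of roughly equal size, so "week" here is a rank-based grouping rather than a calendar week.

```r
library(dplyr)

# Hypothetical class dates, stored as numbers like the date column above.
toy_dates <- c(20160114, 20160121, 20160128, 20160204, 20160211, 20160218)

# Split the six dates into 3 rank-based buckets of two dates each.
toy_week <- ntile(toy_dates, 3)
```

If the dates were not evenly spread, the buckets would still hold about the same number of rows each, which may or may not match real weeks.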

Create the same variable for the student data frame. Write your code in the code chunk below:
```{r}

```{r, eval=FALSE}
student_data <- dplyr::mutate(student_data, week = dplyr::ntile(date, 3))
```

##Summarizing
Next we will summarize the student data. First we can simply take an average of one of our student variables such as motivation:

```{r}
```{r, eval=FALSE}
student_data %>% dplyr::summarise(mean(motivation))

#That isn't super interesting, so let's break it down by week:
@@ -116,22 +118,25 @@ student_data %>% dplyr::group_by(date) %>% dplyr::summarise(mean(motivation))

Create two new data sets using this method: one that summarizes average motivation for students for each week (student_week) and another that summarizes "m_active_time" for the instructor per week (instructor_week). Write your code in the following chunk:

```{r}

```{r, eval=FALSE}
student_week <- student_data %>% dplyr::group_by(week) %>% dplyr::summarise(mean(motivation))
instructor_week <- instructor_data %>% dplyr::group_by(week) %>% dplyr::summarise(mean(m_active_time))
```
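A minimal sketch of the group-then-summarise pattern on made-up data (values are hypothetical). Naming the summary column, here `avg_motivation`, is an optional choice that is not part of the original instructions; it avoids ending up with a column literally called `mean(motivation)`.

```r
library(dplyr)

# Hypothetical student data: two observations in each of two weeks.
toy <- data.frame(
  week = c(1, 1, 2, 2),
  motivation = c(2, 4, 3, 5)
)

# One row per week, holding the mean motivation for that week.
toy_week <- toy %>%
  group_by(week) %>%
  summarise(avg_motivation = mean(motivation))
```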

##Merging
Now we will merge these two data frames using dplyr.

```{r}
```{r,eval=FALSE }
merge <- dplyr::full_join(instructor_week, student_week, "week")
```
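As a sketch of what the join returns, here are two made-up weekly summaries (names and values are hypothetical). full_join() keeps every week found in either data frame, and the columns of the first data frame come before the columns of the second:

```r
library(dplyr)

# Hypothetical weekly summaries, one row per week.
instructor_toy <- data.frame(week = 1:3, instructor_avg = c(30, 45, 40))
student_toy <- data.frame(week = 1:3, student_avg = c(3.2, 3.8, 3.5))

# Match rows on "week"; columns from the first argument appear first.
joined <- full_join(instructor_toy, student_toy, by = "week")
```

The column order matters if you rename the columns by position afterwards.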

##Visualize
Visualize the relationship between these two variables (mean motivation and mean instructor activity) with the "plot" command and then run a Pearson correlation test (hint: cor.test()). Write the code for these commands below:

```{r}

```{r, eval=FALSE}
#full_join() returns the instructor_week column before the student_week column
names(merge)<-c("week", "instructor_avg", "student_avg")
plot(merge$student_avg, merge$instructor_avg)
cor.test(merge$student_avg, merge$instructor_avg)
```
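A short sketch of reading the output of cor.test(), using made-up numbers (hypothetical, not results from the class data). The returned object stores the correlation coefficient and the p-value as named elements:

```r
# Hypothetical weekly averages.
student_avg <- c(3.2, 3.8, 3.5, 4.1)
instructor_avg <- c(30, 45, 40, 50)

# Pearson is the default method for cor.test().
result <- cor.test(student_avg, instructor_avg)

r_value <- unname(result$estimate)  # correlation coefficient
p_value <- result$p.value           # p-value of the test
```

With only a handful of weekly points the test has very little power, so the coefficient is better read as a description than as firm evidence.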

Finally, save your markdown document and your plot to this folder, then commit, push and pull your repo to submit.
13 changes: 13 additions & 0 deletions Class7.Rproj
@@ -0,0 +1,13 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX
249 changes: 249 additions & 0 deletions Class_7_Instructions.html

Large diffs are not rendered by default.