In November 2014, when I took the course for about the third time, I was finally able to start working on the Course Project.
You should create one R script called run_analysis.R that does the following.
- Merges the training and the test sets to create one data set.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Uses descriptive activity names to name the activities in the data set.
- Appropriately labels the data set with descriptive activity names.
- Creates a second, independent tidy data set with the average of each variable for each activity and each subject.
The script uses the `reshape2` and `data.table` packages. It installs them itself when needed.
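The "install when needed" behaviour can be sketched roughly as follows. This is only an illustration: the helper names `missing_packages` and `ensure_packages` are hypothetical and are not taken from `run_analysis.R`, which may handle this differently. The sketch also assumes that a CRAN mirror is already configured for `install.packages`.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install whatever is missing, then attach every package.
# Assumes a CRAN mirror is configured (for example via options(repos = ...)).
ensure_packages <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Usage (commented out so that sourcing this sketch installs nothing):
# ensure_packages(c("reshape2", "data.table"))
```

Checking with `requireNamespace` first means that a machine which already has both packages never contacts CRAN.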
- train data in a `train` folder
- test data in a `test` folder
- `activity_labels.txt`
- `features.txt`
- `final.txt`: the outcome of the process
- `readme.md`: short info about everything
- `CodeBook.md`: description of the data used for this project
### What does the script do?
Every step that the `run_analysis.R` script performs is already commented in the code.
- Reads all the data and merges the training and the test sets to create one data set using `read.table`, `rbind` and `cbind`. Then it renames the columns properly.
- Keeps only those columns that have "std" or "mean" in their names (and the test.ID and activity.code, too). Using the `grep` command, I was able to index the columns I wanted to keep in the dataset, and then subset the dataset with these indices.
- Uses descriptive activity names to name the activities in the data set.
- Changes "tBodyAcc-mean()-X" to something that is easy to read. Using the `gsub` command, I have changed:
  - `^t` to `Time`
  - `^f` to `Frequency`
  - `-mean` to `Mean`
  - `-std` to `StdDev`
- Melts the data by the subject's ID and the activity in order to get the average value of each variable, "separated" by the subject's ID and the activity they were performing. Finally, it writes the data to a tidy dataset called `final.txt`.
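The merging, column selection, activity naming and renaming steps above can be sketched in base R on toy data. Everything in this sketch is an assumption made for illustration: the four feature names, the two activities, the made-up values and the `make_set` helper are not from the real data set, and `run_analysis.R` reads the real files with `read.table` rather than building them in memory. The column names `test.ID` and `activity.code` follow the description above.

```r
# Toy stand-ins for features.txt and activity_labels.txt (made-up subset).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "fBodyAcc-mean()-X", "tBodyAcc-max()-X")
activity_labels <- read.table(text = "1 WALKING\n2 SITTING",
                              col.names = c("activity.code", "activity.name"),
                              stringsAsFactors = FALSE)

# Hypothetical helper: build a small set with IDs, activity codes and measurements.
make_set <- function(subjects, activities) {
  x <- as.data.frame(matrix(seq_len(length(subjects) * length(features)),
                            nrow = length(subjects)))
  cbind(data.frame(test.ID = subjects, activity.code = activities), x)
}
train <- make_set(c(1, 1, 2), c(1, 2, 1))
test  <- make_set(c(3, 3), c(1, 2))

# Step 1: stack the two sets and give the columns their proper names.
merged <- rbind(train, test)
names(merged) <- c("test.ID", "activity.code", features)

# Step 2: keep the two ID columns plus every column containing "mean" or "std".
keep <- grep("mean|std", names(merged))
subset_data <- merged[, c(1, 2, keep)]

# Step 3: replace the numeric activity codes with descriptive names.
subset_data$activity.code <- factor(subset_data$activity.code,
                                    levels = activity_labels$activity.code,
                                    labels = activity_labels$activity.name)

# Step 4: make the measurement names readable. Only the measurement columns
# are touched, because "^t" would otherwise also match "test.ID".
measure_cols <- 3:ncol(subset_data)
clean <- names(subset_data)[measure_cols]
clean <- gsub("^t", "Time", clean)
clean <- gsub("^f", "Frequency", clean)
clean <- gsub("-mean", "Mean", clean)
clean <- gsub("-std", "StdDev", clean)
names(subset_data)[measure_cols] <- clean
# "tBodyAcc-mean()-X" becomes "TimeBodyAccMean()-X"
```

Only the four substitutions listed above are applied here, so the `()` and the axis suffix stay in the names.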
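The final melt-and-average step might look roughly like the sketch below, which uses `melt` and `dcast` from `reshape2`. The data frame, its column names (`Subject`, `Activity`, and the two measurement columns) and its values are made up for illustration. The use of `dcast` and the `write.table` options are also assumptions, since only the melting is described above.

```r
library(reshape2)

# Made-up measurements for two subjects and two activities.
measurements <- data.frame(
  Subject  = c(1, 1, 1, 2, 2, 2),
  Activity = c("WALKING", "WALKING", "SITTING", "WALKING", "SITTING", "SITTING"),
  TimeBodyAccMeanX   = c(0.2, 0.4, 0.1, 0.3, 0.5, 0.7),
  TimeBodyAccStdDevX = c(1.0, 3.0, 2.0, 4.0, 1.0, 5.0)
)

# Long format: one row per (Subject, Activity, variable, value).
molten <- melt(measurements, id.vars = c("Subject", "Activity"))

# Back to wide format, averaging the values within each Subject/Activity pair.
tidy <- dcast(molten, Subject + Activity ~ variable, fun.aggregate = mean)

# Write the tidy result; the options used by the real script may differ.
write.table(tidy, "final.txt", row.names = FALSE)
```

The result has one row per subject and activity pair, with each measurement column holding the mean for that pair.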