
CodeBook

Data Source

A full description is available at the site where the data was obtained: Human Activity Recognition Using Smartphones Data Set

The data for the project can be downloaded from: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

Data Information

The experiments were carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, 3-axial linear acceleration and 3-axial angular velocity were captured at a constant rate of 50 Hz. The experiments were video-recorded so the data could be labeled manually. The obtained dataset was randomly partitioned into two sets, with 70% of the volunteers selected for generating the training data and 30% for the test data.

The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec with 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated into body acceleration and gravity using a Butterworth low-pass filter. The gravitational force is assumed to have only low-frequency components, so a filter with a 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domains.
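As a sanity check, the window parameters above are mutually consistent: at 50 Hz, a 2.56 s window holds 128 readings, and a 50% overlap means consecutive windows start 64 samples apart. A minimal sketch (the variable names are illustrative, not from the dataset):

```r
# Window arithmetic for the 2.56 s sliding windows described above
sampling_rate_hz <- 50    # constant sampling rate
window_sec       <- 2.56  # fixed window width in seconds
overlap          <- 0.5   # 50% overlap between consecutive windows

readings_per_window <- sampling_rate_hz * window_sec        # samples per window
step_samples        <- readings_per_window * (1 - overlap)  # hop between window starts

readings_per_window  # 128
step_samples         # 64
```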

Feature Selection

The features selected for this database come from the accelerometer and gyroscope 3-axial raw signals tAcc-XYZ and tGyro-XYZ. These time-domain signals (prefix 't' denotes time) were captured at a constant rate of 50 Hz. They were then filtered using a median filter and a 3rd-order low-pass Butterworth filter with a corner frequency of 20 Hz to remove noise. Similarly, the acceleration signal was separated into body and gravity acceleration signals (tBodyAcc-XYZ and tGravityAcc-XYZ) using another low-pass Butterworth filter with a corner frequency of 0.3 Hz.

Subsequently, the body linear acceleration and angular velocity were derived in time to obtain jerk signals (tBodyAccJerk-XYZ and tBodyGyroJerk-XYZ). The magnitude of each of these three-dimensional signals was also calculated using the Euclidean norm (tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag).

Finally, a Fast Fourier Transform (FFT) was applied to some of these signals, producing fBodyAcc-XYZ, fBodyAccJerk-XYZ, fBodyGyro-XYZ, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag (the 'f' prefix indicates frequency-domain signals).
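The magnitude signals mentioned above are the sample-by-sample Euclidean norm of the three axial components. A minimal sketch, assuming x, y, and z hold the three axes of a signal such as tBodyAcc as numeric vectors:

```r
# Euclidean norm of a 3-axial signal, computed per sample
euclidean_norm <- function(x, y, z) sqrt(x^2 + y^2 + z^2)

# Toy example: three samples of an X/Y/Z signal (illustrative values)
x <- c(3, 0, 1); y <- c(4, 0, 2); z <- c(0, 5, 2)
euclidean_norm(x, y, z)  # 5 5 3
```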

These signals were used to estimate variables of the feature vector for each pattern:
'-XYZ' is used to denote 3-axial signals in the X, Y and Z directions.

  • tBodyAccMeanX
  • tBodyAccMeanY
  • tBodyAccMeanZ
  • tBodyAccStdX
  • tBodyAccStdY
  • tBodyAccStdZ
  • tGravityAccMeanX
  • tGravityAccMeanY
  • tGravityAccMeanZ
  • tGravityAccStdX
  • tGravityAccStdY
  • tGravityAccStdZ
  • tBodyAccJerkMeanX
  • tBodyAccJerkMeanY
  • tBodyAccJerkMeanZ
  • tBodyAccJerkStdX
  • tBodyAccJerkStdY
  • tBodyAccJerkStdZ
  • tBodyGyroMeanX
  • tBodyGyroMeanY
  • tBodyGyroMeanZ
  • tBodyGyroStdX
  • tBodyGyroStdY
  • tBodyGyroStdZ
  • tBodyGyroJerkMeanX
  • tBodyGyroJerkMeanY
  • tBodyGyroJerkMeanZ
  • tBodyGyroJerkStdX
  • tBodyGyroJerkStdY
  • tBodyGyroJerkStdZ
  • tBodyAccMagMean
  • tBodyAccMagStd
  • tGravityAccMagMean
  • tGravityAccMagStd
  • tBodyAccJerkMagMean
  • tBodyAccJerkMagStd
  • tBodyGyroMagMean
  • tBodyGyroMagStd
  • tBodyGyroJerkMagMean
  • tBodyGyroJerkMagStd
  • fBodyAccMeanX
  • fBodyAccMeanY
  • fBodyAccMeanZ
  • fBodyAccStdX
  • fBodyAccStdY
  • fBodyAccStdZ
  • fBodyAccMeanFreqX
  • fBodyAccMeanFreqY
  • fBodyAccMeanFreqZ
  • fBodyAccJerkMeanX
  • fBodyAccJerkMeanY
  • fBodyAccJerkMeanZ
  • fBodyAccJerkStdX
  • fBodyAccJerkStdY
  • fBodyAccJerkStdZ
  • fBodyAccJerkMeanFreqX
  • fBodyAccJerkMeanFreqY
  • fBodyAccJerkMeanFreqZ
  • fBodyGyroMeanX
  • fBodyGyroMeanY
  • fBodyGyroMeanZ
  • fBodyGyroStdX
  • fBodyGyroStdY
  • fBodyGyroStdZ
  • fBodyGyroMeanFreqX
  • fBodyGyroMeanFreqY
  • fBodyGyroMeanFreqZ
  • fBodyAccMagMean
  • fBodyAccMagStd
  • fBodyAccMagMeanFreq
  • fBodyBodyAccJerkMagMean
  • fBodyBodyAccJerkMagStd
  • fBodyBodyAccJerkMagMeanFreq
  • fBodyBodyGyroMagMean
  • fBodyBodyGyroMagStd
  • fBodyBodyGyroMagMeanFreq
  • fBodyBodyGyroJerkMagMean
  • fBodyBodyGyroJerkMagStd
  • fBodyBodyGyroJerkMagMeanFreq

Note: features are normalized and bounded within [-1, 1].

Activity Labels

  • WALKING (value 1): subject was walking during the test
  • WALKING_UPSTAIRS (value 2): subject was walking up a staircase during the test
  • WALKING_DOWNSTAIRS (value 3): subject was walking down a staircase during the test
  • SITTING (value 4): subject was sitting during the test
  • STANDING (value 5): subject was standing during the test
  • LAYING (value 6): subject was lying down during the test
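The numeric codes above can be mapped to their descriptive names with a factor. A minimal sketch (the vector y is an illustrative stand-in for the activity codes read from y_train.txt/y_test.txt):

```r
# Map activity codes 1-6 to the descriptive labels listed above
activity_labels <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                     "SITTING", "STANDING", "LAYING")

y <- c(1, 6, 4, 2)  # example activity codes
activity <- factor(y, levels = 1:6, labels = activity_labels)
as.character(activity)  # "WALKING" "LAYING" "SITTING" "WALKING_UPSTAIRS"
```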

Transformations Performed on the Data

The script run_analysis.R performs the following steps to transform the raw data into a tidy data set.

  1. Read X_train.txt, y_train.txt, and subject_train.txt from the "./data/train" folder and store them in x_train, y_train, and subject_train respectively. Similarly, read X_test.txt, y_test.txt, and subject_test.txt from the "./data/test" folder and store them in x_test, y_test, and subject_test respectively.
  2. Combine the corresponding train and test data sets and store them in x_data, y_data, and subject_data.
  3. Read the feature names, extract only the measurements on the mean and standard deviation, and store their indices in mean_std_measure.
  4. Using mean_std_measure, subset x_data and add descriptive column names to it.
  5. Read the activity labels and replace the activity codes in the data set with descriptive activity names.
  6. Add column names to y_data and subject_data, then combine all the data sets into a single table.
  7. Calculate the average of each variable for each activity and each subject, and write the result to tidydata.txt.
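The steps above can be sketched in base R as follows. This is an outline under the file layout described in step 1, not the actual run_analysis.R; it assumes features.txt and activity_labels.txt sit in the "./data" folder, and that the raw feature names use the mean() and std() suffixes found in the original UCI files. Object names follow the steps above.

```r
# Step 1: read the train and test sets
x_train       <- read.table("./data/train/X_train.txt")
y_train       <- read.table("./data/train/y_train.txt")
subject_train <- read.table("./data/train/subject_train.txt")
x_test        <- read.table("./data/test/X_test.txt")
y_test        <- read.table("./data/test/y_test.txt")
subject_test  <- read.table("./data/test/subject_test.txt")

# Step 2: combine the corresponding train and test sets
x_data       <- rbind(x_train, x_test)
y_data       <- rbind(y_train, y_test)
subject_data <- rbind(subject_train, subject_test)

# Step 3: keep only mean and standard deviation measurements
features <- read.table("./data/features.txt")
mean_std_measure <- grep("mean\\(\\)|std\\(\\)", features$V2)

# Step 4: subset x_data and attach descriptive column names
x_data <- x_data[, mean_std_measure]
names(x_data) <- features$V2[mean_std_measure]

# Step 5: replace activity codes with descriptive names
activities <- read.table("./data/activity_labels.txt")
y_data$V1 <- activities$V2[y_data$V1]

# Step 6: name the remaining columns and combine everything
names(y_data)       <- "activity"
names(subject_data) <- "subject"
all_data <- cbind(subject_data, y_data, x_data)

# Step 7: average of each variable per activity and subject
tidy <- aggregate(. ~ subject + activity, data = all_data, FUN = mean)
write.table(tidy, "tidydata.txt", row.names = FALSE)
```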