layout | title |
---|---|
default | Course Notes |
You will need three tools to manage your data science projects: a data programming language (R), a project management interface (R Studio), and a way to create data-driven documents (R Markdown).
Core R [ CH-01 ]
- What is R? [ video ]
- Packages
R Studio [ CH-02 ]
- Installing R and R Studio
- Tour of R Studio
Data-Driven Docs [ CH-03 ]
- Automation & Flexibility
- The Importance of Reproducibility
- Formats link
- Gallery link
Markdown [ CH-04 ]
- R Markdown Formats overview
- Headers and Chunks link
- Knitting link
- Customization
R as a Calculator [ CH-05 ]
- Mathematical Operators
- Assignment
- Objects
Functions [ CH-06 ]
- Input-Output Devices
- Arguments
- Values
- Returns
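A minimal sketch of the ideas in this chapter: a function takes inputs (arguments), does some work, and hands back a value. The function name `calc_bmi` and its arguments are hypothetical, invented here for illustration only.

```r
# Hypothetical example: a function as an input-output device.
# Arguments go in, a single value comes out.
calc_bmi <- function( weight_kg, height_m = 1.75 )   # height_m has a default value
{
  bmi <- weight_kg / height_m^2
  return( bmi )    # the value handed back to the caller
}

x <- calc_bmi( weight_kg = 70 )                  # uses the default height
y <- calc_bmi( weight_kg = 70, height_m = 2 )    # overrides the default: 17.5
```

Naming the arguments in the call, as above, makes the code easier to read and removes any dependence on argument order.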
The Learning Curve [ CH-07 ]
- Vocabulary and verbs
- Learning to Learn R
Getting Help [ CH-08 ]
- Help files
- Error messages
- Discussion boards
Intro to Vectors [ CH-09 ]
- Observations vs Variables (rows vs columns)
- Vector Types
- Numeric
- Character
- Factors (ordered vs unordered)
- Logical (true/false)
- Checking Vector Types
- Casting
- Implicit Casting (coercion)
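The vector types, type checks, and casting rules listed above can be sketched in a few lines. The variable names and values are made up for illustration.

```r
# Four common vector types (hypothetical values)
age    <- c( 23, 35, 41 )                 # numeric
name   <- c( "Ana", "Ben", "Cy" )         # character
group  <- factor( c( "a", "b", "a" ) )    # factor (unordered)
is_old <- age > 30                        # logical: FALSE TRUE TRUE

# Checking vector types
class( age )      # "numeric"
class( group )    # "factor"

# Casting: explicitly convert from one type to another
as.character( age )     # "23" "35" "41"
as.numeric( is_old )    # 0 1 1

# Implicit casting (coercion): mixing types in c() forces a common type
mixed <- c( 1, "a", TRUE )
class( mixed )    # "character"
```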
Identifying Groups within Data [ CH-10 ]
- Set theory as categories and membership
- Logical Operators
- equal
- not equal
- greater than or less than
- opposite of
- Compound Statements: AND and OR
- Casting logical vectors
- Algebra with logical vectors
- Defining groups
- from categorical variables
- from numeric variables
- missing values as a group
- Recoding Values
- Find and replace
- Creating data frames from vectors
- the $ operator
- Checking and changing class types
- Filter rows and select columns
- Reorder rows or columns
- CSV vs RDS formats
- Matrix
- Lists
- Building data objects:
- data.frame() vs cbind() and rbind()
- Transformations of Datasets
- Navigating R (directories, paths, object lists)
- Built-In Datasets
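As a rough illustration of the topics above, the sketch below builds a data frame from vectors, defines groups with logical statements, counts group members, and recodes missing values. The data are hypothetical.

```r
# Hypothetical data for illustration
sex    <- c( "male", "female", "female", "male", NA )
income <- c( 30000, 52000, 47000, 81000, 60000 )

# Create a data frame from vectors
dat <- data.frame( sex, income, stringsAsFactors = FALSE )

# Logical statements define group membership; the $ operator selects a column
is_female   <- dat$sex == "female"     # a missing value stays NA
high_income <- dat$income > 50000

# Compound statement: AND
female_high <- is_female & high_income

# Algebra with logical vectors: TRUE counts as 1 and FALSE as 0
n_high        <- sum( high_income )                  # 3
n_female_high <- sum( female_high, na.rm = TRUE )    # 1

# Missing values as a group
n_missing <- sum( is.na( dat$sex ) )                 # 1

# Recoding values: find and replace
dat$sex[ is.na( dat$sex ) ] <- "unknown"
```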
Getting Data into R [ data import ]
- Read options
- Copy and paste from Excel
- Using RData format
- Read from CSV or TSV
- Read text files
- Import from Excel
- Import from common format (foreign package)
- Import from the web (RCurl)
- Import from GitHub
- Import from Dropbox
- [ tutorial ]
Saving Data [ exporting datasets ]
- Write options
- CSV
- R Data Sets (RDS)
- CSV vs RDS
- Tables
- RData Format
- SPSS or Stata
- Copy to Clipboard
- Copy to Excel
- [ tutorial ]
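One way to see the CSV vs RDS difference is a round trip: write the same data frame in both formats, read each back, and compare. This is a sketch with made-up data that writes only to temporary files.

```r
# Hypothetical data with a factor whose level order matters
dat <- data.frame( id  = 1:3,
                   grp = factor( c( "low", "high", "low" ),
                                 levels = c( "low", "high" ) ) )

csv_path <- tempfile( fileext = ".csv" )
rds_path <- tempfile( fileext = ".rds" )

write.csv( dat, csv_path, row.names = FALSE )   # plain text, readable anywhere
saveRDS( dat, rds_path )                        # R-specific, keeps object attributes

from_csv <- read.csv( csv_path, stringsAsFactors = FALSE )
from_rds <- readRDS( rds_path )

class( from_csv$grp )     # "character": the factor and its level order are lost
class( from_rds$grp )     # "factor"
levels( from_rds$grp )    # "low" "high"
```

CSV is the safer choice for sharing data with other software. RDS is convenient when the data will come back into R and the class types need to survive.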
APIs [ using APIs in R ] [ Demo with DataUSA API ]
- What is an API?
- Examples
- Census
- Socrata
Data wrangling is the process of preparing data for analysis, which includes reading data into R from a variety of formats, cleaning data, tidying datasets, creating subsets and filters, transforming variables, grouping data, and joining multiple datasets.
The goal of data wrangling is to create a rodeo dataset (clean and well-structured) that is ready for the big show (modeling and visualization)!
Slicing Datasets – Base R and dplyr [ CH-11 ]
- Subset operator
- By index, including order / match
- By logical
- Recycling
- Subset by row -- dplyr::filter()
- Indices
- Selector Vectors
- Subset by column -- dplyr::select()
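The slicing ideas above can be sketched with the base R subset operator. The data are hypothetical, and the dplyr calls appear only in comments as rough equivalents, so the block runs without dplyr installed.

```r
# Hypothetical data for illustration
dat <- data.frame( name = c( "a", "b", "c", "d" ),
                   x    = c( 10, 40, 20, 30 ),
                   stringsAsFactors = FALSE )

# Subset operator with indices
first_third <- dat[ c( 1, 3 ), ]         # rows 1 and 3
sorted      <- dat[ order( dat$x ), ]    # reorder rows by x

# Subset by row with a logical selector vector
keep <- dat$x > 15                       # FALSE TRUE TRUE TRUE
big  <- dat[ keep, ]                     # comparable to dplyr::filter( dat, x > 15 )

# Subset by column
just_x <- dat[ , "x", drop = FALSE ]     # comparable to dplyr::select( dat, x )

# Recycling: a short logical vector is reused until it matches the length
every_other <- dat$name[ c( TRUE, FALSE ) ]    # "a" "c"
```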
Wrangling Recipes [ CH-12 ]
- Pipe operator
- Window vs summary functions
- dplyr cheat sheet
Combining Datasets [ CH-13 ]
- merge and match
- join in dplyr
- inner, outer, right, left
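The four join types can be sketched with base R `merge()`. The two small data frames below are hypothetical. The dplyr join functions follow the same logic but are not used here.

```r
# Hypothetical data: ids 2 and 3 appear in both tables
d1 <- data.frame( id = c( 1, 2, 3 ), x = c( "a", "b", "c" ) )
d2 <- data.frame( id = c( 2, 3, 4 ), y = c( TRUE, FALSE, TRUE ) )

inner <- merge( d1, d2, by = "id" )                  # ids 2, 3
left  <- merge( d1, d2, by = "id", all.x = TRUE )    # ids 1, 2, 3
right <- merge( d1, d2, by = "id", all.y = TRUE )    # ids 2, 3, 4
outer <- merge( d1, d2, by = "id", all = TRUE )      # ids 1, 2, 3, 4

# match() returns the position of each d1 id within d2 (NA when absent)
pos <- match( d1$id, d2$id )    # NA 1 2
```

Rows without a partner in the other table are filled with NA in the left, right, and outer joins.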
Group Structure [ CH-14 ]
- Combining factors and numeric data for analysis
- Faceting in plots
- Counting things: sum( logical statement )
- Categorical data: tables
- Missing values
- prop.table() and margin.table()
- Numeric data: min, max, mean, summary / quantile
- Missing values
- All at once: summary + data.frame / matrix
- Creating tables of descriptives: factors vs numeric
- table( f1, f2 ), ftable( row.vars=c("f1","f2"), col.vars="f3" )
- Function over groups: tapply( v1, f1 ) or dplyr::group_by() + summarise()
- Functions over levels of numeric data: tapply( v1, cut(v2) )
- tapply( v1, INDEX=list(f1,f2) ) or dplyr::group_by() + summarise()
- aggregate( dat, by=list(f1), FUN )
- https://cran.r-project.org/web/packages/DescTools/vignettes/DescToolsCompanion.pdf
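A short sketch of the group-level tools listed above, using hypothetical vectors that reuse the `f1`, `f2`, and `v1` names from the outline. Only base R is used.

```r
# Hypothetical data: two factors and one numeric variable with a missing value
f1  <- factor( c( "a", "a", "b", "b", "b" ) )
f2  <- factor( c( "x", "y", "x", "x", "y" ) )
v1  <- c( 10, 20, 30, NA, 50 )
dat <- data.frame( f1, f2, v1 )

# Counting things: sum( logical statement )
n_over_15 <- sum( v1 > 15, na.rm = TRUE )    # 3

# Categorical data: tables
tab        <- table( f1, f2 )
row_props  <- prop.table( tab, margin = 1 )      # proportions within each row
row_totals <- margin.table( tab, margin = 1 )    # a = 2, b = 3

# Function over groups, dropping missing values
by_group <- tapply( v1, f1, mean, na.rm = TRUE )    # a = 15, b = 40

# The same summary returned as a data frame
agg <- aggregate( dat[ "v1" ], by = list( f1 = f1 ), FUN = mean, na.rm = TRUE )
```

Without `na.rm = TRUE` the mean for group b would be NA, which is why missing values need attention before summarizing.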
Principles of Visual Communication [ Intro to Data Viz ]
- Ground, figure, narrative (context, subject, action)
- Tufte’s rules
- Visual tragedies
- Defining a canvas: xlim, ylim
- Adding data
- Type (point, line, both)
- Symbols
- Color
- Size
- Adding grids
- Adding axes
- Adding titles / axes labels
- Adding data labels: text()
- Margins
- Colors and color functions
- Custom fonts / math symbols
- Multiple Plots (core graphics)
- Custom graph layouts
ggplot2 [ Intro to the Grammar of Graphics ]
- Grammar of graphics concept
- ggplot overview
- What makes documents dynamic?
- Widgets
- input objects
- Widgets Gallery
- Render functions
- reactive
- Principles of good dashboard design
- Layouts
- Sidebars
- Value boxes
- CSS basics