Course Notes

Data Programming for Social Scientists

The Data Science Toolkit

You will need three tools to manage your data science projects: a data programming language (R), a project management interface (R Studio), and a way to create data-driven documents (R Markdown).

Core R [ CH-01 ]

R Studio [ CH-02 ]

  • Installing R and R Studio
  • Tour of R Studio

Data-Driven Docs [ CH-03 ]

Markdown [ CH-04 ]

Getting Started

R as a Calculator [ CH-05 ]

  • Mathematical Operators
  • Assignment
  • Objects

Functions [ CH-06 ]

  • Input-Output Devices
  • Arguments
  • Values
  • Returns
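
As a small illustration of the topics above (inputs, arguments, values, returns), here is a minimal sketch of a function in R. The function name `to_celsius` and its `digits` argument are made up for this example and do not come from the course chapters.

```r
# Hypothetical helper: a function takes inputs (arguments) and returns a value
to_celsius <- function(temp_f, digits = 1) {
  # temp_f is a required argument; digits has a default value of 1
  temp_c <- (temp_f - 32) * 5 / 9
  return(round(temp_c, digits))   # the value handed back to the caller
}

to_celsius(212)                       # 100
to_celsius(c(32, 98.6), digits = 0)   # 0 37

# The returned value can be assigned to an object for later use
boiling <- to_celsius(212)
```

Arguments with defaults can be omitted when calling the function; arguments without defaults must be supplied.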

The Learning Curve [ CH-07 ]

  • Vocabulary and verbs
  • Learning to Learn R

Getting Help [ CH-08 ]

  • Help files
  • Error messages
  • Discussion boards

Starting to Code

One-Dimensional Datasets

Intro to Vectors [ CH-09 ]

  • Observations vs Variables (rows vs columns)
  • Vector Types
    • Numeric
    • Character
    • Factors (ordered vs unordered)
    • Logical (true/false)
  • Checking Vector Types
  • Casting
    • Implicit Casting (coercion)
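
The vector types and casting rules listed above can be sketched in a few lines of base R. The values are invented for illustration only.

```r
# One small vector of each type named in the outline (made-up values)
age    <- c(23, 41, 35)                         # numeric
name   <- c("Ann", "Bo", "Cy")                  # character
rating <- factor(c("low", "high", "low"),
                 levels = c("low", "high"))     # factor (unordered)
adult  <- c(TRUE, TRUE, FALSE)                  # logical

# Checking vector types
class(age)      # "numeric"
class(name)     # "character"
class(rating)   # "factor"
class(adult)    # "logical"

# Explicit casting
as.character(age)     # "23" "41" "35"
as.numeric("42")      # 42
as.numeric(adult)     # 1 1 0

# Implicit casting (coercion): mixing types in one vector
# silently converts everything to the most flexible type
mixed <- c(1, "a", TRUE)
class(mixed)          # "character"
```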

Identifying Groups within Data [ CH-10 ]

  • Set theory as categories and membership
  • Logical Operators
    • equal
    • not equal
    • greater than or less than
    • opposite of
  • Compound Statements: AND and OR
  • Casting logical vectors
  • Algebra with logical vectors
  • Defining groups
    • from categorical variables
    • from numeric variables
    • missing values as a group
  • Recoding Values
  • Find and replace
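
A minimal sketch of how logical vectors define groups, assuming two small made-up vectors (`income` and `sector` are hypothetical names, not course data):

```r
# Hypothetical survey vectors, invented for illustration
income <- c(20, 55, NA, 80, 35)
sector <- c("public", "private", "public", "nonprofit", "private")

# Logical operators return one TRUE/FALSE per observation
is_public   <- sector == "public"    # equal
not_public  <- sector != "public"    # not equal
high_income <- income > 50           # greater than (NA stays NA)
flipped     <- !is_public            # opposite of

# Compound statements: AND and OR
public_and_high <- is_public & high_income
public_or_high  <- is_public | high_income

# Casting and algebra: TRUE counts as 1 and FALSE as 0
as.numeric(is_public)   # 1 0 1 0 0
sum(is_public)          # 2   (how many are in the group)
mean(is_public)         # 0.4 (proportion in the group)

# Missing values as a group
income_missing <- is.na(income)

# Defining groups from a numeric variable
income_group <- ifelse(income > 50, "high", "low")

# Recoding values, and find and replace
sector_recoded <- sector
sector_recoded[sector_recoded == "nonprofit"] <- "third sector"
sector_short <- gsub("private", "priv", sector)
```

Note that comparisons involving `NA` return `NA`, so the third element of `public_and_high` is missing rather than `FALSE`.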

Two-Dimensional Datasets

Dataframes

  • Creating data frames from vectors
  • the $ operator
  • Checking and changing class types
  • Filter rows and select columns
  • Reorder rows or columns
  • CSV vs RDS formats
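
The data frame topics above can be sketched with base R as follows. The data and file paths are placeholders created for this example; the files are written to a temporary location.

```r
# Creating a data frame from vectors (made-up values)
dat <- data.frame(
  id    = 1:4,
  name  = c("Ann", "Bo", "Cy", "Di"),
  score = c(65, 82, 91, 70)
)

# The $ operator pulls out one column as a vector
dat$score

# Checking and changing class types
class(dat$id)                   # "integer"
dat$id <- as.character(dat$id)

# Filter rows and select columns
high <- dat[dat$score > 70, c("name", "score")]

# Reorder rows by score, highest first
sorted <- dat[order(dat$score, decreasing = TRUE), ]

# CSV vs RDS: CSV is plain text and does not store column types,
# while RDS stores the R object exactly as it was
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write.csv(dat, csv_path, row.names = FALSE)
saveRDS(dat, rds_path)

from_csv <- read.csv(csv_path)
from_rds <- readRDS(rds_path)
class(from_csv$id)   # type is re-guessed on import
class(from_rds$id)   # "character", as saved
```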

Matrices and Lists

  • Matrix
  • Lists
  • Building data objects: data.frame() vs cbind() and rbind()
  • Transformations of Datasets

Data IO

Navigation

  • Navigating R (directories, paths, object lists)
  • Built-In Datasets

Getting Data into R [ data import ]

  • Read options
  • Copy and paste from Excel
  • Using RData format
  • Read from CSV or TSV
  • Read text files
  • Import from Excel
  • Import from common format (foreign package)
  • Import from the web (RCurl)
  • Import from GitHub
  • Import from Dropbox
  • [ tutorial ]

Saving Data [ exporting datasets ]

  • Write options
    • CSV
    • R Data Sets (RDS)
    • CSV vs RDS
    • Tables
    • RData Format
    • SPSS or Stata
  • Copy to Clipboard
  • Copy to Excel
  • [ tutorial ]
  • What is an API?
  • Examples
    • Census
    • Socrata
    • Twitter

Data Wrangling (dplyr)

Data wrangling is the process of preparing data for analysis, which includes reading data into R from a variety of formats, cleaning data, tidying datasets, creating subsets and filters, transforming variables, grouping data, and joining multiple datasets.

The goal of data wrangling is to create a rodeo dataset (clean and well-structured) that is ready for the big show (modeling and visualization)!
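
To make the wrangling steps concrete, here is a rough sketch written in base R so that it runs without extra packages. The `survey` data frame is invented for this example, and the comments name the dplyr verbs that play the same role; this is one possible workflow, not the course's prescribed recipe.

```r
# Hypothetical raw data with a missing value
survey <- data.frame(
  id     = 1:6,
  city   = c("Austin", "Austin", "Boise", "Boise", "Boise", "Austin"),
  income = c(40, NA, 55, 62, 48, 71)
)

# Clean: drop rows with missing income   (dplyr: filter())
clean <- survey[!is.na(survey$income), ]

# Transform: add a new variable          (dplyr: mutate())
clean$high_income <- clean$income > 50

# Subset columns                         (dplyr: select())
slim <- clean[, c("city", "income", "high_income")]

# Group and summarize                    (dplyr: group_by() + summarise())
by_city <- aggregate(income ~ city, data = slim, FUN = mean)
by_city
```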

Slicing Datasets – Base R and dplyr [ CH-11 ]

  • Subset operator
  • By index, including order / match
  • By logical
  • Recycling
  • Subset by row -- dplyr::filter()
  • Indices
  • Selector Vectors
  • Subset by column -- dplyr::select()

Wrangling Recipes [ CH-12 ]

  • Pipe operator
  • Window vs summary functions
  • dplyr cheat sheet

Combining Datasets [ CH-13 ]

  • merge and match
  • join in dplyr
  • inner, outer, right, left
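
The four join types can be sketched with base R's `merge()`, using two small made-up tables that share an `id` column. The dplyr equivalents are noted in the comments.

```r
# Two hypothetical tables with partially overlapping ids
people <- data.frame(id = c(1, 2, 3), name  = c("Ann", "Bo", "Cy"))
scores <- data.frame(id = c(2, 3, 4), score = c(88, 92, 75))

# Inner join: only ids found in both tables     (dplyr: inner_join())
inner <- merge(people, scores, by = "id")

# Left join: keep every row of people           (dplyr: left_join())
left <- merge(people, scores, by = "id", all.x = TRUE)

# Right join: keep every row of scores          (dplyr: right_join())
right <- merge(people, scores, by = "id", all.y = TRUE)

# Outer (full) join: keep all rows from both    (dplyr: full_join())
outer <- merge(people, scores, by = "id", all = TRUE)

nrow(inner)   # 2
nrow(left)    # 3
nrow(right)   # 3
nrow(outer)   # 4
```

Rows without a match in the other table are filled with `NA` in the unmatched columns.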

Explore and Describe

Group Structure [ CH-14 ]

  • Combining factors and numeric data for analysis
  • Faceting in plots

Summarizing Vectors

  • Counting things: sum( logical statement )
  • Categorical data: tables
  • Missing values
  • prop.table() and margin.table()
  • Numeric data: min, max, mean, summary / quantile
  • Missing values
  • All at once: summary + data.frame / matrix
  • Creating tables of descriptives: factors vs numeric
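
A short sketch of the summary tools listed above, using made-up vectors (`age`, `party`, and `voted` are hypothetical):

```r
# Hypothetical variables with some missing values
age   <- c(25, 31, NA, 47, 52, 38)
party <- c("D", "R", "D", NA, "I", "R")
voted <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)

# Counting things: sum( logical statement )
n_over_35 <- sum(age > 35, na.rm = TRUE)    # 3

# Categorical data: tables, with and without missing values
party_counts <- table(party, useNA = "ifany")
party_shares <- prop.table(table(party))    # NA is dropped by default

# Two-way table with margins and row proportions
cross <- table(party, voted)
margin.table(cross, 1)    # totals for each party
prop.table(cross, 1)      # share voting within each party

# Numeric data: missing values propagate unless removed
mean(age)                           # NA
mean_age <- mean(age, na.rm = TRUE) # 38.6
summary(age)

# All at once: summary() on a data frame
dat <- data.frame(age, party, voted)
summary(dat)
```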

Summarizing Groups of Vectors

Visualize

Principles of Visual Communication [ Intro to Data Viz ]

  • Ground, figure, narrative (context, subject, action)
  • Tufte’s rules
  • Visual tragedies

Core Graphics Engine [ Core ] [ Custom ]

  • Defining a canvas: xlim, ylim
  • Adding data
  • Type (point, line, both)
  • Symbols
  • Color
  • Size
  • Adding grids
  • Adding axes
  • Adding titles / axes labels
  • Adding data labels: text()
  • Margins
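
The layering approach outlined above (define a canvas, then add elements one at a time) might look like this in base R graphics. The data and labels are placeholders for illustration.

```r
# Made-up data for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Margins: bottom, left, top, right (in lines of text)
par(mar = c(5, 4, 4, 2))

# Define an empty canvas: type = "n" draws nothing yet
plot(x, y, type = "n", xlim = c(0, 6), ylim = c(0, 12),
     axes = FALSE, xlab = "", ylab = "")

# Add a grid behind the data
grid()

# Add data: type "b" draws both points and lines;
# pch sets the symbol, col the color, cex the size
points(x, y, type = "b", pch = 19, col = "steelblue", cex = 1.5)

# Add axes, titles, and axis labels
axis(side = 1)
axis(side = 2)
title(main = "Example plot", xlab = "X value", ylab = "Y value")

# Add data labels above each point
text(x, y, labels = y, pos = 3)
```

Building the plot in layers keeps each design decision (symbols, colors, labels) in its own line, which makes it easier to adjust one element without touching the others.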

Advanced Graphics

  • Grammar of graphics concept
  • ggplot overview

Make Dynamic

R Shiny [ overview ] [ tutorial ]

  • What makes documents dynamic?
  • Widgets
  • Render functions
  • reactive

flexdashboards [ overview ] [ demo RMD ]

  • Principles of good dashboard design
  • Layouts
  • Sidebars
  • Value boxes
  • CSS basics