-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.Rmd
102 lines (70 loc) · 6.28 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
---
title: 'Introduction to Data Science'
output:
html_document:
toc: false
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=FALSE, message=FALSE, warning=F)
options(scipen=100)
options(digits=3)
if (!require("pacman")) {install.packages("pacman"); library(pacman)}
# special installation process for printr as it is not available on CRAN
pacman::p_load(tidyverse, knitr, readr, pander)
panderOptions("digits", 2)
panderOptions('round', 2)
panderOptions('keep.trailing.zeros', TRUE)
panderOptions('keep.line.breaks', TRUE)
```
- Course Website: https://cities.github.io/datascience2017
- GitHub repository: https://github.com/cities/datascience2017
# Syllabus
Did you ever feel you are “drinking from a hose” with the amount of data you are attempting to analyze? Have you been frustrated with the tedious steps in your data processing and analysis process and thinking there gotta be a better way to do things? Are you curious what the buzz of data science is about? If any of your answers is yes, then this course is for you. Although computing is now an integral part of every aspect of science and engineering, transportation research included, most students of science, engineering, and planning are never taught how to build, use, validate, and share software well. As a result, many spend hours or days doing things badly that could be done well in just a few minutes. The goal of this course is to start changing that so that the students can spend less time wrestling with software and more time doing useful research. Building on the successful data science training programs, such as the Software Carpentry (http://www.software-carpentry.org/) and Data Carpentry, and recent development of related software and research, this course exposes students in transportation research and practice to the best practices in scientific computing through hands-on lab sessions and aims to help students tackle the challenge of "drinking from a hose" when dealing with overwhelming amount of data that is increasingly common in transportation research and practice.
## Topics
The table below shows by date lecture topics, computer labs, and readings, and dates that assignments will be handed out and due. Supplement readings will be posted on course website. Topics are subject to adjustment according to the need of students.
### Part I: Data Science Best Practices
1. [Overview and Introduction](01-part1-overview.html)
1. [R Coding Basics](02-coding.html)
1. [Version Control with Git](03-git.html)
1. [Creating R Package](04-package.html)
1. [Producing Reports With R Markdown](05-rmarkdown.html)
1. [Part I Wrap up](http://cities.github.io/datascience/resources.html)
### Part II: tidyverse workflow
1. [Overview of tidyverse and workflow](06-part2-overview.html)
1. [Data importing](07-import.html)
1. [Tidy data](08-tidy.html)
1. [Data manipulation with dplyr](09-dplyr.html)
1. [Data visulization with ggplot2](10-visualize.html)
1. [Data modeling with Split-Apply-Combine](11-split-apply-combine.html)
1. Part II Wrap Up
## Location and Time
- Room: URBN 303
- Time:
- Part I 8/22 - 25, 2017 1 - 5pm
- Part II 8/28 - 9/1, 2017 1 - 5pm
## Prerequisite
Basic knowledge and experience of conduct scientific research with quantitative information; skill of using (or keen to learn) a programming language and/or data processing and statistical software (such as python, R, SPSS, Stata).
## Format
Classes will all be hands-on sessions with lecture, discussions and labs. Readings drawn from books, articles, and online resources will be assigned. Students are expected to read them before class and to participate in class discussions. A major component of the class is the class project in which students go through the process of data retrieval, processing, conducting analysis, and developing a report/article while learning the best practices of data science.
## Software and Hardware
This course will use R, the free statistical software, and RStudio (https://www.rstudio.com/) as our main interface to R. The lecture and lab instructions will be provided using R. It is possible (and encouraged) for existing Python users (and potentially other software, such as SPSS, Stata, Matlab, SAS, etc) to keep using the software they already know well. Student must bring their own laptop. The instructor and TA will help the students set up their laptop to run all examples/exercises. They can review/re-run the examples in lectures and labs by themselves.
## Textbook and Readings
The course will use the following textbook:
- [Wickham, H., Grolemund, G., 2017. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, 1 edition. ed. O’Reilly Media](https://www.amazon.com/Data-Science-Transform-Visualize-Model/dp/1491910399).
An electronic version is available on [Hadley Wickham's website](http://r4ds.had.co.nz/).
For Python users, Wes McKinney’s book is recommended:
- [McKinney, W., 2012. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, 1 edition. ed. O’Reilly Media](https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793).
Journal articles and online resources are used as supplements to the textbook.
# Acknowledgements
This course is developed with support from National Institute of Transportation and Communities project #854.
<div style="width:300px">
[![NITC](http://nitc.trec.pdx.edu/sites/default/files/nitc_4c_horiz_tag.jpg)](http://nitc.trec.pdx.edu/)
</div>
Parts of the course materials have been adpated from the following sources:
- [R for Data Science](https://github.com/hadley/r4ds) by Hadley Wickham
- [Software Carpentry workshop lessons](https://software-carpentry.org/lessons/)
- [UBC Stat 545](http://stat545.com) by Professor Jenny Bryan at UBC
- [NEU 5110 Introduction to Data Science](http://janvitek.org/events/NEU/5110/) by Professor Jan Vitek
The writing-up and website is powered by the [bookdown](https://bookdown.org) package and [github](https://www.github.com).
# License
<a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">Creative Commons Attribution-NonCommercial 4.0 International License</a>.