JSC370/JSC370-2025

JSC370: Data Science II (Winter 2025), University of Toronto

Where and When

Weekly Course Schedule

| Week | Dates | Topics/Weekly Activities | Due (Labs: Wednesdays 11:59pm; HW: Fridays 11:59pm) |
|---|---|---|---|
| Week 1 | January 6 lecture, January 8 lab | Introduction to Data Science tools: R, markdown | Lab 1 |
| Week 2 | January 13 lecture, January 15 lab | Version Control & Reproducible Research, Git | Lab 2 |
| Week 3 | January 20 lecture, January 22 lab | Exploratory Data Analysis | Lab 3 |
| Week 4 | January 27 lecture, January 29 lab | Data visualization | HW1, Lab 4 |
| Week 5 | February 3 lecture, February 5 lab | Data cleaning and wrangling; ML 1 (gam) | Lab 5 |
| Week 6 | February 10 lecture, February 12 lab | Regular Expressions, Data scraping, using APIs | HW2, Lab 6 |
| Week 7 | February 17 | Reading Week | |
| Week 8 | February 24 lecture, February 26 lab | Text mining | HW3, Lab 8 |
| Week 9 | March 3 lecture, March 5 lab | High performance computing, cloud computing | Midterm, Lab 9 |
| Week 10 | March 10 lecture, March 12 lab | ML 2 (trees, rf, xgboost) | Lab 10 |
| Week 11 | March 17 lecture, March 19 lab | Interactive visualization and effective data communication I | HW4, Lab 11 |
| Week 12 | March 23 lecture, March 26 lab | Interactive visualization and effective data communication II | Lab 12 |
| Week 13 | March 31, April 2 | Final Project Workshop | HW5 |
| Week 15 | April 28 | Final Project | |

Grading Breakdown

| Task | % of Grade |
|---|---|
| Labs (including attendance) | 10 |
| Homework (5) | 25 |
| Midterm report | 30 |
| Final project | 35 |
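
As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale and that the five homework assignments are averaged into a single homework score; neither detail is stated above, and the example scores are hypothetical.

```python
# Weights from the grading breakdown, in percent of the final grade.
WEIGHTS = {"labs": 10, "homework": 25, "midterm": 30, "final_project": 35}


def final_grade(scores):
    """Combine component scores (assumed to be on a 0-100 scale) into a weighted grade."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly: " + ", ".join(WEIGHTS))
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {"labs": 100, "homework": 80, "midterm": 70, "final_project": 90}
print(final_grade(example))  # prints 82.5
```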

Readings

Resources

Markdown

Helpers and Templates

  • RStudio Cheatsheets. Other quick guides, including a more comprehensive RMarkdown reference and information about using RStudio's IDE and some of the main tools in R.

Guides

Tools

  • Apple's Developer Tools. Unix toolchain. Install directly with xcode-select --install, or run a tool such as git from the terminal and let macOS prompt you to install the tools.
  • Homebrew package manager. A convenient way to install several of the tools here, including Emacs and Pandoc.
  • R. A platform for statistical computing.
  • knitr. Reproducible plain-text documents from within R.
  • Python and SciPy. Python is a general-purpose programming language increasingly used in data manipulation and analysis.
  • RStudio. An IDE for R. The most straightforward way to get into using R and RMarkdown.
  • TeX and LaTeX. A typesetting and document preparation system. You can write files in .tex format directly, but it is more useful to just have it available in the background for other tools to use. The MacTeX Distribution is the one to install for macOS.
  • Pandoc. Converts plain-text documents to and from a wide variety of formats. Can be installed with Homebrew. Be sure to also install pandoc-crossref for producing cross-references and labels. Pandoc 2.11 and later process citations and bibliographies with the built-in --citeproc option; the separate pandoc-citeproc filter is only needed for older versions.
  • Git. Version control system. Installs with Apple's Developer Tools, or get the latest version via Homebrew.
  • GNU Make. You tell make what the steps are to create the pieces of a document or program. As you edit and change the various pieces, it automatically figures out which pieces need to be updated and recompiled, and issues the commands to do that. See Karl Broman's Minimal Make for a short introduction. Make will be installed automatically with Apple's developer tools.
  • lintr and flycheck. Tools that nudge you to write neater code.
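
The way GNU Make decides what to rebuild, as described in the list above, can be sketched in a few lines of Python. This is only an illustration of the timestamp comparison, not how Make is implemented; the function name needs_rebuild and the file names are hypothetical.

```python
import os
import tempfile


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


# Hypothetical file names, created in a temporary directory for the demo.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "report.Rmd")
    target = os.path.join(workdir, "report.html")
    for path in (source, target):
        with open(path, "w") as handle:
            handle.write("placeholder\n")

    # Pretend the source was edited at t=100 and the target built at t=200.
    os.utime(source, (100, 100))
    os.utime(target, (200, 200))
    up_to_date = not needs_rebuild(target, [source])

    # Pretend the source is edited again at t=300: the target is now stale.
    os.utime(source, (300, 300))
    stale = needs_rebuild(target, [source])

print(up_to_date, stale)  # prints True True
```

A real Makefile adds the commands to run for each stale target and follows chains of dependencies; the sketch covers only the single-target check.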

Other Applications and Services

  • Backblaze. Secure off-site backup.
  • GitHub. Host public and private Git repositories for free; paid plans add extra features. Also a source for publicly available code (e.g. R packages and utilities) written by other people.
  • Marked 2. Live HTML previewing of Markdown documents. macOS only.
  • Sublime Text. Text editor with a Python plugin API.
  • Zotero, Mendeley, and Papers are citation managers that incorporate PDF storage, annotation and other features. Zotero is free to use. Mendeley has a premium tier. Papers is a paid application after a trial period. I don't use these tools much, but that's not for any strong principled reason, mostly just inertia. If you use one and want to integrate with the material here, just make sure it can export to BibTeX/BibLaTeX files. Papers, which I've used most recently, can handily output citation keys in pandoc's format amongst several others.

Data

Many of these websites provide APIs for downloading their data. We recommend using the APIs to get the data.
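
A minimal sketch of fetching data through a web API, using only the Python standard library. The endpoint URL and query parameters below are placeholders, not a real service; each data provider documents its own endpoints, parameters, and any required API key.

```python
import json
import urllib.parse
import urllib.request


def build_url(base_url, params):
    """Append URL-encoded query parameters to a base endpoint."""
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch_json(url, opener=urllib.request.urlopen, timeout=30):
    """Request a URL and decode the JSON body of the response."""
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint and parameters; substitute those documented by the provider.
url = build_url("https://example.org/api/records", {"year": 2025, "limit": 100})
print(url)  # prints https://example.org/api/records?year=2025&limit=100

# records = fetch_json(url)  # uncomment to perform the network request
```

Keeping the URL construction separate from the request makes it easy to inspect the request first and to save the raw response alongside the analysis for reproducibility.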

Canadian Data

Environmental Data

International Data

US Data

Health and Biological Data

Social Networks

Academic Publications and Related
