Curriculum

The broad curriculum elements are listed below; this workshop showcases a selection of them.

  1. Introduction - “I think, therefore I am”
  • What is data analysis?
  • What types of questions can be answered?
  • Frame/Acquire/Refine/Explore/Model/Insight framework
  2. Acquire - "Data is the new oil"
  • Sources of data - downloaded from an internal system, obtained from a client or other third party, extracted from a web-based API, scraped from websites or PDFs, or gathered manually and recorded
  • Acquire data from a CSV file or a database
  • Acquire data from a third-party client (e.g. Twitter)
  3. Refine - "Data is messy"
  • Concept of Tidy Data - Why is it important?
  • Missing - e.g. check for missing or incomplete data
  • Quality - e.g. check for duplicates, accuracy, unusual values
  • Parse - e.g. extract the year from a date
  • Merge - e.g. combine first name and surname into a full name
  • Convert - e.g. free text to a coded value
  • Derive - e.g. gender from title
  • Calculate - e.g. percentages, proportions
  • Remove - e.g. drop redundant data
  • Aggregate - e.g. roll up by year, cluster by area
  • Filter - e.g. exclude records based on location
  • Sample - e.g. extract a representative sample
  • Summary - e.g. show summary statistics such as the mean
  • Basic statistics: variance, standard deviation, covariance, correlation
  4. Explore - "I don't know what I don't know"
  • Why do visual exploration?
  • Understand Data Structure & Types
  • Explore single-variable graphs - quantitative, categorical
  • Explore two-variable graphs - Q & Q, Q & C, C & C (Q = quantitative, C = categorical)
  • Explore multi-dimensional graphs
  5. Model - "All models are wrong, but some are useful"
  • Introduction to Machine Learning
  • The power and limits of models
  • Tradeoff between Prediction Accuracy and Model Interpretability
  • Assessing Model Accuracy
    • For regression problems - RMSE
    • For classification problems - precision, recall, AUC/ROC, F-score, misclassification rate
  • Bias-Variance tradeoff
  • Overfitting
  • Linear Regression
  • Logistic Regression
  • L1- and L2-regularized linear & logistic regression
  • Regularization
  • Classification model
  • Decision Trees
  • Visualizing decision trees
  6. Insight - “The goal is to turn data into insight”
  • Why do we need to communicate insight?
  • Types of communication - Exploration vs. Explanation
  • Explanation: Telling a story with data
  • Exploration: Building an interface for people to find stories
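The Refine operations in the curriculum can be illustrated with a short sketch. It is written in Python using only the standard library. That choice is an assumption, since the curriculum does not name a language or toolkit. The sketch runs on a small hypothetical table, and its column names (`first`, `surname`, `title`, `joined`, `city`), sample rows and title-to-gender mapping are invented for illustration. It touches Missing, Quality, Merge, Parse, Derive, Filter, Aggregate and Calculate. In practice a dataframe library would more likely be used.

```python
import csv
import io
from collections import Counter
from datetime import date

# Hypothetical raw data; in practice this would come from a CSV file or a database.
RAW_CSV = """first,surname,title,joined,city
Asha,Rao,Ms,2019-03-14,Pune
Ben,Cole,Mr,2020-07-01,Leeds
Asha,Rao,Ms,2019-03-14,Pune
Chen,Li,Dr,2020-11-23,
Dana,Wu,Mrs,2021-01-05,Leeds
"""

# Hypothetical mapping used to derive gender from title; unmapped titles become "unknown".
TITLE_TO_GENDER = {"Mr": "male", "Mrs": "female", "Ms": "female", "Miss": "female"}


def refine(text):
    rows = list(csv.DictReader(io.StringIO(text)))

    # Missing: drop rows where any field is absent or blank.
    complete = [r for r in rows if all(v and v.strip() for v in r.values())]

    # Quality: remove exact duplicate rows, keeping the first occurrence.
    seen, unique = set(), []
    for r in complete:
        key = tuple(r.values())
        if key not in seen:
            seen.add(key)
            unique.append(r)

    refined = []
    for r in unique:
        refined.append({
            # Merge: first name and surname into a full name.
            "full_name": f"{r['first']} {r['surname']}",
            # Parse: extract the year from an ISO-format date string.
            "year": date.fromisoformat(r["joined"]).year,
            # Derive: gender from title.
            "gender": TITLE_TO_GENDER.get(r["title"], "unknown"),
            "city": r["city"],
        })
    return refined


rows = refine(RAW_CSV)

# Filter: keep only records from one location.
leeds = [r for r in rows if r["city"] == "Leeds"]

# Aggregate: roll up the number of records per year.
per_year = Counter(r["year"] for r in rows)

# Calculate: percentage of records per city.
per_city = Counter(r["city"] for r in rows)
pct_city = {city: 100 * n / len(rows) for city, n in per_city.items()}

print(per_year)
print(pct_city)
```

Each step is a plain list or dictionary operation, so the order of operations is easy to follow. Missing and duplicate rows are handled first, which means every later calculation only sees clean records.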
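The basic statistics listed in the Refine step (variance, standard deviation, covariance, correlation) can be written out directly. This is a minimal sketch, again assuming Python. `statistics.variance` and `statistics.stdev` are standard-library functions that use the sample (n - 1) denominator. `sample_covariance` and `pearson_correlation` are hypothetical helpers defined here, and `hours`/`score` are made-up example data.

```python
import math
import statistics

# Hypothetical example data: hours studied and test score for five people.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]


def sample_covariance(xs, ys):
    """Sample covariance, using the n - 1 denominator like statistics.variance."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


variance = statistics.variance(hours)     # spread of a single variable
std_dev = statistics.stdev(hours)         # square root of the variance, in the data's units
cov = sample_covariance(hours, score)     # joint variation, in units of hours x points
corr = pearson_correlation(hours, score)  # unit-free, always between -1 and 1

# The covariance of a variable with itself is its variance.
assert math.isclose(sample_covariance(hours, hours), variance)

print(variance, std_dev, cov, corr)
```

Covariance depends on the units of both variables. Correlation rescales it by both standard deviations, which is why correlation is the easier of the two to compare across datasets.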
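The measures listed under Assessing Model Accuracy can be sketched the same way, in Python with only the standard library. The sketch assumes binary labels coded as 1 (positive) and 0 (negative). `rmse`, `classification_report` and `roc_auc` are hypothetical helper names, and the example labels and scores are made up. AUC is computed here as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting as one half. This quantity equals the area under the ROC curve.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error for regression predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def classification_report(actual, predicted, positive=1):
    """Precision, recall, F1 (the balanced F-score) and misclassification rate."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    misclassification = sum(a != p for a, p in pairs) / len(pairs)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "misclassification_rate": misclassification,
    }


def roc_auc(actual, scores, positive=1):
    """AUC as the chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions and predicted scores for six examples.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]

print(rmse([3.0, 5.0, 2.0], [2.5, 5.0, 3.0]))
print(classification_report(y_true, y_pred))
print(roc_auc(y_true, y_score))
```

Precision, recall, F-score and misclassification rate all depend on hard predictions, and therefore on the threshold used to produce them. AUC instead summarizes how well the scores rank positives above negatives across all thresholds.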