Exploratory Data Analysis

When working with large, high-dimensional datasets, it can be difficult to gain insight into the data. A range of techniques and algorithms can assist in this process, many of which are classified as "unsupervised" data analysis algorithms (the data has features/inputs but no labeled outputs). This module explores a few of these approaches and shows how they can be used to visualize and analyze complex datasets.

Recommended Reading:

  • "Machine Learning Refined" Ch. 9
  • "Elements of Statistical Learning" Sec. 13.2, 13.3, Ch. 14

Associated Notebooks:

Lectures

  • High Dimensional Data - Introduction to high-dimensional data, inspecting and visualizing features.
  • Dimensionality Reduction - Performance assessment, principal component analysis, and manifold learning.
  • Clustering - Expectation-Maximization models (k-means), density-based models (mean shift), and hierarchical models.
  • Generative Models - Intro to generative models, Gaussian mixture models, kernel density estimation, and not-so-naïve Bayes.
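
As a small illustration of the clustering topic listed above, the following is a minimal sketch of k-means (Lloyd's algorithm) using only the Python standard library. It is not taken from the course notebooks: the function name `kmeans`, its parameters, and the synthetic two-blob dataset are all assumptions made for this example. The notebooks may use different tooling.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Hypothetical helper: Lloyd's algorithm for k-means on a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Synthetic data (illustration only): two well-separated 2D Gaussian blobs.
data_rng = random.Random(42)
blob_a = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(50)]
points = blob_a + blob_b

centroids, labels = kmeans(points, k=2)
print(centroids)
```

The two alternating steps mirror the expectation and maximization steps referred to in the clustering lecture: assigning points to clusters given the centroids, then re-estimating the centroids given the assignments. On data this well separated, the two recovered centroids should sit near the blob centers. Real datasets usually call for multiple restarts and a more careful initialization.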