When working with large and high-dimensional datasets, it can be difficult to gain insight into the data. A range of techniques and algorithms can assist in this process, many of which are classified as "unsupervised" data analysis algorithms (the data has features/inputs but no labeled outputs). This module explores a few of these approaches and shows how they can be used to visualize and analyze complex datasets.
- "Machine Learning Refined" Ch. 9
- "The Elements of Statistical Learning" Sec. 13.2, 13.3, Ch. 14
- High-Dimensional Data - Introduction to high-dimensional data, inspecting and visualizing features.
- Dimensionality Reduction - Performance assessment, principal component analysis, and manifold learning.
- Clustering - Expectation-maximization models (k-means), density-based models (mean shift), and hierarchical models.
- Generative Models - Introduction to generative models, Gaussian mixture models, kernel density estimation, and not-so-naïve Bayes.