An implementation built on PySpark and the Thunder library that uses k-means clustering. Ref: Thunder is a modular collection of tools for the analysis of image and time series data in Python. It supports fast and interactive analysis of small, medium, or very large datasets, and runs locally or against a Spark cluster with an identical API.
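For readers unfamiliar with k-means, the following is a minimal, standard-library-only sketch of the algorithm (Lloyd's iteration) applied to a few short time series. It is not this project's code and does not use the PySpark or Thunder APIs; the function names `closest_center` and `kmeans`, the sample data, and the choice of the first `k` points as initial centers are all assumptions made for illustration. The assign-then-average structure is roughly what a distributed version would express as a map step followed by a reduce step.

```python
from collections import defaultdict
from math import dist


def closest_center(point, centers):
    """Return the index of the center nearest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


def kmeans(points, k, iterations=20):
    """Lloyd's k-means on a list of equal-length tuples.

    Initial centers are simply the first k points (an illustrative choice,
    not necessarily what any library does).
    """
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # "Map" step: assign every point to its nearest center.
        groups = defaultdict(list)
        for p in points:
            groups[closest_center(p, centers)].append(p)
        # "Reduce" step: move each center to the mean of its assigned points.
        new_centers = list(centers)
        for i, members in groups.items():
            new_centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_centers == centers:
            break  # assignments are stable, so the algorithm has converged
        centers = new_centers
    labels = [closest_center(p, centers) for p in points]
    return centers, labels


# Hypothetical sample data: four short time series forming two obvious groups.
series = [(0.0, 0.1, 0.0), (5.0, 5.1, 4.9), (0.1, 0.0, 0.2), (4.9, 5.0, 5.2)]
centers, labels = kmeans(series, k=2)
print(labels)  # [0, 1, 0, 1]
```

On a real dataset the per-point assignment is the expensive part, which is why it is the natural candidate for distribution across a Spark cluster.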