Welcome to Spark labs bundle. This is the Scala track.
Create a lab bundle as follows
$ ./assemble-labs.sh
Instructor will provide lab bundle
The VMs already have data loaded. This for your own reference.
Link to Full Dataset (Note: Large download, ~300 Meg)
- Click the above link to download or
- use
wget
from command line
$ wget "https://s3.amazonaws.com/elephantscale-public/data/data.zip"
- 1.1 - Scala shell
- [1.2 - Functions]
- Setup 1 - Instructor to demo first
- 1.3 - File IO
- 1.4 - Higher Order Functions
- 1.5 - Vending Machine class
- 1.6 - Unit testing with SPECS2
- 2.1 - Install and run Spark
- 2.2 - Spark Shell - Standalone || Hadoop version
- 4.1 - Dataframes
- 4.2 - Spark SQL
- 4.3 - Dataset
- 4.4 - Caching 2 SQL
- 4.5 - Spark & Hive (Hadoop)
- 4.6 - Data formats
This is the new recommended API for streaming.
- Structured Streaming 1 - Intro
- Structured Streaming 2 - Word Count
- Structured Streaming 3 - Clickstream