Spark Labs - Scala

Welcome to the Spark labs bundle. This is the Scala track.

To Instructor

Create a lab bundle as follows:

    $ ./assemble-labs.sh

Labs

The instructor will provide the lab bundle.

Download Data

The VMs already have the data loaded. This is for your own reference.

Link to Full Dataset (Note: large download, ~300 MB)

Click the link above to download, or use wget from the command line:

    $ wget "https://s3.amazonaws.com/elephantscale-public/data/data.zip"

Labs

1 - Scala Primer

1.1 - Scala shell
1.2 - Functions
Setup 1 - Instructor to demo first
1.3 - File IO
1.4 - Higher Order Functions
1.5 - Vending Machine class
1.6 - Unit testing with specs2

2 - Spark Intro

2.1 - Install and run Spark
2.2 - Spark Shell - Standalone || Hadoop version

3 - Spark Core

3.1 - RDD basics
3.2 - Dataset basics
3.3 - Caching

4 - Dataframes and Datasets

4.1 - Dataframes
4.2 - Spark SQL
4.3 - Dataset
4.4 - Caching 2 SQL
4.5 - Spark & Hive (Hadoop)
4.6 - Data formats

5 - API

5.1 - Submit first application
BONUS: 5.2 - MapReduce using API

Practice Labs for end of day 2

Practice Lab 1 - Analyze Spark Commit logs
(If time permits) Practice Lab 3 - Optimize SQL query

6 - MLLib

6.1 - K-means
6.2 - Recommendations
6.3 - Classification

7 - GraphX

7.1 - Influencers (Twitter)
7.2 - Shortest path (in LinkedIn)

8 - Streaming

Structured Streaming

This is the new recommended API for streaming.

Structured Streaming 1 - Intro
Structured Streaming 2 - Word Count
Structured Streaming 3 - Clickstream

Classic Streaming

Streaming over TCP
Windowed Count
Kafka Streaming

9 - Operations

9.1 - Cluster setup

10 - Spark and Hadoop (all the Hadoop labs are grouped here)

2.2H - Spark Shell on Hadoop
3.1 - Loading RDDs from HDFS
4.2 - Spark SQL on Hadoop
4.4H - Spark & Hive

Practice Labs

Practice Lab 1 - Analyze Spark Commit logs
Practice Lab 2 - Analyze house sales data
Practice Lab 3 - Optimize SQL query
Practice Lab 4 - Analyze clickstream data
