Activities

Examples

Common Crawl Exemplar

The Common Crawl Exemplar is a fully worked example of running Map/Reduce via Hadoop on AWS EMR for textual analysis.

Processing the NASA OpenNEX model in EMR

The Processing the NASA OpenNEX model in EMR activity processes climate model data using AWS EMR.

Multiplying Many Integers via Prime Factorization using EMR

The Multiplying Many Integers via Prime Factorization using EMR activity is a simple example of using Map/Reduce to perform a computation.

Acquiring Data

Acquiring Data from Twitter

The Acquiring Data from Twitter activity demonstrates how to acquire data from an API.

Scraping the Web

The Scraping the Web activity demonstrates how to gather information from the web.

Crawling the Common Crawl

The Crawling the Common Crawl activity demonstrates using prefetched web content from the Common Crawl.

Organizing

Data Munging - Processing JSON, XML, and CSV Data

The Data Munging activity demonstrates processing various data formats in Python.

NoSQL Databases

The NoSQL Databases activity demonstrates using different NoSQL databases.

Relational Databases

The Relational Databases activity demonstrates using a relational database from Python.

Analyzing

Creating Clusters for EMR

The Creating Clusters for EMR activity steps through setting up an EMR cluster for Map/Reduce (Hadoop) on AWS.

Word Counts for Tweets

The Word Counts for Tweets activity steps through the classic word count example on AWS EMR using tweet data.

Map Task Input Splitting

The Map Task Input Splitting activity demonstrates how input is split by Hadoop on AWS EMR.

Introduction to Spark

The Introduction to Spark activity introduces Spark and steps through reproducing several of the previous activities with it.

NLP - Text Processing with NLTK

The Text Processing with NLTK activity introduces how text can be processed with NLTK in Python.

NLP - Sentiment Analysis (NLTK)

The Sentiment Analysis activity introduces sentiment analysis and steps through applying it in Python with NLTK.