The Common Crawl Exemplar is a fully worked example of running Map/Reduce via Hadoop on AWS EMR for textual analysis.
The Processing the NASA OpenNEX model in EMR activity demonstrates processing climate model data on AWS EMR.
The Multiplying Many Integers via Prime Factorization using EMR activity is a simple example of using Map/Reduce to perform a computation.
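The idea behind that activity can be sketched locally without a cluster: the map step factors each integer into prime-exponent pairs, the reduce step sums exponents per prime, and the product is rebuilt at the end. This is a minimal single-machine sketch of the technique, not the EMR version used in the activity; the function names are illustrative.

```python
from collections import Counter
from functools import reduce

def factorize(n):
    """Map step: decompose one integer into prime-exponent pairs."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1
    return factors

def multiply(numbers):
    """Reduce step: sum exponents per prime, then rebuild the product."""
    total = reduce(lambda a, b: a + b, map(factorize, numbers), Counter())
    result = 1
    for prime, exp in total.items():
        result *= prime ** exp
    return result

print(multiply([12, 18, 10]))  # 2160 == 12 * 18 * 10
```

Because exponent addition is associative and commutative, the reduce step can be parallelized across many machines, which is what makes the problem a natural fit for Map/Reduce.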
The Acquiring Data from Twitter activity demonstrates how to acquire data from an API.
The Scraping the Web activity demonstrates gathering information from the web.
The Crawling the Common Crawl activity demonstrates using prefetched web content from the Common Crawl.
The Data Munging activity demonstrates processing various data formats in Python.
The NoSQL Databases activity demonstrates using different NoSQL databases.
The Relational Databases activity demonstrates using a relational database from Python.
The Creating Clusters for EMR activity steps through setting up an EMR cluster for Map/Reduce (Hadoop) on AWS.
The Word Counts for Tweets activity steps through the classic word count example on AWS EMR using tweet data.
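The word count pattern itself can be simulated on one machine: the mapper emits a `(word, 1)` pair per token, and the reducer sums the counts for each word. This is a hedged local sketch of the pattern, not the Hadoop Streaming job from the activity; the tokenization (lowercased whitespace split) and the sample tweets are illustrative assumptions.

```python
from collections import Counter
from itertools import chain

def mapper(tweet):
    # Emit a (word, 1) pair for each lowercased, whitespace-separated token.
    return [(word.lower(), 1) for word in tweet.split()]

def reducer(pairs):
    # Sum the counts per word, as the reduce phase would after shuffling.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts

tweets = ["Big data is big", "data science"]
counts = reducer(chain.from_iterable(mapper(t) for t in tweets))
print(counts["big"])   # 2
print(counts["data"])  # 2
```

On EMR, the shuffle phase between the two functions groups pairs by word across machines, so each reducer sees all the counts for its words.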
The Map Task Input Splitting activity demonstrates how input is split by Hadoop on AWS EMR.
The Introduction to Spark activity introduces Spark and steps through reproducing various previous activities.
The Text Processing with NLTK activity introduces how text can be processed with NLTK in Python.
The Sentiment Analysis activity introduces Sentiment Analysis and steps through using it via Python.