GitHub - imjustdavid/workshop-ServerlessAnalytics

Serverless Analytics

In this workshop you will learn how to run big data analytics in a serverless fashion, using Amazon S3, AWS Glue, Amazon Athena and Amazon QuickSight to:

upload a dataset to your central data lake,
automate the creation of the data catalog,
schedule ETL processes that aggregate data from multiple tables and convert it into a compressed columnar format that speeds up your queries and reduces their cost,
query the data using standard SQL,
create and share rich web-based visualizations.

All of this without having to manage clusters or even spin up a single instance.

Welcome to the serverless age!
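
As a rough illustration of the "query the data using standard SQL" step, the sketch below builds the request for Athena's StartQueryExecution call and submits it with boto3. The database name, table name, results bucket and the `payment_type` column are assumptions made for this example, not names defined by the workshop; replace them with whatever your data catalog actually contains.

```python
# Hypothetical table name, used only for illustration.
TABLE = "yellow_trips"


def build_athena_request(database, output_location, limit=10):
    """Build the keyword arguments for Athena's StartQueryExecution call."""
    query = (
        "SELECT payment_type, COUNT(*) AS trips "
        f'FROM "{TABLE}" '
        "GROUP BY payment_type "
        "ORDER BY trips DESC "
        f"LIMIT {int(limit)}"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the query and return its execution id (needs AWS credentials)."""
    import boto3  # third-party; imported here so the builder works without it

    athena = boto3.client("athena")
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # "nyctaxi" and the bucket below are placeholder names.
    request = build_athena_request("nyctaxi", "s3://example-bucket/athena-results/")
    print(request["QueryString"])
```

Calling `run_query(request)` only starts the query. Athena runs it asynchronously, so the returned execution id has to be polled before results can be read.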

The dataset

For this workshop we are going to use data made available by the New York City Taxi and Limousine Commission (TLC).

Raw CSV-formatted trip record data can be downloaded from the TLC website at www.nyc.gov/html/tlc/html/about/trip_record_data.shtml. On this page the TLC provides monthly extracts for yellow cabs, green boro taxis and for-hire vehicles (FHV).

The yellow and green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The FHV trip records include fields capturing the dispatching base license number and the pick-up date, time, and taxi zone location identifier.

You can use data from any available month. In this example we are going to use data from November and December 2017.
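
To make the upload step more concrete, here is a minimal sketch of how the downloaded monthly files could be laid out in the data lake bucket. It assumes the files follow the `<type>_tripdata_<YYYY-MM>.csv` naming pattern and that each trip type is kept under its own prefix, so that a crawler can treat them as separate tables. The bucket name, the `raw` prefix and the local directory are hypothetical.

```python
from datetime import date
from pathlib import Path

# The three trip types the TLC publishes monthly extracts for.
TRIP_TYPES = ("yellow", "green", "fhv")


def monthly_keys(months, prefix="raw"):
    """Return one S3 object key per trip type and month."""
    keys = []
    for trip_type in TRIP_TYPES:
        for month in months:
            # Assumed file naming pattern, e.g. yellow_tripdata_2017-11.csv
            filename = f"{trip_type}_tripdata_{month:%Y-%m}.csv"
            keys.append(f"{prefix}/{trip_type}/{filename}")
    return keys


def upload_all(local_dir, bucket, keys):
    """Upload already-downloaded files to S3 (needs AWS credentials)."""
    import boto3  # third-party; imported here so monthly_keys works without it

    s3 = boto3.client("s3")
    for key in keys:
        local_path = Path(local_dir) / Path(key).name
        s3.upload_file(str(local_path), bucket, key)


if __name__ == "__main__":
    for key in monthly_keys([date(2017, 11, 1), date(2017, 12, 1)]):
        print(key)
```

With the files saved locally, a call such as `upload_all("./data", "example-datalake-bucket", keys)` would copy them to the bucket under those keys.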

