
imjustdavid/workshop-ServerlessAnalytics


Serverless Analytics

In this workshop you will learn how to build a big data analytics pipeline in a serverless fashion, using Amazon S3, AWS Glue, Amazon Athena and Amazon QuickSight to:

  • upload a dataset to your central data lake,
  • automate the creation of the data catalog,
  • schedule ETL processes that aggregate data from multiple tables and convert it into a compressed columnar format, which speeds up your queries and reduces their cost,
  • query the data using standard SQL,
  • create and share rich web-based visualizations.

All without having to manage clusters, or even spin up a single instance.

Welcome to the serverless age!

NYC Taxi & Limousine Commission trip record data

The dataset

For this workshop we are going to use data made available by the New York City Taxi and Limousine Commission (TLC).

Raw CSV-formatted trip record data can be downloaded from the TLC website at www.nyc.gov/html/tlc/html/about/trip_record_data.shtml. On this page the TLC provides monthly extracts for yellow cabs, boro taxis (green) and for-hire vehicles (FHV).

The yellow and green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The FHV trip records include fields capturing the dispatching base license number and the pick-up date, time, and taxi zone location identifier.
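To get a feel for the shape of a trip record before uploading anything, the following is a minimal sketch that parses a few of the fields described above with Python's standard library. The column names (`tpep_pickup_datetime`, `trip_distance`, and so on) and the two sample rows are assumptions made for illustration; check the header row of the file you actually download, as the schema differs between taxi types and may change over time.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Hypothetical two-row sample. Column names are ASSUMED, not taken from the
# workshop; compare them with the header of your downloaded CSV file.
SAMPLE_CSV = """\
tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,payment_type,total_amount
2017-11-01 00:05:00,2017-11-01 00:20:00,1,3.2,1,15.8
2017-11-01 08:30:00,2017-11-01 08:42:00,2,1.1,2,8.3
"""

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass
class Trip:
    pickup: datetime
    dropoff: datetime
    passenger_count: int
    trip_distance: float
    total_amount: float

    @property
    def duration_minutes(self) -> float:
        """Trip duration derived from the pick-up and drop-off timestamps."""
        return (self.dropoff - self.pickup).total_seconds() / 60


def parse_trips(text: str) -> List[Trip]:
    """Parse CSV text into Trip objects, converting each field to its type."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Trip(
            pickup=datetime.strptime(row["tpep_pickup_datetime"], TIMESTAMP_FORMAT),
            dropoff=datetime.strptime(row["tpep_dropoff_datetime"], TIMESTAMP_FORMAT),
            passenger_count=int(row["passenger_count"]),
            trip_distance=float(row["trip_distance"]),
            total_amount=float(row["total_amount"]),
        )
        for row in reader
    ]


if __name__ == "__main__":
    for trip in parse_trips(SAMPLE_CSV):
        print(trip.pickup, trip.trip_distance, trip.duration_minutes)
```

In the workshop itself this kind of type inference is handled by the AWS Glue data catalog; the sketch only illustrates which fields a record carries.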

You can use data from any available month. In this particular example we are going to use data from November and December 2017.
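If you want to work with a different range of months, a small helper can list the monthly extracts to fetch. This is a sketch only: the `<type>_tripdata_<YYYY-MM>.csv` naming pattern is an assumption about how the monthly files are named, and `extract_names` and `month_range` are hypothetical helpers, so verify the actual file names on the TLC download page.

```python
from datetime import date
from typing import List


def month_range(start: date, end: date) -> List[date]:
    """Return the first day of every month from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def extract_names(taxi_type: str, start: date, end: date) -> List[str]:
    """Build one file name per month.

    ASSUMPTION: files follow the pattern <type>_tripdata_<YYYY-MM>.csv.
    """
    return [f"{taxi_type}_tripdata_{m:%Y-%m}.csv" for m in month_range(start, end)]


if __name__ == "__main__":
    # The two months used in this workshop.
    for name in extract_names("yellow", date(2017, 11, 1), date(2017, 12, 1)):
        print(name)
```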

Let's get the show on the road

  1. Upload the dataset
  2. Create the data catalog with AWS Glue
  3. Data discovery with Amazon Athena
  4. ETL with AWS Glue
  5. Create rich web-based visualizations with Amazon QuickSight
