Skip to content

Latest commit

 

History

History
59 lines (32 loc) · 2.77 KB

README.md

File metadata and controls

59 lines (32 loc) · 2.77 KB

A simple data dashboard pipeline

Following are the main technologies used in this data pipeline:

1. Snowflake Warehouse
2. Fivetran
3. dbt (Data build tool)
4. Google Data Studio

Goals Achieved:

  1. Ingested data from the shared Google Sheet to the Snowflake database using Fivetran image

  2. Configured dbt cloud successfully to work on the Snowflake database image

  3. Wrote the following 5 models:

image

  • locations (Normalised geography data from the original raw table)

    select * from "INTERVIEW_DB"."PLAYGROUND_TANMAY_KULKARNI"."LOCATIONS";
    
  • cases_with_time (Normalised time series data that links with "locations" via location_id (new field))

    select * from "INTERVIEW_DB"."PLAYGROUND_TANMAY_KULKARNI"."CASES_WITH_TIME";
    
  • new_metrics (Generates NEW case data from the normalised tables "locations" and "cases_with_time")

    select * from "INTERVIEW_DB"."PLAYGROUND_TANMAY_KULKARNI"."NEW_METRICS";
    
  • total_metrics (Generates TOTAL case data from the normalised tables "locations" and "cases_with_time)

    select * from "INTERVIEW_DB"."PLAYGROUND_TANMAY_KULKARNI"."TOTAL_METRICS";
    
  • ratios_by_geolocation (Generates derived fields "coordinates" and case ratios from the RAW table)

    select * from "INTERVIEW_DB"."PLAYGROUND_TANMAY_KULKARNI"."RATIOS_BY_GEOLOCATION";
    
  • areas_with_highest_active_cases (Generates a table of the top locations having the highest NEW covid-19 cases)

    select * from "INTERVIEW_DB"."PLAYGROUND_TANMAY_KULKARNI"."AREAS_WITH_HIGHEST_ACTIVE_CASES";
    
  1. Connected Google Data Studio with Snowflake using the Supermetrics connector (I later faced some connection issues with Supermetrics, so I then decided to export my models from Snowflake as CSV and used them as data sources in GDS) image image

  2. Created a basic dashboard using the above data image image image