Note

Now that my team and I have graduated from the Northcoders Data Engineering course, this project will be archived and made read-only. I am continuing the project solo in a separate repository, which you can find here, and will add more features over time.

ToteSys - Data Engineering Project

Contributors

Ellie Symonds
Lianmei Manon-og
Tolu Ajibade
Joslin Rashleigh
Anzelika Belotelova
Alex Schofield

Summary

The project aims to implement a data platform that can extract data from an operational database, archive it in a data lake, and make it easily accessible within a remodelled OLAP data warehouse.

The solution showcases our skills in:

Python
PostgreSQL
Database modelling
Amazon Web Services (AWS)
Agile methodologies

Main Objectives

Our goal is to create a reliable ETL (Extract, Transform, Load) pipeline that can:

Extract the data from the totesys operational database
Store the data in AWS S3 buckets, which will form our data lake
Transform the data into a suitable schema for the data warehouse
Load the transformed data into the data warehouse hosted on AWS
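The four stages above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: plain dicts stand in for the S3 data lake and the warehouse, a list of dicts stands in for the result of a totesys query, and the `staff` table, its columns, the `dim_staff` target table and the timestamped key layout are all hypothetical.

```python
import csv
import io
from datetime import datetime, timezone


def extract(source_table):
    """Extract: copy rows out of the source.

    Hypothetical stand-in: a list of dicts replaces a real totesys query.
    """
    return [dict(row) for row in source_table]


def store(lake, table_name, rows, run_time):
    """Store: write rows as CSV under a timestamped key.

    Hypothetical stand-in: the `lake` dict replaces an S3 bucket.
    """
    if not rows:
        return None  # nothing new to archive
    key = f"{table_name}/{run_time:%Y-%m-%dT%H%M%S}.csv"
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    lake[key] = buffer.getvalue()
    return key


def transform(lake, key):
    """Transform: read the raw CSV back and reshape it.

    Hypothetical target: a `dim_staff` table with an id and a full name.
    """
    reader = csv.DictReader(io.StringIO(lake[key]))
    return [
        {
            "staff_id": int(row["staff_id"]),
            "full_name": f'{row["first_name"]} {row["last_name"]}',
        }
        for row in reader
    ]


def load(warehouse, table_name, rows):
    """Load: append rows to a dict of lists standing in for the warehouse."""
    warehouse.setdefault(table_name, []).extend(rows)
    return len(rows)


if __name__ == "__main__":
    # Hypothetical sample rows; not real totesys data.
    totesys_staff = [
        {"staff_id": 1, "first_name": "Ada", "last_name": "Lovelace"},
        {"staff_id": 2, "first_name": "Alan", "last_name": "Turing"},
    ]
    lake, warehouse = {}, {}
    run_time = datetime(2024, 1, 1, tzinfo=timezone.utc)

    rows = extract(totesys_staff)
    key = store(lake, "staff", rows, run_time)
    loaded = load(warehouse, "dim_staff", transform(lake, key))
    print(key, loaded)  # staff/2024-01-01T000000.csv 2
```

Keeping each stage in its own function mirrors the Extract, Store, Transform, Load split, so any one stage could later be swapped for real S3 or PostgreSQL access without changing the others.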

Key Features

We aim for the project to have the following features, some of which have a higher priority than others:

Automated data ingestion from the totesys database
Data storage for ingested and processed data in S3 buckets
Data transformation into the data warehouse schema
Automated data loading into the data warehouse schema
Logging and monitoring with CloudWatch
Notifications for errors and successful runs (e.g. successful ingestion)
Visualisation of warehouse data
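The logging and notification features could be combined in a small wrapper around each pipeline step, sketched below with Python's standard `logging` module. This is an illustration under assumptions, not the project's code: the logger name and `run_with_monitoring` are hypothetical, and `notify` is a placeholder callable standing in for a real alerting service. If the steps ran on AWS Lambda (also an assumption here), output written through `logging` would be collected in CloudWatch Logs.

```python
import logging

# Hypothetical logger name for the pipeline.
logger = logging.getLogger("totesys_pipeline")


def run_with_monitoring(step_name, step, notify):
    """Run one pipeline step, log the outcome and notify on success or failure.

    `notify` is a hypothetical callable standing in for an alerting service;
    it receives a short status message.
    """
    try:
        result = step()
    except Exception:
        # Log the full traceback, send an error notification, then re-raise
        # so the failure is still visible to whatever scheduled the step.
        logger.exception("%s failed", step_name)
        notify(f"{step_name}: FAILED")
        raise
    logger.info("%s succeeded", step_name)
    notify(f"{step_name}: SUCCEEDED")
    return result


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    messages = []
    run_with_monitoring("ingestion", lambda: 42, messages.append)
    print(messages)  # ['ingestion: SUCCEEDED']
```

Re-raising after notifying keeps the error from being silently swallowed, while still producing both a log entry and a notification for every run.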
