There are 2 sets of data on crowdfunding that will be used. First we extract the data and import them into Pandas DataFrame. Next we process and organize the data into multiple tables using techniques such as regular expression in the last part of the process. There is also the option to use Python dictionary method instead of regex. Once the tables are completed, we export them into csv files, and upload it into an SQL database.
Data Modeling:
Here is the entity relationship diagram of all the tables:
For futher detail, you can click on this link: https://app.quickdatabasediagrams.com/#/d/MDz0ma
Start Running ETL_Mini_project_Starter_Code
You will be using the 2 xlsx files from DataSource folder to import into Pandas DataFrame:
1. crowdfunding.xlsx
2. contacts.xls
After processing and transforming the data, all the tables should be exported into the Resources folder.
-
You will first need to create a database.
- Right click on PostgrsSQL
- Click on create --> Database
- In the Database field type crowdfunding_db
- In Owner field, keep postgres
- Leave comment field blank
- Click on save -
Start creating the tables
- Right click on the database, crowdfunding_db
- Click on Query Tool
- Open crowdfunding_db_schema.sql. Copy and paste each table creation.
- After creating a table, import the appropriate csv file accordingly:
* Contacts --> contacts.csv
* category --> category.csv
* subcategory --> subcategory.csv
* campaign --> campaign.csv