MorillaGit/ETL_exercise_1

Extract transform load (ETL)

Introduction

  • This repository reads locally hosted .csv files, transforms them by extracting some example features from the data, and then loads them into a Postgres database.
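
As a minimal sketch of the extract and transform steps, the example below reads CSV text with the standard library and derives one feature per row. The column names (`name`, `price`, `quantity`), the `total` feature, and the function names are assumptions made for illustration; they are not taken from this repository's scripts.

```python
import csv
import io

# Hypothetical sample data; a real run would open a local .csv file instead.
SAMPLE_CSV = """name,price,quantity
apple,0.50,4
pear,0.75,2
"""


def extract(source):
    """Read CSV rows from a file-like object into a list of dicts."""
    return list(csv.DictReader(source))


def transform(rows):
    """Cast the numeric columns and add a derived 'total' feature."""
    result = []
    for row in rows:
        price = float(row["price"])
        quantity = int(row["quantity"])
        result.append(
            {
                "name": row["name"],
                "price": price,
                "quantity": quantity,
                "total": price * quantity,  # derived feature (assumed)
            }
        )
    return result


rows = transform(extract(io.StringIO(SAMPLE_CSV)))
```

To read from disk instead, `extract` could be given an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` object.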

Organization

The organization of this repository is as follows:

├───src
│   ├─── extraction.py
│   ├─── loading.py
│   ├─── setting.py
│   ├─── transformation.py
│   └─── tables.sql
├─── pyproject.toml
├─── poetry.lock
├─── README.md
└─── requirements.txt
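
To illustrate how a loading step could work together with a SQL table definition such as `tables.sql`, here is a hedged sketch that creates a table and inserts already-transformed rows. It uses the standard library's `sqlite3` with an in-memory database purely as a stand-in, so the example runs without a server; the real project targets Postgres. The table name, the columns, and the `load` function are hypothetical.

```python
import sqlite3

# Hypothetical schema; in the repository the DDL would live in src/tables.sql.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS sales (
    name     TEXT NOT NULL,
    quantity INTEGER NOT NULL,
    total    REAL NOT NULL
)
"""


def load(connection, rows):
    """Create the target table if needed, insert the rows, return the row count."""
    with connection:  # commits on success, rolls back on error
        connection.execute(CREATE_TABLE_SQL)
        connection.executemany(
            "INSERT INTO sales (name, quantity, total) VALUES (?, ?, ?)",
            [(r["name"], r["quantity"], r["total"]) for r in rows],
        )
    return connection.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


# SQLite in memory is only a stand-in for the Postgres database.
connection = sqlite3.connect(":memory:")
loaded = load(
    connection,
    [
        {"name": "apple", "quantity": 4, "total": 2.0},
        {"name": "pear", "quantity": 2, "total": 1.5},
    ],
)
```

With a Postgres driver such as psycopg2, the query placeholders would be `%s` rather than `?`, and the connection would be opened against the running container instead of `:memory:`.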

Requirements

Instructions

  1. Clone the repository with git clone, or download it in zip format.
git clone https://github.com/MuttData/ETL_exercise_1
cd ETL_exercise_1
  2. Open a terminal in the repository folder.

  3. Pull the Postgres image. If you do not have it yet, this will take a few minutes.

docker pull postgres

This command downloads the official postgres image from Docker Hub: https://hub.docker.com/_/postgres

  4. Create and start the container with the following command:


```docker
docker run -d -h <hostname or ip address> -p <port>:5432 --name <container_name> -e POSTGRES_USER=<User> -e POSTGRES_PASSWORD=<Password> -e POSTGRES_DB=<DB> postgres
```

This command creates a Docker container from the postgres image. The database is initialized with the given user and password, and the container's port 5432 is published on the chosen host port. Replace the values between <> with the desired values.
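
As an illustration of how the placeholders map onto the command, the sketch below assembles the same `docker run` arguments in Python. The function name and the sample values are hypothetical, and the optional `-h` hostname flag is left out for brevity. The code only builds the command; it does not start a container.

```python
import shlex


def build_docker_run_args(container_name, host_port, user, password, database,
                          image="postgres"):
    """Return the argument list for starting a detached Postgres container."""
    return [
        "docker", "run",
        "-d",                                   # run in the background
        "-p", f"{host_port}:5432",              # host port -> container port 5432
        "--name", container_name,
        "-e", f"POSTGRES_USER={user}",
        "-e", f"POSTGRES_PASSWORD={password}",
        "-e", f"POSTGRES_DB={database}",
        image,
    ]


# Sample values are placeholders chosen for this example only.
args = build_docker_run_args("etl_postgres", 5433, "etl_user", "secret", "etl_db")
command_line = shlex.join(args)  # shell-safe string form of the same command
```

Passing `args` to `subprocess.run(args, check=True)` would execute the command, assuming Docker is installed and running.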

[ Note ] If you have an SQL client installed, such as DBeaver or pgAdmin, you can run the SQL queries from that client rather than through Docker, with the following configuration:

Host: <localhost>
Port: <Port>
User: <User>
Password: <Password>
DB: <DB>
Schema: <Schema>
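
The same settings can also be combined into a single connection URI, which is how many Python database libraries expect them. The following is a small sketch under stated assumptions: the `DbSettings` class and the sample values are hypothetical, and the schema setting is not handled here.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class DbSettings:
    """Hypothetical container for the connection settings listed above."""
    host: str
    port: int
    user: str
    password: str
    database: str

    def uri(self):
        """Build a postgresql:// URI, percent-encoding the user and password."""
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{self.database}"


# Sample values only; the password shows why percent-encoding is needed.
settings = DbSettings("localhost", 5433, "etl_user", "p@ss/word", "etl_db")
connection_uri = settings.uri()
```

Encoding the password matters because characters such as `@` or `/` would otherwise be read as part of the URI structure.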
  5. Install the dependencies declared in pyproject.toml with Poetry:
poetry install

Then run the scripts through Poetry. The scripts are in the src folder.

poetry run python <script_name>.py

For example:

poetry run python main.py

[ Note ] Poetry is not required. The packages are also listed in the requirements.txt file, so you can install them with pip, either directly or inside a Conda environment.

If needed, upgrade pip first:

python -m pip install --upgrade pip

Then install the packages:

pip install -r requirements.txt

To run the scripts, use the following command:

python main.py

Description

This repository reads locally hosted .csv files, transforms them by extracting some example features from the data, and then loads them into a Postgres database. It is intended for practice only.
