This project demonstrates how to interact with a SQL database using Python, involving a complex SQL query that includes joins, aggregation, and sorting. The repository covers the end-to-end process, from downloading CSV files, loading them into a database, and executing queries, all while following best practices with CI/CD integration.
-
Complex SQL Query:
- The project contains a SQL query that joins multiple tables, performs aggregations, and sorts the results based on the specified criteria.
-
Python Integration:
- Python scripts are used to automate downloading CSV data from remote sources, load them into a SQL database, and execute SQL queries.
-
CI/CD Pipeline:
- The project includes continuous integration and deployment (CI/CD) through GitHub Actions, ensuring the project is automatically tested upon each commit.
The SQL query in this project involves:
- Joins between tables containing football club and player data.
- Aggregation to calculate statistics such as the average number of goals per club.
- Sorting to rank clubs by total goals scored.
Example Query:
Select club.name as club_name,
SUM(player.market_value_in_eur) as total_market_value,
COUNT(*) as included_players
from ids706_data_engineering.default.rm564_football_clubs_players player
JOIN ids706_data_engineering.default.rm564_football_clubs club ON player.current_club_id = club.club_id
GROUP BY club_name
ORDER BY total_market_value DESC
limit 20 ;
- Club Name: The name of the football club.
- Total Market Value: Total market value for each club.
- Included Players: The number of players in each club.
This query joins the rm564_football_clubs
and rm564_football_clubs_players
tables on the club_id
, aggregates player statistics for each club calculates the total market value and number of players included. The dataset sorted by total market value for the each club.
- Python 3.12
- Databricks Connection (or alternative cloud SQL database)
- GitHub Actions for CI/CD
- Clone the repository:
git clone https://github.com/nogibjj/Ramil-Complex-SQL-Query-MySQL-Database.git
cd Ramil-Complex-SQL-Query-MySQL-Database
- Install dependencies:
make install
- Set up environment variables:
- Create a
.env
file in the root directory with the following variables:
SERVER_HOSTNAME=your_databricks_server_hostname
HTTP_PATH=your_http_path
ACCESS_TOKEN=your_access_token
-
Extract CSV files and load data into the database:
python main.py
-
Execute queries:
- You can run specific SQL queries using the
execute_query
orfetchall_result
functions.
- You can run specific SQL queries using the
To run the tests:
make test_file
To run the format:
make format
To run the lint:
make lint
The repository uses GitHub Actions for continuous integration and deployment. Every commit triggers the CI pipeline, which:
- Runs automated tests on the Python codebase.
- Ensures the SQL query and database interactions work as expected.
You can use GitHub Actions for CI, and Docker to replicate the CI environment locally if desired.