This project implements a complex SQL query involving joins, aggregation, and sorting using an external database. The external database used in this project is Databricks, and the query fetches data from two related tables (Avengers
and Battles
) to calculate the total number of appearances and battles per Avenger. The I used can be found here.
-
SQL Query: Design and implement a complex SQL query that:
- Joins data from multiple tables.
- Aggregates data to compute totals.
- Sorts the results based on specific criteria.
-
Explanation: Provide detailed documentation explaining what the query does and the expected results.
-
CI/CD Pipeline: Implement a basic Continuous Integration/Continuous Deployment (CI/CD) pipeline that automatically runs tests and ensures the query executes without errors.
-
README: This document explains the query, its purpose, and how to run it. It is also the one you are currently reading. 😄
The following SQL query was designed for a MySQL-compatible database (Databricks in this case). It retrieves information about Avengers and their battle history, joining two tables (Avengers
and Battles
) to calculate the total number of battles each Avenger has participated in, as well as their total appearances in comics. The results are sorted by the number of battles and the number of appearances.
SELECT
a.`Name/Alias`,
SUM(a.Appearances) AS total_appearances,
COUNT(b.battle_id) AS total_battles
FROM
Avengers a
JOIN
Battles b ON a.avenger_id = b.avenger_id
GROUP BY
a.`Name/Alias`
HAVING
SUM(a.Appearances) > 100 -- Only Avengers with more than 100 appearances
ORDER BY
total_battles DESC, total_appearances DESC;
This query is designed to retrieve data about the Avengers and their participation in battles. The main purpose of the query is to calculate two key metrics for each Avenger:
- Total number of comic book appearances.
- Total number of battles they have fought.
Additionally, the query filters out any Avengers with fewer than 100 comic book appearances, and it sorts the results based on the total number of battles fought (from most to least) and, secondarily, by total appearances.
-
FROM and JOIN Clauses:
- The query is pulling data from two tables:
Avengers
andBattles
. - The
JOIN
clause is used to link the two tables using theavenger_id
field, which exists in both tables. This allows the query to combine information about each Avenger from theAvengers
table with information about their participation in battles from theBattles
table.
- The query is pulling data from two tables:
-
SELECT Clause:
a.Name/Alias
: Selects the name/alias of the Avenger from theAvengers
table.SUM(a.Appearances) AS total_appearances
: This sums the number of comic book appearances for each Avenger. TheSUM()
function aggregates the values in theAppearances
column for each Avenger.COUNT(b.battle_id) AS total_battles
: This counts the number of battles that each Avenger has participated in, based on the number ofbattle_id
entries in theBattles
table.
-
GROUP BY Clause:
- The query groups the results by
a.Name/Alias
- The query groups the results by
-
HAVING Clause:
- The
HAVING
clause filters out Avengers who have appeared in fewer than 100 comic books. The query only returns Avengers with 100 or more appearances (calculated by theSUM(a.Appearances)
) are included in the results.
- The
-
ORDER BY Clause:
- The results are first sorted by
total_battles DESC
, meaning that Avengers with the highest number of battles appear on the list first
- The results are first sorted by
The Avengers
table contains the following data:
avenger_id | Name/Alias | Appearances |
---|---|---|
1 | Iron Man | 3000 |
2 | Captain America | 2800 |
Then the Battles
table contains:
battle_id | avenger_id |
---|---|
1 | 1 |
2 | 1 |
3 | 2 |
4 | 2 |
5 | 3 |
The query would do the following:
- Join the two tables based on
avenger_id
- Calculate Iron Man has 3000 appearances and 2 battles,
- Calculates that Captain America has 2800 appearances and 2 battles
- Return Iron Man and Captain America and would be sorted by their number of battles and appearances
Name/Alias | total_appearances | total_battles |
---|---|---|
Iron Man | 3000 | 2 |
Captain America | 2800 | 2 |
This query allows us to gain insight into the involvement of different Avengers and take a closer look at how much prevalence they had within the comic universe and battles.