Our project provides a database of quarterback performances per game so answer questions such as: Who is the highest performing quarterback in N seasons? Which QB is the best performing average? Do I draft CJ Stroud?
Our data engineering system maintains a list of NFL quarterbacks over the last 4 years, their current teams, the games they played in, and their standard fantasy football score each game.(.04passings yard + .1rushing yards + 4passing touchdowns+.6rushing touchdowns - 2*fumbles)
We seperated games by weeks because its how NFL seasons are split and organized the weeks of the 2020-2023 seasons into 1 table so the weeks do not overlap.
Importantly, we left in a lot of "empty" data, as it in itself is it's own statistic. Using this data we can find statistics on how often players were actually played in a series, and for more specific averages per weeks/matches played.
- Php Web Server - General DB Management
- Mysql ODB Server - Our operational database to store our tables
- Mysql DW Server - Our data warehouse holding onto our fact table and player performance information
- ODB Phpmyadmin Server - Allows us to interact with the ODB sql server
- DW Phpmyadmin Server - Allows us to interact with the DW sql server
- Kafka Broker - Allows our producers and consumers to interact with each other
- Zookeeper - Keeps track of our current broker and tells producers and consumers to go through our broker
Our data loading process involved organizing our data to be used. Each player and team was given a table. Each week in the past 4 seasons was given its own id. We then normalized these into one table that held each player's NAME TEAM and PLAYER PERFORMANCE(fantasy score) for EACH WEEK in the past four seasons and cross referenced this against a different dataset to find which teams the player played against. Then, using kafka, we uploaded the tables to the ODB and created a new table which gave each game that was played by someone its own row. Our data warehouse would then mirror our performance per week dataset.
erDiagram
PLAYER }|--|| PPpW: Dimension
PLAYER {
int TeamId
string Name
string Pos
}
PPpW {
int PlayerID
int TeamID
int OppTeamID
int WeekID
double fScore
}
FACT {
int PlayerID
int TeamID
int OppTeamID
int WeekID
double Score
}
TEAM }|--|| PPpW: Dimension
TEAM {
string Name
string Abreviation
string Conference
string Division
}
WEEK }|--|| PPpW: Dimension
WEEK {
int WeekNumber
int Season
}
PPpW}|--|| Lifetime: ODB-to-DW_Aggregate
Lifetime {
int PlayerID
double Score
}
PPpW }|--|{ FACT: ODB-to-DW
PPpW}|--|| Vs: ODB-to-DW_Aggregate
Vs {
int OppTeamID
double Score
}
The data stream supported in our system was Kafka. We used it to move our data from CSVs to our ODB and our summary table to our DW.
Our main summary table is the FACT table. We took the data through our normalized csv then PPpW table and mirrored it to create our fact table. We also made a summary table for the total points each player earned in their Lifetime and the total points have been scored against each team in the past 4 seasons called Vs.
Player with the most 40-point games:
Every 40-point game:
Dak Prescott's Lifetime Performance:
Strongest Defensive Team: