MLB-Draft-Biases

This project analyzes and identifies draft biases in the Major League Baseball (MLB) Amateur Draft. The draft has been held since 1965 and has featured up to 100 rounds in the past; today's version has 20 rounds.

Beyond identifying these biases, we sought to determine whether a player's level of success in MLB could be predicted. We measured success with FanGraphs Wins Above Replacement (fWAR). Our predictors consisted entirely of player demographics and physical characteristics.

Contributors

Peter D. DePaul III
- Data Collection
- Data Cleaning
- EDA and Visualizations
- Model Creation
- Final Report
Anish Ravilla
- Final Report
Robin Lee
- EDA and Visualizations
- Final Report
Alan Wong
Kevin Kim
Hongye Zhang

Data Dictionary

To find our dictionary of variables click below:

Dictionary

Data Collection and Data Cleaning

We collected the data with the baseballr (R) and pybaseball (Python) packages. The collection scripts, data_collection.R and data_collection.py, are in this repository.

We cleaned the data in R using several packages, primarily those in the tidyverse:

Data Cleaning
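
As a rough illustration of the collection step (not the repository's actual script), the sketch below loops over draft years and rounds and pulls one table per pair. It assumes pybaseball's `amateur_draft(year, round)` function as the default fetcher; the helper name `iter_draft_tables`, its `fetch` parameter, and the year and round ranges in the usage comment are hypothetical.

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple


def iter_draft_tables(
    years: Iterable[int],
    rounds: Iterable[int],
    fetch: Optional[Callable[[int, int], Any]] = None,
) -> Iterator[Tuple[int, int, Any]]:
    """Yield (year, round, table) for every requested draft year and round.

    `fetch` is a hypothetical injection point; when omitted, the helper
    falls back to pybaseball's amateur_draft, which needs network access.
    """
    if fetch is None:
        # Imported lazily so the helper can be exercised without pybaseball.
        from pybaseball import amateur_draft
        fetch = amateur_draft

    rounds = list(rounds)  # materialize so the rounds can be reused per year
    for year in years:
        for draft_round in rounds:
            yield year, draft_round, fetch(year, draft_round)


# Example usage (assumed ranges; requires pybaseball and a network connection):
# tables = list(iter_draft_tables(range(2015, 2018), range(1, 4)))
```

Passing a stand-in `fetch` makes the loop easy to check offline before running the real download.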

Data Files

The raw data we collected and the data files we used to build our models are all stored in the folder linked below:

Data Folder

Report

To read the report on our findings, click the link below:

Draft Biases Report

Libraries and Resources Utilized

bookdown
- Used to generate the report with the bookdown syntax.
ggplot2
- Used to create the visualizations and EDA in the report.
gridExtra
corrplot
data.table
- Used to reduce the memory footprint of our data objects and shorten processing time.
tidyverse
- Used for the data cleaning process.
tidymodels
- Used to create the boosted decision tree prediction model.
xgboost
- The xgboost engine was used for the boosted decision tree prediction model.
doParallel
- Used for parallel processing while tuning the model hyperparameters.
vip
- Used to create the variable importance plot for the model.
caret
- Used for the training process of the model.
Boruta
- Used to confirm feature selection importance.
kableExtra
- Used to create LaTeX-formatted tables within the report.
maps
mapdata
mapproj
reshape2
