Merge pull request #84 from vigneshRajakumar/i62
Update image title and test gha
vigneshRajakumar authored Dec 13, 2020
2 parents a46e752 + 5b1fa4a commit 08c4af5
Showing 3 changed files with 6 additions and 3 deletions.
1 change: 1 addition & 0 deletions Dockerfile
@@ -1,3 +1,4 @@
# Use rocker tidyverse as base to get make, base r and tidyverse
FROM rocker/tidyverse

# R dependencies for final report
6 changes: 4 additions & 2 deletions README.md
@@ -32,7 +32,7 @@ This prediction problem could have been framed as either a multi-class classific

Next, we determined the threshold at which a wine is considered good or bad. This was done with the goal of minimizing class imbalance in the dataset as well as considering what the ratings mean in reality. A split at 5.5 (i.e. 6 or higher is good, 5 or lower is bad) was determined to be the reasonable option from both perspectives and led to a 60/40 class split of good/bad wines, respectively.

-Feature-target relationships were also plotted during EDA to see if any features stood out as potentially good indicators of wine quality. The feature most indicative of quality appeared to be alcohol content. Further discussion of the EDA can be found [here](https://github.com/UBC-MDS/dsci-522-group14/blob/main/src/wine_quality_eda.ipynb).
+Feature-target relationships were also plotted during EDA to see if any features stood out as potentially good indicators of wine quality. The feature most indicative of quality appeared to be alcohol content. Further discussion of the EDA can be found [here](https://github.com/UBC-MDS/dsci-522-group14/blob/main/src/wine_quality_eda.ipynb).

The data is presented as separate tables for red and white wine; we elected to combine the data into one dataset and see if the type of wine influences the rating. The resulting table contains 12 features (11 numeric and 1 binary categorical). For modelling, the numeric features will be scaled using scikit-learn's StandardScaler() transformer, while the categorical feature will be transformed to a binary one using OneHotEncoder(). Given the nature of the classification problem, the following models will be evaluated and scored to determine their appropriateness in prediction:
- DummyClassifier (baseline)
@@ -45,7 +45,7 @@ The data is presented as separate tables for red and white wine, we elected to c
The results of this analysis will be presented in a report outlining the prediction accuracy of various models (as a table) and discussion of which features are important for this prediction task (e.g. Logistic Regression coefficients).

This Data Set was created by P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
-Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009. It was sourced from the UCI Machine Learning Repository (Dua and Graff 2017). Due to privacy reasons, only the physicochemical properties have been included and information about the company, grape type, price, etc. are left out.
+Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009. It was sourced from the UCI Machine Learning Repository (Dua and Graff 2017). Due to privacy reasons, only the physicochemical properties have been included and information about the company, grape type, price, etc. are left out.

### Report

@@ -90,6 +90,8 @@ To reset the repo to a clean state, run the following from the root directory of
docker run --rm -p 8787:8787 -e PASSWORD="test" -v /$(pwd):/home/data-analysis -it vigneshrajakumar/wine_predictor make -C "home/data-analysis/" clean
```

> When running on Windows, you would have to use Git Bash or a similar Linux-style shell for the above commands to work.
#### Dependency Diagram of the Makefile
![](Makefile.png)
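
The README text in this diff describes two data-preparation steps: splitting the quality rating at 5.5 into good (6 or higher) and bad (5 or lower), and preparing the features by scaling the 11 numeric ones and one-hot encoding the wine type. The README names scikit-learn's scaler and encoder for the feature step; what follows is only a minimal, dependency-free sketch of what those steps compute, assuming plain Python lists as input. All function names here are hypothetical and not part of the repository.

```python
from statistics import mean, pstdev

# Threshold taken from the README: ratings of 6 or higher are "good" (1),
# ratings of 5 or lower are "bad" (0).
QUALITY_THRESHOLD = 5.5


def binarize_quality(scores, threshold=QUALITY_THRESHOLD):
    """Map integer quality ratings to 1 (good) or 0 (bad)."""
    return [1 if score > threshold else 0 for score in scores]


def class_proportions(labels):
    """Return the fraction of good (1) and bad (0) labels."""
    n = len(labels)
    good = sum(labels)
    return {"good": good / n, "bad": (n - good) / n}


def standardize(column):
    """Z-score a numeric column with the population standard deviation
    (ddof=0), the same statistic scikit-learn's StandardScaler uses.
    A constant column maps to all zeros, as StandardScaler does."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def one_hot(column):
    """One-hot encode a categorical column.

    Returns the sorted category list and one 0/1 row per input value.
    """
    categories = sorted(set(column))
    rows = [[1 if value == cat else 0 for cat in categories] for value in column]
    return categories, rows
```

For a made-up sample `[5, 6, 7, 4, 6]`, `binarize_quality` gives `[0, 1, 1, 0, 1]`, and `class_proportions` reports a 60/40 good/bad split for it. `one_hot` keeps one column per category; scikit-learn's `OneHotEncoder(drop='if_binary')` would instead reduce a two-valued feature such as wine type to a single column.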

2 changes: 1 addition & 1 deletion src/eda_wine.py
@@ -126,7 +126,7 @@ def draw_target_plot(train_df):
Returns:
alt.Chart: plot object of target plot
"""
-target_plot = alt.Chart(train_df, title='Target distribution').mark_bar().encode(
+target_plot = alt.Chart(train_df, title='Target Counts').mark_bar().encode(
x='target',
y='count()'
)
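
The title change from 'Target distribution' to 'Target Counts' matches what the chart encodes: `y='count()'` is Altair's count aggregate, so each bar's height is the number of rows for that value of `target`. A minimal stdlib sketch of that aggregation follows, assuming the training rows are available as a list of dicts rather than the DataFrame `train_df` the function actually receives; `target_counts` is a hypothetical name.

```python
from collections import Counter


def target_counts(rows, target_key="target"):
    """Return {class: number of rows}, sorted by class label.

    This is the per-bar height that y='count()' produces when x='target'.
    """
    counts = Counter(row[target_key] for row in rows)
    return dict(sorted(counts.items()))


# Hypothetical sample rows standing in for train_df
sample = [{"target": "good"}, {"target": "bad"}, {"target": "good"}]
print(target_counts(sample))  # {'bad': 1, 'good': 2}
```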