
Commit

Change pictures
athy9193 committed Nov 28, 2020
1 parent b63081c commit e0a2dc4
Showing 6 changed files with 8 additions and 8 deletions.
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
8 changes: 4 additions & 4 deletions reports/reports.Rmd
@@ -46,7 +46,7 @@ In this project we are trying to predict the quality of a given wine sample usin
We eventually decided to pick the neural network Multi-layer Perceptron (MLP) model as the one that yielded the best results after running the various machine learning models on the training set, comparing their performance based on f1-score, and checking consistency across cross-validation runs. We noticed that random forest recorded a high validation f1-score of 0.84; however, it also had a large gap between train and validation scores, with a perfect train score of 1, which led us to conclude that the model had overfit. Logistic regression also showed promising validation f1-scores in our case, yet these high results were not consistent across cross-validation splits. Hence, with most models struggling to reach the 0.8 f1-score mark without significantly overfitting on the training set, while MLP showed consistent results across all cross-validation splits, our final choice landed on the MLP model because we think it will generalize better.
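
To make that comparison concrete, here is a minimal sketch of how such a model comparison could be run with scikit-learn (an illustration, not the project's actual script): `X_train`/`y_train`, the macro-averaged f1 scoring, and the hyperparameters are all assumptions.

```python
# Hedged sketch of the model comparison described above; assumes a
# preprocessed feature matrix X_train and labels y_train already exist.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier

models = {
    "logistic regression": LogisticRegression(max_iter=2000),
    "random forest": RandomForestClassifier(random_state=123),
    "MLP": MLPClassifier(max_iter=1000, random_state=123),
}

for name, model in models.items():
    scores = cross_validate(
        model, X_train, y_train,
        scoring="f1_macro",       # macro-averaged f1 for the multi-class target
        cv=5,
        return_train_score=True,  # exposes the train/validation gap (overfitting)
    )
    print(
        f"{name}: train f1 {scores['train_score'].mean():.2f}, "
        f"valid f1 {scores['test_score'].mean():.2f} "
        f"(std {scores['test_score'].std():.2f})"
    )
```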

```{r, fig.cap = "Table 1: Score results among the different machine learning models we explored"}
knitr::include_graphics("models_c.png")
knitr::include_graphics("models_c_revised.png")
```

The Python and R programming languages [@R; @Python] and the following Python and R packages were used to perform the analysis: scikit-learn [@scikit-learn], docoptpython [@docoptpython], docopt [@docopt], altair [@altair], vega-lite [@vega-lite], IPython-ipykernel [@IPython], matplotlib [@matplotlib], scipy [@SciPy], numpy [@harris2020array], pandas [@pandas], graphviz [@graphviz], pandas-profiling [@pandasprofiling2019], knitr [@knitr], tidyverse [@tidyverse], kableExtra [@kableExtra]. The code used to perform the analysis and re-create this report can be found [here](https://github.com/UBC-MDS/Wine_Quality_Predictor#usage).
@@ -64,11 +64,11 @@ knitr::include_graphics("../eda/wine_EDA_files/wine_quality_rank_per_feature.png
Since this is a multi-class classification problem, our goal was to find a model that was consistent and able to recognize patterns in our data. We chose the neural network Multi-layer Perceptron (MLP) model as it was consistent and showed promising results. If we look at the accuracy scores and f1 scores across cross-validation splits, we can see that they are fairly consistent, which was not the case with many other models.
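
A minimal sketch of how per-split consistency can be inspected (illustrative only; `X_train`/`y_train` and the scoring choice are assumptions carried over from the sketch above):

```python
# Hedged sketch: report per-split scores so the spread across folds is visible;
# a small standard deviation suggests consistent generalization.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(max_iter=1000, random_state=123)
f1_splits = cross_val_score(mlp, X_train, y_train, scoring="f1_macro", cv=5)
acc_splits = cross_val_score(mlp, X_train, y_train, scoring="accuracy", cv=5)

print("f1 per split:      ", np.round(f1_splits, 2))
print("accuracy per split:", np.round(acc_splits, 2))
print(f"f1 std {f1_splits.std():.3f}, accuracy std {acc_splits.std():.3f}")
```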

```{r}
knitr::include_graphics("f1.png")
knitr::include_graphics("f1_revised.png")
```

```{r}
knitr::include_graphics("Accuracy_plot.png")
knitr::include_graphics("accuracy_plot_revised.png")
```


@@ -77,7 +77,7 @@ Figure 2: Accuracy scores and f1 scores across cross validation splits for neutr
Our model also performed quite well on the test data, as the confusion matrix below shows. As we discussed earlier, prediction at the higher end of the wine quality spectrum is acceptable: the matrix shows an error rate of roughly 15% at the higher end of the spectrum, and the false classifications at the low end are also very acceptable.
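
As an illustration of how the numbers behind that matrix could be derived, here is a hedged sketch (`X_test`/`y_test` and the fitted `mlp` are assumptions, not code from this commit):

```python
# Hedged sketch: per-class error rates from the test-set confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = mlp.fit(X_train, y_train).predict(X_test)
cm = confusion_matrix(y_test, y_pred)

# Each row is a true class; the off-diagonal share of the row is its error rate.
error_rates = 1 - np.diag(cm) / cm.sum(axis=1)
for label, err in zip(np.unique(y_test), error_rates):
    print(f"quality {label}: error rate {err:.0%}")
```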

```{r, fig.cap = "Figure 3: Confusion Matrix"}
knitr::include_graphics("cf_matrix.png")
knitr::include_graphics("cf_matrix_revised.png")
```

Having said that, the research still needs further improvement in terms of obtaining a more balanced data set for training and cross-validation. More feature engineering and selection could be conducted to minimize the effect of correlation among the explanatory variables. Furthermore, to assess the robustness of the predictive model, we would need to test it on real-world deployment data in addition to our test data.
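
The two diagnostics behind those suggestions, class imbalance and correlated explanatory variables, can be inspected with a few lines of pandas; the sketch below is illustrative, and the file path and column name are hypothetical:

```python
# Hedged sketch: check class balance of the target and flag strongly
# correlated feature pairs as candidates for feature engineering/selection.
import numpy as np
import pandas as pd

wine = pd.read_csv("winequality.csv")  # hypothetical path
print(wine["quality"].value_counts(normalize=True))  # reveals class imbalance

corr = wine.drop(columns=["quality"]).corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # each pair once
print(upper.stack().sort_values(ascending=False).head(10))
```
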
8 changes: 4 additions & 4 deletions reports/reports.md
@@ -112,7 +112,7 @@ because we think it would generalize better.

<div class="figure">

<img src="models_c.png" alt="Table 1: Score results among different machine learning model we have explore" width="1207" />
<img src="models_c_revised.png" alt="Table 1: Score results among different machine learning model we have explore" width="1207" />

<p class="caption">

@@ -166,9 +166,9 @@ was consistent and showed promising results. If we take a look at the
accuracy scores and f1 scores across cross validation splits, we can see
that it is fairly consistent, which was not the case with many models.

<img src="f1.png" width="439" />
<img src="f1_revised.png" width="439" />

<img src="Accuracy_plot.png" width="439" />
<img src="accuracy_plot_revised.png" width="439" />

Figure 2: Accuracy scores and f1 scores across cross validation splits
for the neural network Multi-layer Perceptron (MLP) model
@@ -182,7 +182,7 @@ the low end of spectrum.

<div class="figure">

<img src="cf_matrix.png" alt="Figure 3: Confusion Matrix" width="413" />
<img src="cf_matrix_revised.png" alt="Figure 3: Confusion Matrix" width="413" />

<p class="caption">

