Commit

Finish checking makefile and make necessary edits
irene93 committed Dec 4, 2021
1 parent b65c3e6 commit 2f94e77
Showing 21 changed files with 8,279 additions and 14 deletions.
11 changes: 7 additions & 4 deletions Makefile
@@ -1,12 +1,13 @@
# Makefile
- # Allyson Stoll, Dec 2021
+ # Allyson Stoll, Irene Yan
+ # Dec 4, 2021

# This Makefile completes data download and preprocessing
# prior to EDA and model building of a ramen rating model
# based on ramen ratings from The Ramen Rater.

# Example usage:
- # make data/raw/ramen_ratings.csv
+ # make all

#make all dependencies
all : doc/Report.html
@@ -49,11 +50,12 @@ results/prediction/prediction.csv results/prediction/test_metrics.jpg : data/pro
--out_file_result="results/prediction/"

# write the report
- doc/Report.html : results/prediction/prediction.csv results/test_metrics.jpg \
+ doc/Report.html : results/prediction/prediction.csv results/prediction/test_metrics.jpg \
results/figures/stars_histogram.png results/figures/type_histogram.png \
results/figures/variety_wordcloud.png results/figures/ramen_map.png
Rscript -e "rmarkdown::render('doc/Report.Rmd')"

+ # remove the entire analysis
clean :
rm -rf data/raw/ramen_ratings.csv
rm -rf data/processed/train_df.csv
@@ -70,4 +72,5 @@ clean :
rm -rf results/Top_20_Good_features.csv
rm -rf results/Top_20_Bad_features.csv
rm -rf results/prediction/prediction.csv
- rm -rf results/test_metrics.jpg
+ rm -rf results/prediction/test_metrics.jpg
+ rm -rf doc/Report.html
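
The path fix in this hunk matters because `make` rebuilds `doc/Report.html` only from the prerequisites named in the rule. The following is a minimal Python sketch, not part of the commit, that lists which of those prerequisites are missing before the report is rendered. The helper name `missing_prereqs` and the constant `REPORT_PREREQS` are hypothetical; the paths are copied from the corrected rule above. The rule renders `doc/Report.Rmd`, while the file list of this commit shows `doc/report.Rmd`; on a case-sensitive filesystem those are different paths, which may be worth checking in the same way.

```python
from pathlib import Path

# Prerequisites of doc/Report.html, copied from the corrected rule in this diff.
REPORT_PREREQS = [
    "results/prediction/prediction.csv",
    "results/prediction/test_metrics.jpg",
    "results/figures/stars_histogram.png",
    "results/figures/type_histogram.png",
    "results/figures/variety_wordcloud.png",
    "results/figures/ramen_map.png",
]


def missing_prereqs(root, prereqs=REPORT_PREREQS):
    """Return the prerequisite paths that are not regular files under `root`."""
    root = Path(root)
    return [p for p in prereqs if not (root / p).is_file()]


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root.
    missing = missing_prereqs(".")
    if missing:
        print("Missing prerequisites:")
        for path in missing:
            print("  " + path)
    else:
        print("All report prerequisites are present.")
```

Run from the repository root, the script prints either the missing paths or a confirmation line; it does not replace `make`, which also compares timestamps.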
302 changes: 302 additions & 0 deletions data/eda/shapefiles_for_eda/ne_110m_admin_0_countries.README.html

Large diffs are not rendered by default.

@@ -0,0 +1 @@
5.0.0-pre3
1 change: 1 addition & 0 deletions data/eda/shapefiles_for_eda/ne_110m_admin_0_countries.cpg
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
UTF-8
Binary file not shown.
1 change: 1 addition & 0 deletions data/eda/shapefiles_for_eda/ne_110m_admin_0_countries.prj
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.017453292519943295]]
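
The numbers in this `.prj` string can be checked by hand. The sketch below, which is not part of the commit, assumes the usual reading of the entries: `UNIT["Degree", ...]` gives radians per degree, and `SPHEROID` gives the semi-major axis in metres followed by the inverse flattening. The function name `semi_minor_axis` is hypothetical.

```python
import math

# Value copied from the UNIT["Degree", ...] entry of the .prj file above.
DEGREE_IN_RADIANS = 0.017453292519943295

# Values copied from SPHEROID["WGS_1984", 6378137.0, 298.257223563].
SEMI_MAJOR_AXIS_M = 6378137.0
INVERSE_FLATTENING = 298.257223563


def semi_minor_axis(a, inv_f):
    """Semi-minor axis b = a * (1 - 1/inv_f) of an ellipsoid of revolution."""
    return a * (1.0 - 1.0 / inv_f)


# One degree expressed in radians is pi / 180.
assert math.isclose(DEGREE_IN_RADIANS, math.pi / 180.0)

if __name__ == "__main__":
    b = semi_minor_axis(SEMI_MAJOR_AXIS_M, INVERSE_FLATTENING)
    print(f"semi-minor axis: {b:.4f} m")
```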
Binary file not shown.
Binary file not shown.
791 changes: 791 additions & 0 deletions data/processed/test_processed.csv

Large diffs are not rendered by default.

3,161 changes: 3,161 additions & 0 deletions data/processed/train_codes_df.csv

Large diffs are not rendered by default.

3,161 changes: 3,161 additions & 0 deletions data/processed/train_processed.csv

Large diffs are not rendered by default.

11 changes: 7 additions & 4 deletions doc/report.Rmd
@@ -34,20 +34,20 @@ Each observation in the data set is a review for a single ramen product. The fea
To understand the data better, we visualize the distribution of the countries of origin of all products. It seems that most products come from China, South Korea, Japan, and the USA.

```{r country-distributions, echo=FALSE, fig.cap="Figure 1. Origins of Ramen Products", out.width = '100%'}
- knitr::include_graphics("../results/ramen_map.png")
+ knitr::include_graphics("../results/figures/ramen_map.png")
```

There are many varieties, and the word cloud below displays the most common keywords in ramen descriptions. Wow, these noodles come in so many flavors! They also come in different packaging. Half of the samples come as a pack, but some are sold in a bowl or tray, which is more convenient for eating directly.

```{r variety-distributions, echo=FALSE, fig.cap="Figure 2. Word Cloud of Ramen Variety and Package Style Histogram", out.width = '50%', out.height = '25%', fig.show='hold', fig.align='center'}
- knitr::include_graphics(c("../results/variety_wordcloud.png",
- "../results/type_histogram.png"))
+ knitr::include_graphics(c("../results/figures/variety_wordcloud.png",
+ "../results/figures/type_histogram.png"))
```

Let's see how the ratings are distributed. It looks like most ramens are quite tasty! But a few received zero stars.

```{r rating-distributions, echo=FALSE, fig.cap="Figure 3. Histogram of Ratings", out.width = '50%', out.height = '35%'}
- knitr::include_graphics("../results/stars_histogram.png")
+ knitr::include_graphics("../results/figures/stars_histogram.png")
```

# Methods
@@ -79,3 +79,6 @@ kable(bad[1:5, 1:2],
First, the amount of data used to build the model is relatively small, which may limit model performance. Second, the feature `Top ten` was not used in the analysis. In the future, we hope to make reasonable use of this feature after learning more data processing methods. Lastly, we recognize that the data set contains reviews written by a single person, which makes our prediction model very subjective and not generalizable to a general audience. One should proceed with caution when using these results as a shopping guide.

# References
+ @knitr
+
+ @docopt
20 changes: 14 additions & 6 deletions doc/report.html

Large diffs are not rendered by default.

21 changes: 21 additions & 0 deletions results/Top_20_Bad_features.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
coef,vars,coef_abs
-1.859131090512311,Brand_Nona Lim,1.859131090512311
-1.859131090512311,Brand_Baijia,1.859131090512311
-1.8000845477210297,Brand_Koyo,1.8000845477210297
-1.6667973297101781,Brand_Mr. Noodles,1.6667973297101781
-1.582762006847416,Brand_Super,1.582762006847416
-1.5400938328225011,Country_Netherlands,1.5400938328225011
-1.5090750369123527,Brand_Knorr,1.5090750369123527
-1.4977225439709259,Brand_Wai Wai,1.4977225439709259
-1.390495192639744,Brand_GreeNoodle,1.390495192639744
-1.3442489890537426,Country_Canada,1.3442489890537426
-1.3258213977179438,Brand_Rap Snacks,1.3258213977179438
-1.3258213977179438,Brand_Ripe'n'Dry,1.3258213977179438
-1.2770926756313135,Country_UK,1.2770926756313135
-1.2730230778693896,Brand_Koka,1.2730230778693896
-1.2419060563261983,Brand_Chewy,1.2419060563261983
-1.1465059054357014,Country_Pakistan,1.1465059054357014
-1.0622565261538106,Brand_Snapdragon,1.0622565261538106
-1.0463065283107393,Brand_Goku-Uma,1.0463065283107393
-1.0463065283107393,Brand_Big Bon,1.0463065283107393
-1.0463065283107393,Brand_Sao Tao,1.0463065283107393
21 changes: 21 additions & 0 deletions results/Top_20_Good_features.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
coef,vars,coef_abs
1.5649459181767298,Brand_Samyang Foods,1.5649459181767298
1.2681857255167004,shoyu,1.2681857255167004
1.2651873240785507,Brand_Kabuto Noodles,1.2651873240785507
1.198119300736054,yum,1.198119300736054
1.1293006886436405,Country_Malaysia,1.1293006886436405
1.1024535853281041,Country_Indonesia,1.1024535853281041
1.1018107284225216,Brand_Dragonfly,1.1018107284225216
1.0931446280443882,Country_Singapore,1.0931446280443882
1.0830336739373123,Brand_Tseng Noodles,1.0830336739373123
1.0699045369878788,Brand_Kang Shi Fu,1.0699045369878788
0.9932382376710469,Brand_TTL,0.9932382376710469
0.9932382376710469,Brand_Wugudaochang,0.9932382376710469
0.9917403812134636,Brand_Wu-Mu,0.9917403812134636
0.9684478342524965,Brand_Rooster,0.9684478342524965
0.9517329824023497,Brand_Mike's Mighty Good Craft Ramen,0.9517329824023497
0.8870096698615191,Brand_Tasty Bite,0.8870096698615191
0.8813137963919575,Brand_Yamachan,0.8813137963919575
0.8650868054306605,Brand_Yamamoto Seifun,0.8650868054306605
0.860808481654824,Brand_MAMA,0.860808481654824
0.8453160680300208,Brand_Ruski,0.8453160680300208
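
Both feature tables share the columns `coef`, `vars` and `coef_abs`, and in every row shown `coef_abs` equals the absolute value of `coef`. The diff does not show how the tables were generated, so the sketch below only reads that format and ranks rows by coefficient magnitude. The helper names `read_features`, `strongest` and `split_by_sign` are hypothetical; the sample rows are copied from the two CSV files above.

```python
import csv
import io

# Rows copied from Top_20_Good_features.csv and Top_20_Bad_features.csv above.
SAMPLE = """coef,vars,coef_abs
1.5649459181767298,Brand_Samyang Foods,1.5649459181767298
1.2681857255167004,shoyu,1.2681857255167004
1.2651873240785507,Brand_Kabuto Noodles,1.2651873240785507
-1.859131090512311,Brand_Nona Lim,1.859131090512311
"""


def read_features(text):
    """Parse CSV text with columns coef, vars, coef_abs into a list of dicts."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "vars": row["vars"],
            "coef": float(row["coef"]),
            "coef_abs": float(row["coef_abs"]),
        })
    return rows


def strongest(rows, n):
    """Return the n rows with the largest absolute coefficient."""
    return sorted(rows, key=lambda r: abs(r["coef"]), reverse=True)[:n]


def split_by_sign(rows):
    """Split rows into (positive, negative) lists by the sign of coef."""
    good = [r for r in rows if r["coef"] > 0]
    bad = [r for r in rows if r["coef"] < 0]
    return good, bad


if __name__ == "__main__":
    features = read_features(SAMPLE)
    for r in strongest(features, 2):
        print(r["vars"], r["coef"])
```

To read a file instead of the embedded sample, the text can be loaded with `open(path, newline="").read()` and passed to `read_features`.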
Binary file added results/best_model.pkl
Binary file not shown.
Binary file added results/figures/ramen_map.png
Binary file modified results/figures/variety_wordcloud.png