diff --git a/Makefile.dot b/Makefile.dot index 67347d4..3d27523 100644 --- a/Makefile.dot +++ b/Makefile.dot @@ -11,7 +11,7 @@ n16[label="reports/reports.md", color="green"]; n18[label="reports/wine_refs.bib", color="green"]; n14[label="results/best_Model.pkl", color="red"]; n10[label="results/final_model_quality.png", color="red"]; -n3[label="results/wine_quality_rank_per_feature.png", color="red"]; +n3[label="results/wine_quality_rank_per_feature.svg", color="red"]; n8[label="src/download_data.py", color="green"]; n15[label="src/fit_wine_quality_predict_model.py", color="green"]; n6[label="src/pre_processing_wine.py", color="green"]; diff --git a/Makefile.png b/Makefile.png index fc13d0f..4a9139e 100644 Binary files a/Makefile.png and b/Makefile.png differ diff --git a/README.md b/README.md index 72b48d8..f9c8c73 100644 --- a/README.md +++ b/README.md @@ -26,6 +26,22 @@ The final report can be found [here](https://github.com/UBC-MDS/Wine_Quality_Pre ## Usage +There are two suggested ways to run this analysis: + + +#### 1\. Using Docker +To run this analysis using Docker, clone/download this repository, use the command line to navigate to the root of this project on your computer, and then type the following (filling in PATH_ON_YOUR_COMPUTER with the absolute path to the root of this project on your computer). + + +```bash +docker run --rm -v PATH_ON_YOUR_COMPUTER:/home/data_analysis_eg ttimbers/data_analysis_pipeline_eg make -C '/home/data_analysis_eg' all +``` +To clean up the analysis type: + +```bash +docker run --rm -v PATH_ON_YOUR_COMPUTER:/home/data_analysis_eg ttimbers/data_analysis_pipeline_eg make -C '/home/data_analysis_eg' clean +``` +#### 2\. 
Using Makefile To replicate the analysis, clone this GitHub repository, install the [dependencies](#dependencies) listed below, and run the following commands at the command line/terminal from the root directory of this diff --git a/reports/reports.Rmd b/reports/reports.Rmd index d2336d4..f69705b 100644 --- a/reports/reports.Rmd +++ b/reports/reports.Rmd @@ -56,7 +56,7 @@ In this project we are trying to predict the quality of a given wine sample usin We eventually decided to pick neutral network Multi-layer Perception (MLP) model as the model that yield the best results after running the various machine learning models through the train data set, comparing their performance based on f1-score and checking consistency across cross-validation runs. We noticed that random forest recorded high f1-validation score at 0.84, however, it also had a large gap between train and validation with a perfect train score of 1. This caused us to think the model has overfitted. Logistic regression also showed promising f1 validation score results in our case, yet this high results were not consistent across cross-validation splits. Hence, with most models struggled to get to the 0.8 f1-score mark without significantly overfitting on the train set, while MLP shows consistent results across all cross-validation splits, our final choice landed on MLP model because we think it would generalize better. 
-```{r, fig.cap = "Table 1: Score results among different machine learning model we have explore", fig.align='center'} +```{r, fig.cap = "Figure 3: Score results among different machine learning model we have explore", fig.align='center'} knitr::include_graphics("../results/f1_score_all_classifiers.svg") ``` @@ -68,7 +68,7 @@ The Python and R programming languages [@R; @Python] and the following Python an Looking at the distribution plot of the respective wine quality group interacting with each explanatory features, we can see that higher quality wine seems to be more associated with higher `alcohol` level and lower `density`. Lower `volatile acidity` also seems to be indicative of better wine. Better ranked wine also seem to have `higher free sulfur dioxide` level than poor wine though the relationship is not that clear based on the plot. The rest of the features do not seems be very distinguishable among different quality wine. -```{r distribution plot, fig.cap = "Figure 3: Distribution plot between wine quality and various attributes from physicochemical test", fig.align='center'} +```{r distribution plot, fig.cap = "Figure 4: Distribution plot between wine quality and various attributes from physicochemical test", fig.align='center'} knitr::include_graphics("../eda/wine_EDA_files/wine_quality_rank_per_feature.svg") @@ -76,14 +76,14 @@ knitr::include_graphics("../eda/wine_EDA_files/wine_quality_rank_per_feature.svg Since this is a multi-class classification, our goal was to find a model that was consistent and able to recognize patterns from our data. We choose to use a neutral network Multi-layer Perception (MLP) model as it was consistent and showed promising results. If we take a look at the accuracy scores and f1 scores across cross validation splits, we can see that it is pretty consistent which was not the case with many models. 
-```{r, echo=FALSE,out.width="50%", out.height="20%",fig.cap="Figure 4: Accuracy scores and f1 scores across cross validation splits for neutral network Multi-layer Perception (MLP) model",fig.show='hold',fig.align='center'} +```{r, echo=FALSE,out.width="50%", out.height="20%",fig.cap="Figure 5: Accuracy scores and f1 scores across cross validation splits for neutral network Multi-layer Perception (MLP) model",fig.show='hold',fig.align='center'} knitr::include_graphics(c("../results/f1_score_random_forest.svg","../results/f1_score_mlp.svg")) ``` Our model performed quite well on the test data as well. If we take a look at the confusion matrix below. As we discussed earlier, the prediction at the lower end of wine quality spectrum is acceptable. As we can see from the confusion matrix below, ~13% error rate for the lower end of spectrum and also very acceptable false classifications in the high end of spectrum. -```{r, fig.cap = "Figure 5: Confusion Matrix", fig.align='center'} +```{r, fig.cap = "Figure 6: Confusion Matrix", fig.align='center'} knitr::include_graphics("../results/final_model_quality.png") ``` diff --git a/reports/reports.html b/reports/reports.html deleted file mode 100644 index cc98e2a..0000000 --- a/reports/reports.html +++ /dev/null @@ -1,225 +0,0 @@ - - - - - - - - - - - - - - - - - - -

Predicting wine quality using measurements of physiochemical tests

-

Alex Truong, Bruhat Musinuru, Rui Wang and Sang Yoon Lee
2020-11-26 (updated: 2020-12-11)

- -

Summary

-

For this analysis, we used a neural network Multi-layer Perceptron (MLP) model to try to predict wine quality based on the different wine attributes obtained from physicochemical tests, such as alcohol, sulfur dioxide, fixed acidity and residual sugar. When we tested it with the different validation data sets, the model yielded robust results with 80% accuracy and 80% f1-score (a weighted average metric between precision and recall rate). We also obtained comparably high scores of 80% accuracy and f1-score when we ran the model on our test set. Based on these results, we believe that the model generalizes well.

-

However, it incorrectly classifies 13.7% of the data in the lower end of the spectrum (between normal and poor). This could be due to the class imbalance present in the data set, where normal samples outnumber poor ones by roughly twenty times. Improving the data collection methods to reduce the class imbalance and using an appropriate assessment metric for imbalanced data can help to improve our analysis. On the other hand, given that the rate of misclassification is not so high and the impact can be corrected in further assessment, we believe this model could decently serve its purpose as a wine predictor for first-cut assessment, which could help speed up the wine rating process.

-

Introduction

-

Traditional methods of categorizing wine are prone to human error and can vary drastically from expert to expert. We propose a data mining approach to predict wine quality using machine learning techniques for classification problems. The resulting model, we hope, could serve as one of the scientific and systematic ways to classify wine, and as a springboard for further research in personalized wine recommendation, quality assessment and comparison.

-

Moreover, we believe wineries or wine rating institutes could find the model a useful and reliable first-cut wine quality test before further expert assessment. This could lead to a more cost- and time-effective wine screening process, and subsequently facilitate more effective and efficient business decisions and strategies.

-

Methods

-

Data

-

The data set used in this project is the results of a chemical analysis of the Portuguese “Vinho Verde” wine, conducted by Paulo Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis (Cortez et al. 2009). It was sourced from the UCI Machine Learning Repository (Dua and Graff 2017) which can be found here.

-

There are two datasets, for red and white wine samples. For each wine sample observation, the inputs contain measurements of various objective physicochemical tests, and the output is the median wine quality rating given by experts on a scale from 0 (very bad) to 10 (very excellent). The authors note that data on grape types, wine brand and wine selling price, among others, are not available due to privacy and logistics issues. There are 1599 observations of red wine and 4898 observations of white wine.

-
- -Figure 1: Distribution of type of wine -

-Figure 1: Distribution of type of wine -

- -
- -

Analysis

-

At the preprocessing stage, we decided to combine the red and white data sets and to group the data into broader classes, namely “poor,” “normal” and “excellent” for the scales “1-4,” “5-6” and “7-9,” so as to have a bigger sample size per class (as per Figure 2). We acknowledge that the data is imbalanced; hence, instead of only using accuracy to judge model performance, we also include f1-score and use it as our main assessment metric. The f1-score is a metric that combines precision and recall, which reflect the false positives and false negatives in the predictions, and is appropriate to use with an imbalanced data set. {Bruhat: to add more justification for f-1 micro score}
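
The regrouping and the f1 metric described above can be sketched in a few lines of plain Python. This is only an illustration, not the project's preprocessing script: the function names `regroup_quality` and `f1_per_class` and the toy ratings are hypothetical, and the class boundaries are taken from the scales quoted in the paragraph (“1-4,” “5-6,” “7-9”).

```python
def regroup_quality(score: int) -> str:
    """Map an expert rating to the three coarse classes quoted in the report."""
    if 1 <= score <= 4:
        return "poor"
    if 5 <= score <= 6:
        return "normal"
    if 7 <= score <= 9:
        return "excellent"
    # Ratings outside 1-9 are not covered by the scales quoted in the text.
    raise ValueError(f"rating {score} is outside the 1-9 range")


def f1_per_class(y_true, y_pred):
    """Compute the f1-score of each class from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


# Hypothetical toy ratings, NOT values from the wine data set.
raw_true = [3, 5, 6, 6, 7, 8, 5, 4]
raw_pred = [5, 5, 6, 6, 7, 6, 5, 4]

y_true = [regroup_quality(s) for s in raw_true]
y_pred = [regroup_quality(s) for s in raw_pred]
class_f1 = f1_per_class(y_true, y_pred)

for label, score in class_f1.items():
    print(f"{label}: f1 = {score:.3f}")
```

Reporting the score per class, rather than a single pooled number, makes it visible when the minority classes are predicted less reliably than the majority class.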

-
- -Figure 2: Regrouping of wine quality classification -

-Figure 2: Regrouping of wine quality classification -

- -
- -

In this project we are trying to predict the quality of a given wine sample using wine attributes obtained from various physicochemical tests. Based on our literature review, we found that researchers from Karadeniz Technical University had also used the random forest algorithm to classify between red wine and white wine for the same dataset (Er and Atasoy 2016). They further used three different data mining algorithms, namely k-nearest neighbours, random forests and support vector machines, to classify the quality of both red wine and white wine. This motivated us to use cross-validation to select the best model for our analysis.

-

We eventually picked the neural network Multi-layer Perceptron (MLP) model as the model that yielded the best results after running the various machine learning models on the training data set, comparing their performance based on f1-score and checking consistency across cross-validation runs. We noticed that random forest recorded a high f1 validation score of 0.84; however, it also had a large gap between train and validation scores, with a perfect train score of 1. This led us to think the model had overfitted. Logistic regression also showed promising f1 validation scores in our case, yet these high results were not consistent across cross-validation splits. Hence, with most models struggling to reach the 0.8 f1-score mark without significantly overfitting on the training set, while MLP showed consistent results across all cross-validation splits, our final choice landed on the MLP model because we think it would generalize better.
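
The two checks described above (the train/validation gap and the consistency across splits) can be expressed as a small helper. This is a rough sketch under stated assumptions, not the selection code used in the project: the per-fold scores are hypothetical numbers chosen only to match the averages quoted in the text (random forest at 1.0 train and 0.84 validation, MLP near 0.8), and the `max_gap` threshold is an arbitrary illustrative value.

```python
from statistics import mean, pstdev


def summarize_cv(train_scores, valid_scores):
    """Summarize cross-validation folds: mean validation score, gap, spread."""
    return {
        "mean_valid": mean(valid_scores),
        # A large positive gap suggests the model memorized the training data.
        "gap": mean(train_scores) - mean(valid_scores),
        # A small spread means the validation score is stable across folds.
        "spread": pstdev(valid_scores),
    }


def looks_overfit(summary, max_gap=0.1):
    """Flag a model whose train/validation gap exceeds a chosen threshold."""
    return summary["gap"] > max_gap


# Hypothetical per-fold f1 scores, NOT the project's actual results.
random_forest = summarize_cv(
    train_scores=[1.0, 1.0, 1.0, 1.0, 1.0],
    valid_scores=[0.84, 0.83, 0.85, 0.84, 0.84],
)
mlp = summarize_cv(
    train_scores=[0.82, 0.81, 0.82, 0.81, 0.82],
    valid_scores=[0.80, 0.80, 0.79, 0.81, 0.80],
)

for name, summary in [("random forest", random_forest), ("MLP", mlp)]:
    print(name, summary, "overfit:", looks_overfit(summary))
```

With these illustrative numbers the random forest is flagged because of its gap even though its validation score is higher, which mirrors the reasoning in the paragraph.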

-
- -Table 1: Score results among different machine learning model we have explore -

-Table 1: Score results among different machine learning model we have -explore -

- -
- -

The Python and R programming languages (R Core Team 2019; Van Rossum and Drake 2009) and the following Python and R packages were used to perform the analysis: scikit-learn (Pedregosa et al. 2011), docoptpython (Keleshev 2014), docopt (de Jonge 2018), altair (VanderPlas et al. 2018), vega-lite (Satyanarayan et al. 2017), IPython-ipykernel (Pérez and Granger 2007), matplotlib (Hunter 2007), scipy (Virtanen et al. 2020), numpy (Harris et al. 2020), pandas (McKinney and others 2010), graphviz (Ellson et al. 2001), pandas-profiling (Brugman 2019), knitr (Xie 2014), tidyverse (Wickham 2017), kableExtra (Zhu 2020). The code used to perform the analysis and re-create this report can be found here

-

Results & Discussion

-

Looking at the distribution plot of the respective wine quality groups against each explanatory feature, we can see that higher quality wine seems to be associated with a higher alcohol level and lower density. Lower volatile acidity also seems to be indicative of better wine. Better ranked wine also seems to have a higher free sulfur dioxide level than poor wine, though the relationship is not that clear based on the plot. The rest of the features do not seem to distinguish clearly among wines of different quality.

-
- -Figure 3: Distribution plot between wine quality and various attributes from physicochemical test -

-Figure 3: Distribution plot between wine quality and various attributes -from physicochemical test -

- -
- -

Since this is a multi-class classification problem, our goal was to find a model that was consistent and able to recognize patterns in our data. We chose a neural network Multi-layer Perceptron (MLP) model as it was consistent and showed promising results. Looking at the accuracy scores and f1 scores across cross-validation splits, we can see that they are quite consistent, which was not the case with many other models.

-
- -

Figure 4: Accuracy scores and f1 scores across cross-validation splits for the neural network Multi-layer Perceptron (MLP) model

- -
- -

Our model performed quite well on the test data as well. As we discussed earlier, the prediction at the lower end of the wine quality spectrum is acceptable. As the confusion matrix below shows, there is a ~13% error rate at the lower end of the spectrum and an acceptable number of false classifications at the high end of the spectrum.
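
One way to read an error rate off a confusion matrix is to divide the off-diagonal counts in each row by the row total. The sketch below shows that calculation only; it is not necessarily how the ~13% figure above was derived, and the matrix values and the helper name `per_class_error` are hypothetical.

```python
def per_class_error(matrix, labels):
    """Return the share of each true class that was predicted as another class.

    Rows of `matrix` are true classes and columns are predicted classes,
    both in the order given by `labels`.
    """
    rates = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        misclassified = row_total - matrix[i][i]
        rates[label] = misclassified / row_total if row_total else 0.0
    return rates


labels = ["poor", "normal", "excellent"]

# Hypothetical counts, NOT the confusion matrix from the report.
matrix = [
    [10, 40, 0],     # true "poor"
    [5, 900, 95],    # true "normal"
    [0, 100, 150],   # true "excellent"
]

error_rates = per_class_error(matrix, labels)
for label, rate in error_rates.items():
    print(f"{label}: {rate:.1%} misclassified")
```

Looking at the rates per true class keeps a large majority class from hiding how often the smaller classes are missed.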

-
- -Figure 5: Confusion Matrix -

-Figure 5: Confusion Matrix -

- -
- -

Having said that, the research also needs further improvement in terms of obtaining a more balanced data set for training and cross-validation. More feature engineering and selection could be conducted to minimize the effect of correlation among the explanatory variables. Furthermore, in order to assess the robustness of the predictive model, we need to test the model with real-world deployment data in addition to our test data.

-

In conclusion, we think that with a decent error rate, our predictive model based on the neural network Multi-layer Perceptron (MLP) would serve well as an effective first-cut assessment of wine quality.

-

References

-
- -
- -

Brugman, Simon. 2019. “pandas-profiling: Exploratory Data Analysis for Python.” https://github.com/pandas-profiling/pandas-profiling.

-
- -
- -

Cortez, Paulo, António Cerdeira, Fernando Almeida, Telmo Matos, and José Reis. 2009. “Modeling Wine Preferences by Data Mining from Physicochemical Properties.” Decision Support Systems 47 (4): 547–53.

-
- -
- -

de Jonge, Edwin. 2018. Docopt: Command-Line Interface Specification Language. https://CRAN.R-project.org/package=docopt.

-
- -
- -

Dua, Dheeru, and Casey Graff. 2017. “UCI Machine Learning Repository.” University of California, Irvine, School of Information; Computer Sciences. http://archive.ics.uci.edu/ml.

-
- -
- -

Ellson, John, Emden Gansner, Lefteris Koutsofios, Stephen North, Gordon Woodhull, Short Description, and Lucent Technologies. 2001. “Graphviz - Open Source Graph Drawing Tools.” In Lecture Notes in Computer Science, 483–84. Springer-Verlag.

-
- -
- -

Er, Yeşim, and Ayten Atasoy. 2016. “The Classification of White Wine and Red Wine According to Their Physicochemical Qualities.” International Journal of Intelligent Systems and Applications in Engineering, 23–26.

-
- -
- -

Harris, Charles R., K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, et al. 2020. “Array Programming with NumPy.” Nature 585 (7825): 357–62. https://doi.org/10.1038/s41586-020-2649-2.

-
- -
- -

Hunter, J. D. 2007. “Matplotlib: A 2d Graphics Environment.” Computing in Science & Engineering 9 (3): 90–95. https://doi.org/10.1109/MCSE.2007.55.

-
- -
- -

Keleshev, Vladimir. 2014. Docopt: Command-Line Interface Description Language. https://github.com/docopt/docopt.

-
- -
- -

McKinney, Wes, and others. 2010. “Data Structures for Statistical Computing in Python.” In Proceedings of the 9th Python in Science Conference, 445:51–56. Austin, TX.

-
- -
- -

Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, et al. 2011. “Scikit-Learn: Machine Learning in Python.” Journal of Machine Learning Research 12: 2825–30.

-
- -
- -

Pérez, Fernando, and Brian E. Granger. 2007. “IPython: A System for Interactive Scientific Computing.” Computing in Science and Engineering 9 (3): 21–29. https://doi.org/10.1109/MCSE.2007.53.

-
- -
- -

R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

-
- -
- -

Satyanarayan, Arvind, Dominik Moritz, Kanit Wongsuphasawat, and Jeffrey Heer. 2017. “Vega-Lite: A Grammar of Interactive Graphics.” IEEE Transactions on Visualization and Computer Graphics 23 (1): 341–50.

-
- -
- -

Van Rossum, Guido, and Fred L. Drake. 2009. Python 3 Reference Manual. Scotts Valley, CA: CreateSpace.

-
- -
- -

VanderPlas, Jacob, Brian Granger, Jeffrey Heer, Dominik Moritz, Kanit Wongsuphasawat, Arvind Satyanarayan, Eitan Lees, Ilia Timofeev, Ben Welsh, and Scott Sievert. 2018. “Altair: Interactive Statistical Visualizations for Python.” Journal of Open Source Software 3 (32): 1057. https://doi.org/10.21105/joss.01057.

-
- -
- -

Virtanen, Pauli, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, et al. 2020. “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python.” Nature Methods 17: 261–72. https://doi.org/10.1038/s41592-019-0686-2.

-
- -
- -

Wickham, Hadley. 2017. Tidyverse: Easily Install and Load the ’Tidyverse’. https://CRAN.R-project.org/package=tidyverse.

-
- -
- -

Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman; Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595.

-
- -
- -

Zhu, Hao. 2020. kableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.

-
- -
- - - diff --git a/reports/reports.md b/reports/reports.md index f071c02..b0e2f4d 100644 --- a/reports/reports.md +++ b/reports/reports.md @@ -1,15 +1,15 @@ Predicting wine quality using measurements of physiochemical tests ================ Alex Truong, Bruhat Musinuru, Rui Wang and Sang Yoon Lee
-2020-11-26 (updated: 2020-12-11) +2020-11-26 (updated: 2020-12-12) -- [Summary](#summary) -- [Introduction](#introduction) -- [Methods](#methods) - - [Data](#data) - - [Analysis](#analysis) -- [Results & Discussion](#results-discussion) -- [References](#references) + - [Summary](#summary) + - [Introduction](#introduction) + - [Methods](#methods) + - [Data](#data) + - [Analysis](#analysis) + - [Results & Discussion](#results-discussion) + - [References](#references) ## Summary @@ -74,8 +74,11 @@ observations of white wine.
Figure 1: Distribution of type of wine +

+ Figure 1: Distribution of type of wine +

@@ -83,10 +86,10 @@ Figure 1: Distribution of type of wine ### Analysis At the preprocessing stage, we decided to combine the red and white data -set as well as group the data in bigger classification, namely “poor,” -“normal” and “excellent” for scale “1-4,” “5-6” and “7-9” so as to have -bigger sample size (as per Figure 2). We acknowledge that the data is -imbalanced, hence instead of only using accuracy based to judge the +set as well as group the data in bigger classification, namely “poor”, +“normal” and “excellent” for scale “1-4”, “5-6” and “7-9” so as to +have bigger sample size (as per Figure 2). We acknowledge that the data +is imbalanced, hence instead of only using accuracy based to judge the model performance, we also include f1-score and use it as our main assessment metric. f-1 score is metric that combine both the precision and recall metrics, which focus on the false negative and false positive @@ -96,8 +99,11 @@ set.{Bruhat: to add more justification for f-1 micro score}
Figure 2: Regrouping of wine quality classification +

+ Figure 2: Regrouping of wine quality classification +

@@ -130,10 +136,13 @@ because we think it would generalize better.
-Table 1: Score results among different machine learning model we have explore +Figure 3: Score results among different machine learning model we have explore +

-Table 1: Score results among different machine learning model we have + +Figure 3: Score results among different machine learning model we have explore +

@@ -156,18 +165,20 @@ Looking at the distribution plot of the respective wine quality group interacting with each explanatory features, we can see that higher quality wine seems to be more associated with higher `alcohol` level and lower `density`. Lower `volatile acidity` also seems to be indicative of -better wine. Better ranked wine also seem to have -`higher free sulfur dioxide` level than poor wine though the -relationship is not that clear based on the plot. The rest of the -features do not seems be very distinguishable among different quality -wine. +better wine. Better ranked wine also seem to have `higher free sulfur +dioxide` level than poor wine though the relationship is not that clear +based on the plot. The rest of the features do not seems be very +distinguishable among different quality wine.
-Figure 3: Distribution plot between wine quality and various attributes from physicochemical test +Figure 4: Distribution plot between wine quality and various attributes from physicochemical test +

-Figure 3: Distribution plot between wine quality and various attributes + +Figure 4: Distribution plot between wine quality and various attributes from physicochemical test +

@@ -181,10 +192,13 @@ that it is pretty consistent which was not the case with many models.
-Figure 4: Accuracy scores and f1 scores across cross validation splits for neutral network Multi-layer Perception (MLP) modelFigure 4: Accuracy scores and f1 scores across cross validation splits for neutral network Multi-layer Perception (MLP) model +Figure 5: Accuracy scores and f1 scores across cross validation splits for neutral network Multi-layer Perception (MLP) modelFigure 5: Accuracy scores and f1 scores across cross validation splits for neutral network Multi-layer Perception (MLP) model +

-Figure 4: Accuracy scores and f1 scores across cross validation splits + +Figure 5: Accuracy scores and f1 scores across cross validation splits for neutral network Multi-layer Perception (MLP) model +

@@ -198,9 +212,12 @@ the high end of spectrum.
-Figure 5: Confusion Matrix +Figure 6: Confusion Matrix +

-Figure 5: Confusion Matrix + +Figure 6: Confusion Matrix +

@@ -219,17 +236,16 @@ serve well as an effective first-cut assessment on wine quality. # References -
+
-
+
-Brugman, Simon. 2019. “pandas-profiling: -Exploratory Data Analysis for Python.” -. +Brugman, Simon. 2019. “pandas-profiling: Exploratory Data Analysis for +Python.” .
-
+
Cortez, Paulo, António Cerdeira, Fernando Almeida, Telmo Matos, and José Reis. 2009. “Modeling Wine Preferences by Data Mining from @@ -237,14 +253,14 @@ Physicochemical Properties.” *Decision Support Systems* 47 (4): 547–53.
-
+
de Jonge, Edwin. 2018. *Docopt: Command-Line Interface Specification Language*. .
-
+
Dua, Dheeru, and Casey Graff. 2017. “UCI Machine Learning Repository.” University of California, Irvine, School of Information; Computer @@ -252,7 +268,7 @@ Sciences. .
-
+
Ellson, John, Emden Gansner, Lefteris Koutsofios, Stephen North, Gordon Woodhull, Short Description, and Lucent Technologies. 2001. “Graphviz - @@ -261,7 +277,7 @@ Science*, 483–84. Springer-Verlag.
-
+
Er, Yeşim, and Ayten Atasoy. 2016. “The Classification of White Wine and Red Wine According to Their Physicochemical Qualities.” *International @@ -269,7 +285,7 @@ Journal of Intelligent Systems and Applications in Engineering*, 23–26.
-
+
Harris, Charles R., K. Jarrod Millman, St’efan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, et al. 2020. @@ -278,22 +294,22 @@ Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, et al. 2020.
-
+
-Hunter, J. D. 2007. “Matplotlib: A 2d Graphics Environment.” *Computing +Hunter, J. D. 2007. “Matplotlib: A 2D Graphics Environment.” *Computing in Science & Engineering* 9 (3): 90–95. .
-
+
Keleshev, Vladimir. 2014. *Docopt: Command-Line Interface Description Language*. .
-
+
McKinney, Wes, and others. 2010. “Data Structures for Statistical Computing in Python.” In *Proceedings of the 9th Python in Science @@ -301,7 +317,7 @@ Conference*, 445:51–56. Austin, TX.
-
+
Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, et al. 2011. “Scikit-Learn: Machine Learning in @@ -309,7 +325,7 @@ Python.” *Journal of Machine Learning Research* 12: 2825–30.
-
+
Pérez, Fernando, and Brian E. Granger. 2007. “IPython: A System for Interactive Scientific Computing.” *Computing in Science and @@ -317,7 +333,7 @@ Engineering* 9 (3): 21–29. .
-
+
R Core Team. 2019. *R: A Language and Environment for Statistical Computing*. Vienna, Austria: R Foundation for Statistical Computing. @@ -325,7 +341,7 @@ Computing*. Vienna, Austria: R Foundation for Statistical Computing.
-
+
Satyanarayan, Arvind, Dominik Moritz, Kanit Wongsuphasawat, and Jeffrey Heer. 2017. “Vega-Lite: A Grammar of Interactive Graphics.” *IEEE @@ -333,14 +349,7 @@ Transactions on Visualization and Computer Graphics* 23 (1): 341–50.
-
- -Van Rossum, Guido, and Fred L. Drake. 2009. *Python 3 Reference Manual*. -Scotts Valley, CA: CreateSpace. - -
- -
+
VanderPlas, Jacob, Brian Granger, Jeffrey Heer, Dominik Moritz, Kanit Wongsuphasawat, Arvind Satyanarayan, Eitan Lees, Ilia Timofeev, Ben @@ -350,24 +359,30 @@ Visualizations for Python.” *Journal of Open Source Software* 3 (32):
-
+
+ +Van Rossum, Guido, and Fred L. Drake. 2009. *Python 3 Reference Manual*. +Scotts Valley, CA: CreateSpace. + +
+ +
Virtanen, Pauli, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler -Reddy, David Cournapeau, Evgeni Burovski, et al. 2020. “SciPy 1.0: Fundamental Algorithms for Scientific -Computing in Python.” *Nature Methods* 17: 261–72. -. +Reddy, David Cournapeau, Evgeni Burovski, et al. 2020. “SciPy 1.0: +Fundamental Algorithms for Scientific Computing in Python.” *Nature +Methods* 17: 261–72. .
-
+
Wickham, Hadley. 2017. *Tidyverse: Easily Install and Load the ’Tidyverse’*. .
-
+
Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In *Implementing Reproducible Computational Research*, edited by @@ -376,9 +391,9 @@ Hall/CRC. .
-
+
-Zhu, Hao. 2020. *kableExtra: Construct Complex Table with ’Kable’ and +Zhu, Hao. 2020. *KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax*. .