In this activity, you are asked to examine data sources and produce a plot that illustrates a useful observation about the data.
You will examine the following data sources:
- A dataset containing the prices and other attributes of almost 54,000 diamonds. Because the data are loaded from an R package, you can use the help documentation in RStudio to learn more about the data once you have loaded the relevant library.
- Nutrition data on 80 cereal products. Information about these data can be found on Kaggle (https://www.kaggle.com/crawford/80-cereals/data). Note that these data are NOT loaded from an R package, so the RStudio help documentation will not be useful.
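As a minimal sketch of getting both data sources into an R session: the code below assumes the diamonds data are the ones that ship with the ggplot2 package, and that the Kaggle file has been saved as cereal.csv in the working directory. The object name cereal and the path cereal_path are illustrative choices, not requirements of the assignment.

```r
library(ggplot2)  # assumption: diamonds is the dataset bundled with ggplot2

# Peek at the diamonds data; run ?diamonds in the RStudio console
# to open its help page.
head(diamonds)

# Assumption: the Kaggle download was saved as cereal.csv in the
# working directory. Adjust the path to match your own setup.
cereal_path <- "cereal.csv"

if (file.exists(cereal_path)) {
  cereal <- read.csv(cereal_path)
  str(cereal)  # no help page exists, so inspect the structure directly
} else {
  message("cereal.csv not found; download it from the Kaggle page first.")
}
```

Because the cereal data have no help page, str() and the Kaggle description are the main ways to learn what each variable means.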
Each plot must be reproduced using ggplot2 syntax. To accomplish this, you may use the interactive graph builder functions presented in class. Modify the graph in the builder tool until you are satisfied that you have produced an informative graphic, and then capture the ggplot2 syntax.
Note: interactive graphics commands such as mplot() from the mosaic package or esquisser() from the esquisse package may be useful in your analysis. These functions can help you generate the code, but mplot() and esquisser() themselves should not be included in your .Rmd file.
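For illustration, captured ggplot2 syntax might look roughly like the sketch below once it has been copied out of a graph builder and tidied up. The choice of variables (carat, price, cut), the point transparency, and the object name diamonds_plot are assumptions made for this example, not a prescribed answer.

```r
library(ggplot2)

# Three variables from diamonds are shown at once:
#   carat on the x-axis, price on the y-axis, cut as point color.
diamonds_plot <-
  ggplot(data = diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.3) +
  labs(x = "Carat", y = "Price (US dollars)", color = "Cut")

diamonds_plot
```

Only the ggplot() expression belongs in the .Rmd file; the interactive call that helped produce it does not.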
The assignment is worth a total of 10 points:
- Exploration of diamonds data
  - [2 points] Write a few sentences that describe the data (e.g., define the cases and the relevant variables shown in your graphic), along with a written narrative describing something you learned or observed about the data using your graphic.
  - [1 point] A ggplot2 expression that successfully produces a graphical display of the data.
  - [1 point] The graphic must display at least 3 variables from the data.
- Exploration of cereal.csv data
  - [2 points] Write a few sentences that describe the data (e.g., define the cases and the relevant variables shown in your graphic), along with a written narrative describing something you learned or observed about the data using your graphic.
  - [1 point] A ggplot2 expression that successfully produces a graphical display of the data.
  - [1 point] The graphic must display at least 3 variables from the data.
- [2 points] Programming style (e.g., R code formatting) is consistent for each exploration. The easiest style to adopt is the one presented in the DataComputing eBook (https://dtkaplan.github.io/DataComputingEbook/).
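To make the "at least 3 variables" and consistent-style criteria concrete, here is a hedged sketch for the cereal data. It assumes the Kaggle file has columns named sugars, rating, and mfr and is saved as cereal.csv in the working directory. The helper cereal_scatter() is a hypothetical name introduced only for this example.

```r
library(ggplot2)

# Hypothetical helper: builds a scatterplot showing three variables.
# Assumption: the data frame has columns sugars, rating, and mfr.
cereal_scatter <- function(cereal) {
  ggplot(data = cereal, aes(x = sugars, y = rating, color = mfr)) +
    geom_point() +
    labs(x = "Sugars", y = "Rating", color = "Manufacturer")
}

# Assumption: cereal.csv is in the working directory.
if (file.exists("cereal.csv")) {
  cereal <- read.csv("cereal.csv")
  print(cereal_scatter(cereal))
}
```

Note the consistent habits throughout: spaces around = and after commas, one layer per line joined by +, and lowercase object names with underscores. Whatever style you choose, apply it the same way in both explorations.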