# Statistical Summaries
```{r, include=FALSE}
library(tidyverse)
```
## Exercises
**1.** What binwidth tells you the most interesting story about the distribution of `carat`?
```{r}
diamonds %>%
  ggplot(aes(carat)) +
  geom_histogram(binwidth = 0.2)
```
- This is a subjective call, but I would go with 0.2: it shows clearly that the distribution of `carat` is right-skewed.
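
Since the choice is subjective, it can help to look at a few bin widths side by side before settling on one. The following is a rough sketch: the candidate values in `binwidths` and the name `carat_hists` are my own picks for illustration, not part of the exercise.

```{r}
library(ggplot2)

# Candidate bin widths to compare; these values are illustrative assumptions.
binwidths <- c(0.01, 0.1, 0.2, 0.5)

# Build one histogram of carat per bin width, labelled with the width used.
carat_hists <- lapply(binwidths, function(bw) {
  ggplot(diamonds, aes(carat)) +
    geom_histogram(binwidth = bw) +
    ggtitle(paste("binwidth =", bw))
})

carat_hists[[1]]  # narrowest bins: most detail
carat_hists[[3]]  # binwidth = 0.2, the choice made above
```

Narrower bins keep more local detail, while wider bins emphasise the overall shape, so which one is "most interesting" depends on what you want to show.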
<br>
**2.** Draw a histogram of `price`. What interesting patterns do you see?
```{r}
diamonds %>%
  ggplot(aes(price)) +
  geom_histogram(binwidth = 500)
```
- The distribution is right-skewed with a long tail. There is a huge peak at the low end of the range, near the minimum price, and a much smaller peak around 5000.
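
To check where the peaks sit without reading them off the plot, the counts per bin can be tabulated directly. This is a minimal sketch: `bin_start`, `n_diamonds`, and `price_bins` are names I made up, and the bins here start at multiples of 500, which may not line up exactly with how `geom_histogram()` aligns its bins.

```{r}
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Count diamonds in 500-wide price bins and sort from the fullest bin down.
price_bins <- diamonds %>%
  count(bin_start = floor(price / 500) * 500, name = "n_diamonds") %>%
  arrange(desc(n_diamonds))

head(price_bins)
```

The first rows show which price ranges hold the most diamonds, which makes it easier to say where the main peak actually is.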
<br>
**3.** How does the distribution of `price` vary with `clarity`?
```{r}
diamonds %>%
  ggplot(aes(clarity, price)) +
  geom_boxplot()
```
- The overall range of prices is similar across clarity levels, but the median and IQR vary considerably from one level to the next.
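
The quantities read off the boxplot can also be computed per clarity level. A small sketch, assuming dplyr is available; the name `clarity_summary` and the column names are my own choices.

```{r}
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Range, median, and IQR of price within each clarity level.
clarity_summary <- diamonds %>%
  group_by(clarity) %>%
  summarise(
    min_price    = min(price),
    median_price = median(price),
    iqr_price    = IQR(price),
    max_price    = max(price)
  )

clarity_summary
```

Comparing the `min_price` and `max_price` columns against `median_price` and `iqr_price` gives a numeric check on what the boxplot suggests.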
<br>
**4.** Overlay a frequency polygon and density plot of `depth`. What computed variable do you need to map to `y` to make the two plots comparable? (You can either modify `geom_freqpoly()` or `geom_density()`.)
```{r}
diamonds %>%
  count(depth) %>%
  # Share of observations at each distinct depth value
  # (a proportion; it is not divided by a bin width)
  mutate(prop = n / sum(n)) %>%
  ggplot(aes(depth, prop)) +
  geom_line()
```
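- `geom_freqpoly()` plots counts by default, while `geom_density()` plots a density. Dividing each count by the total number of observations gives a proportion, which is what the code above computes; dividing that proportion by the bin width gives the density, the scale `geom_density()` uses. In ggplot2 this is the computed variable `density`, so mapping it to `y` in `geom_freqpoly()` puts the two plots on the same scale.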
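
The overlay the exercise asks for can be sketched as follows. This assumes ggplot2 3.3.0 or later for `after_stat()` (older versions used the `..density..` notation); the bin width of 0.2 and the name `depth_overlay` are arbitrary choices of mine.

```{r}
library(ggplot2)

# Frequency polygon rescaled to density, with the kernel density drawn on top.
depth_overlay <- ggplot(diamonds, aes(depth)) +
  geom_freqpoly(aes(y = after_stat(density)), binwidth = 0.2) +
  geom_density()

depth_overlay
```

With `y = after_stat(density)`, the area under the frequency polygon's bins sums to 1, the same scale as the density curve, so the two layers can be compared directly.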