diff --git a/articles/plotting.html b/articles/plotting.html
index 45d6a2f..f6c385a 100644
--- a/articles/plotting.html
+++ b/articles/plotting.html
@@ -98,13 +98,30 @@

Segregation curve

… Duncan and Duncan (1955). The function segcurve() provides a simple way
-of plotting a segregation curve:
+of plotting one or several segregation curves:

-segcurve(subset(schools00, race %in% c("white", "black")),
+segcurve(subset(schools00, race %in% c("white", "asian")),
   "race", "school",
-  weight = "n"
+  weight = "n",
+  segment = "state" # leave this out to produce a single curve
 )

+In this case, state A is the most segregated, while states
+B and C are similarly segregated at a lower level.
+Segregation curves are closely related to the index of dissimilarity:
+D corresponds to the maximum vertical distance between a curve and
+the diagonal. Here, the three curves correspond to the following index
+values:

+# converting to data.table makes this easier
+data.table::as.data.table(schools00)[
+  race %in% c("white", "asian"),
+  dissimilarity(.SD, "race", "school", weight = "n"),
+  by = .(state)
+]
+#>    state stat       est
+#> 1:     A    D 0.6558592
+#> 2:     B    D 0.4002980
+#> 3:     C    D 0.3886178

Segplot

@@ -129,24 +146,24 @@

Segplot

… axis_labels.

Examples of how to use these arguments are given below:

 sch <- subset(schools00, state == "A")
 
 # basic segplot
 segplot(sch, "race", "school", weight = "n", axis_labels = "both")
 
 # order by majority group (white in this case)
 segplot(sch, "race", "school", weight = "n", order = "majority")
 
 # increase the space between bars
 # (has to be very low here because there are many schools in this dataset)
 segplot(sch, "race", "school", weight = "n", bar_space = 0.0005)
 
 # change the reference distribution
 # (here, we just use an equalized distribution across the five groups)
@@ -161,7 +178,7 @@ 

Segplot

   weight = "n", reference_distribution = ref
 )
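The construction of ref is cut out of this hunk. Given the comment about an equalized distribution across the five groups, a plausible reconstruction is the following sketch (the group labels and the column layout are assumptions, not shown in the diff):

 # hypothetical reconstruction: an equal 1/5 share for each of the five
 # race groups in schools00; labels and column names are assumptions
 ref <- data.frame(
   race = c("asian", "black", "hisp", "native", "white"),
   p = rep(0.2, 5)
 )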

Compressing segregation information

@@ -187,7 +204,7 @@

Compressing segregation information

The second step is then to run the actual compression algorithm using compress(). For this example, we choose to compress based on a relatively small window:

 # compression based on window of 20 'neighboring' units
 # in terms of local segregation (alternatively, neighbors can be a data frame)
 comp <- compress(sch, "race", "school",
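The hunk cuts off the rest of this call. Based on the comment above, the full invocation was presumably along these lines (the neighbors and window arguments are inferred from the comment, not visible in the diff):

 # hypothetical completion of the truncated call: 20 'neighboring' units
 # in terms of local segregation
 comp <- compress(sch, "race", "school",
   weight = "n", neighbors = "local", window = 20
 )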
@@ -196,7 +213,7 @@ 

Compressing segregation information

After running compress() (which can take some time, depending on how many neighbors need to be considered), the output summarizes the compression that can be achieved:

 comp
 #> Compression of dataset with 560 units
 #> Original M: 0.4085965; Final M: 0
@@ -210,12 +227,12 @@ 

Compressing segregation information

… through comp$iterations. This data frame can also be used to generate a plot that shows the relationship between the number of merges and the loss in segregation information:
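Before plotting, the merge history can also be inspected directly; a minimal sketch:

 # each row of comp$iterations records one merge step
 # (the exact columns are not shown in this diff)
 head(comp$iterations)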

 scree_plot(comp)

Another way to learn more about the compression is to visualize the information as a dendrogram:

 dend <- as.dendrogram(comp)
 dendextend::labels(dend) <- NULL # remove the labels
 #> Warning in `labels<-.dendrogram`(`*tmp*`, value = NULL): The lengths of the new
@@ -224,13 +241,13 @@ 

Compressing segregation information

 #> Warning in rep(new_labels, length.out = leaves_length): 'x' is NULL so the
 #> result will be NULL
 plot(dend)


The third step is to create a new dataset at the desired level of compression. This can be achieved using the function merge_units(), where either n_units or percent specifies how much to compress.

 sch_compressed <- merge_units(comp, n_units = 15)
 # or, for instance: merge_units(comp, percent = 0.80)
 head(sch_compressed)
@@ -243,9 +260,9 @@ 

Compressing segregation information

 #> 6: M2 hisp 642

The compressed dataset has the same format as the original and can now be used to produce another segplot:

 segplot(sch_compressed, "race", "school", weight = "n")

diff --git a/articles/plotting_files/figure-html/unnamed-chunk-10-1.png b/articles/plotting_files/figure-html/unnamed-chunk-10-1.png
new file mode 100644
index 0000000..f5dc90e
Binary files /dev/null and b/articles/plotting_files/figure-html/unnamed-chunk-10-1.png differ
diff --git a/articles/plotting_files/figure-html/unnamed-chunk-2-1.png b/articles/plotting_files/figure-html/unnamed-chunk-2-1.png
index 943fdf7..8f41812 100644
Binary files a/articles/plotting_files/figure-html/unnamed-chunk-2-1.png and b/articles/plotting_files/figure-html/unnamed-chunk-2-1.png differ
diff --git a/articles/plotting_files/figure-html/unnamed-chunk-4-1.png b/articles/plotting_files/figure-html/unnamed-chunk-4-1.png
new file mode 100644
index 0000000..d59549c
Binary files /dev/null and b/articles/plotting_files/figure-html/unnamed-chunk-4-1.png differ
diff --git a/articles/plotting_files/figure-html/unnamed-chunk-4-2.png b/articles/plotting_files/figure-html/unnamed-chunk-4-2.png
new file mode 100644
index 0000000..f78d6b6
Binary files /dev/null and b/articles/plotting_files/figure-html/unnamed-chunk-4-2.png differ
diff --git a/articles/plotting_files/figure-html/unnamed-chunk-4-3.png b/articles/plotting_files/figure-html/unnamed-chunk-4-3.png
new file mode 100644
index 0000000..0d7b72f
Binary files /dev/null and b/articles/plotting_files/figure-html/unnamed-chunk-4-3.png differ
diff --git a/articles/plotting_files/figure-html/unnamed-chunk-4-4.png b/articles/plotting_files/figure-html/unnamed-chunk-4-4.png
new file mode 100644
index 0000000..0cca0b7
Binary files /dev/null and b/articles/plotting_files/figure-html/unnamed-chunk-4-4.png differ
diff --git a/articles/plotting_files/figure-html/unnamed-chunk-7-1.png b/articles/plotting_files/figure-html/unnamed-chunk-7-1.png
index 3c95ff4..23c30f6 100644
Binary files a/articles/plotting_files/figure-html/unnamed-chunk-7-1.png and b/articles/plotting_files/figure-html/unnamed-chunk-7-1.png differ
diff --git a/articles/plotting_files/figure-html/unnamed-chunk-8-1.png b/articles/plotting_files/figure-html/unnamed-chunk-8-1.png
index f5dc90e..3c95ff4 100644
Binary files a/articles/plotting_files/figure-html/unnamed-chunk-8-1.png and b/articles/plotting_files/figure-html/unnamed-chunk-8-1.png differ
diff --git a/articles/segregation.html b/articles/segregation.html
index 2b69e50..211641b 100644
--- a/articles/segregation.html
+++ b/articles/segregation.html
@@ -249,8 +249,8 @@

Computing the M and H indices

 )
 #> 500 bootstrap iterations on 877739 observations
 #>    stat       est           se                  CI        bias
-#> 1:    M 0.4218888 0.0008052111 0.4202830,0.4234599 0.003650193
-#> 2:    H 0.4152473 0.0007114470 0.4137465,0.4165963 0.003561007

+#> 1:    M 0.4219383 0.0008178582 0.4203089,0.4236068 0.003600699
+#> 2:    H 0.4152563 0.0007530694 0.4137359,0.4167181 0.003552063

As there is a large number of observations, the standard errors are very small.

@@ -481,8 +481,8 @@

Inference

 ))
 #> 500 bootstrap iterations on 877739 observations
 #>    stat       est           se                  CI        bias
-#> 1:    M 0.4219476 0.0007579537 0.4205498,0.4234631 0.003591399
-#> 2:    H 0.4152541 0.0006763022 0.4138784,0.4166825 0.003554205

+#> 1:    M 0.4218735 0.0007364381 0.4205171,0.4232621 0.003665477
+#> 2:    H 0.4152207 0.0006890576 0.4139019,0.4165112 0.003587629

The confidence intervals are based on the percentiles from the bootstrap distribution, and hence require a large number of bootstrap iterations for valid interpretation. The estimate est that …

@@ -500,10 +500,10 @@

Inference
 # M
 with(se, c(est[1] - 1.96 * se[1], est[1] + 1.96 * se[1]))
-#> [1] 0.4204620 0.4234332
+#> [1] 0.4204301 0.4233169
 # H
 with(se, c(est[2] - 1.96 * se[2], est[2] + 1.96 * se[2]))
-#> [1] 0.4139286 0.4165797

+#> [1] 0.4138701 0.4165712

provide effectively the same coverage as the confidence intervals obtained from the percentile bootstrap.
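For context, the se object used in the code above is created earlier in the article and is not part of this hunk; given the "500 bootstrap iterations" output shown above, it was presumably produced along these lines:

 # assumed setup for `se` (not shown in this hunk); n_bootstrap = 500
 # matches the bootstrap output printed earlier
 se <- mutual_total(schools00, "race", "school",
   weight = "n", se = TRUE, n_bootstrap = 500
 )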

Whenever the bootstrap is used, the bootstrap distributions for each …

@@ -535,8 +535,8 @@

Inference
 mutual_expected(schools00, "race", "school", weight = "n", n_bootstrap = 500)
 #>         stat         est           se
-#> 1: M under 0 0.004808867 7.623260e-05
-#> 2: H under 0 0.004732807 7.502684e-05

+#> 1: M under 0 0.004806118 7.679837e-05
+#> 2: H under 0 0.004730100 7.558367e-05

Here, there is no concern about bias due to a small sample size.

This method also supports inference by setting se = TRUE.

diff --git a/articles/segregation_files/figure-html/unnamed-chunk-15-1.png b/articles/segregation_files/figure-html/unnamed-chunk-15-1.png
index eaee46c..60b04bb 100644
Binary files a/articles/segregation_files/figure-html/unnamed-chunk-15-1.png and b/articles/segregation_files/figure-html/unnamed-chunk-15-1.png differ
diff --git a/articles/segregation_files/figure-html/unnamed-chunk-19-1.png b/articles/segregation_files/figure-html/unnamed-chunk-19-1.png
index 4b739de..75ba345 100644
Binary files a/articles/segregation_files/figure-html/unnamed-chunk-19-1.png and b/articles/segregation_files/figure-html/unnamed-chunk-19-1.png differ
diff --git a/news/index.html b/news/index.html
index d7e1882..8a275f1 100644
--- a/news/index.html
+++ b/news/index.html
@@ -58,6 +58,7 @@

segregation (development version)

  • various improvements to compression algorithm
  • add dendrogram visualization
+  • allow multiple curves in segcurve function

segregation 1.0.0

CRAN release: 2023-08-24

diff --git a/pkgdown.yml b/pkgdown.yml
index 0be8039..f404504 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -5,7 +5,7 @@
 articles:
   faq: faq.html
   plotting: plotting.html
   segregation: segregation.html
-last_built: 2023-10-03T12:53Z
+last_built: 2023-10-03T13:33Z
 urls:
   reference: https://elbersb.com/segregation/reference
   article: https://elbersb.com/segregation/articles
diff --git a/reference/dissimilarity_expected.html b/reference/dissimilarity_expected.html
index e644f4b..9993076 100644
--- a/reference/dissimilarity_expected.html
+++ b/reference/dissimilarity_expected.html
@@ -134,13 +134,13 @@

Examples

   n = c(rep(1, 10), rep(9, 10))
 )
 dissimilarity_expected(small, "race", "school", weight = "n")
-#>         stat       est         se
-#> 1: D under 0 0.3788889 0.09618616
+#>         stat       est       se
+#> 1: D under 0 0.3755556 0.117949
 # with an increase in sample size (n=1000), the values improve
 small$n <- small$n * 10
 dissimilarity_expected(small, "race", "school", weight = "n")
-#>         stat       est         se
-#> 1: D under 0 0.1232222 0.02899553
+#>         stat   est         se
+#> 1: D under 0 0.121 0.02762111
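For reference, the truncated setup at the top of this example presumably built a small two-group dataset along these lines (a sketch; the race and school values are assumptions, only the n column is visible in the diff):

 # hypothetical reconstruction of `small`: 10 schools, two groups, with
 # group counts of 1 and 9 per school (n = 100 in total, before scaling)
 small <- data.frame(
   race = c(rep("black", 10), rep("white", 10)),
   school = rep(1:10, 2),
   n = c(rep(1, 10), rep(9, 10))
 )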