Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding keep_empty argument to list_c, list_cbind, and list_rbind. #1144

Closed
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# purrr (development version)

* Added a new keep_empty argument to `list_c()`, `list_cbind()`, and `list_rbind()` which will keep empty elements as `NA` in the returned data frame (@SokolovAnatoliy, #1096).

# purrr 1.0.2

* Fixed valgrind issue.
Expand Down
31 changes: 26 additions & 5 deletions R/list-combine.R
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,9 @@
#' same size (i.e. number of rows).
#' @param name_repair One of `"unique"`, `"universal"`, or `"check_unique"`.
#' See [vctrs::vec_as_names()] for the meaning of these options.
#' @param keep_empty An optional logical. If `FALSE` (the default), then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keep_empty is the wrong name if double() isn't promoted to NA_real_ and kept. It's more like keep_null.

> list_c(list(1, double(), 2))
[1] 1 2
> list_c(list(1, double(), 2), keep_empty = T)
[1] 1 2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compare with

> tidyr::unchop(tibble::tibble(x = list(1, double(), 2)), x)
# A tibble: 2 × 1
      x
  <dbl>
1     1
2     2
> tidyr::unchop(tibble::tibble(x = list(1, double(), 2)), x, keep_empty = TRUE)
# A tibble: 3 × 1
      x
  <dbl>
1     1
2    NA
3     2

#' empty (`NULL`) elements are silently ignored; if `TRUE`, then empty
#' elements are preserved by converting to `NA`.
#' @inheritParams rlang::args_dots_empty
#' @export
#' @examples
Expand All @@ -30,16 +33,21 @@
#'
#' x2 <- list(
#' a = data.frame(x = 1:2),
#' b = data.frame(y = "a")
#' b = data.frame(y = "a"),
#' c = NULL
#' )
#' list_rbind(x2)
#' list_rbind(x2, names_to = "id")
#' list_rbind(x2, names_to = "id", keep_empty = TRUE)
#' list_rbind(unname(x2), names_to = "id")
#'
#' list_cbind(x2)
list_c <- function(x, ..., ptype = NULL) {
#' list_cbind(x2, keep_empty = TRUE)
list_c <- function(x, ..., ptype = NULL, keep_empty = FALSE) {
vec_check_list(x)
check_dots_empty()
if (keep_empty) {
Comment on lines +45 to +48
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd want to use check_bool(keep_empty) everywhere

x <- convert_null_to_NA(x)
}

# For `list_c()`, we don't expose `list_unchop()`'s `name_spec` arg,
# and instead strip outer names to avoid collisions with inner names
Expand All @@ -58,19 +66,26 @@ list_cbind <- function(
x,
...,
name_repair = c("unique", "universal", "check_unique"),
size = NULL
size = NULL,
keep_empty = FALSE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DavisVaughan my intuition is that keep_empty doesn't make sense for list_cbind(), but I don't have anything concrete to back that up. Do you have any thoughts?

) {
check_list_of_data_frames(x)
check_dots_empty()
if (keep_empty) {
x <- convert_null_to_NA(x)
}

vec_cbind(!!!x, .name_repair = name_repair, .size = size, .error_call = current_env())
}

#' @export
#' @rdname list_c
list_rbind <- function(x, ..., names_to = rlang::zap(), ptype = NULL) {
list_rbind <- function(x, ..., names_to = rlang::zap(), ptype = NULL, keep_empty = FALSE) {
check_list_of_data_frames(x)
check_dots_empty()
if (keep_empty) {
x <- convert_null_to_NA(x)
}

vec_rbind(!!!x, .names_to = names_to, .ptype = ptype, .error_call = current_env())
}
Expand All @@ -95,3 +110,9 @@ check_list_of_data_frames <- function(x, error_call = caller_env()) {
call = error_call
)
}

convert_null_to_NA <- function(x) {
is_null <- map_lgl(x, is.null)
x[is_null] <- list(NA)
Comment on lines +115 to +116
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vctrs::vec_detect_missing() would be much faster at detecting NULLs since we know x is a list.

See tidyr:::list_replace_null() for a robust and very fast version of this operation

x
}
17 changes: 12 additions & 5 deletions man/list_c.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

18 changes: 18 additions & 0 deletions tests/testthat/test-list-combine.R
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,24 @@ test_that("NULLs are ignored", {
expect_equal(list_cbind(list(df1, NULL, df2)), vec_cbind(df1, df2))
})

test_that("NULLs are converted to NA when keep_empty = TRUE", {
df1 <- data.frame(x = 1)
df2 <- data.frame(y = 1)

expect_equal(
list_c(list(1, NULL, 2), keep_empty = TRUE),
c(1, NA, 2)
)
expect_equal(
list_rbind(list(df1, NULL, df1), keep_empty = TRUE),
data.frame(x = c(1, NA, 1))
)
expect_equal(
list_cbind(list(df1, z = NULL, df2), keep_empty = TRUE),
data.frame(df1, z = NA, df2)
)
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to implement this then I'd like the chance to come through and write a lot of tests for the 3 cases individually. The keep_empty logic in tidyr is quite hard to get 100% right, and required a lot of edge case tests.


test_that("empty inputs return expected output", {
expect_equal(list_c(list()), NULL)
expect_equal(list_c(list(NULL)), NULL)
Expand Down