Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding keep_empty argument to list_c, list_cbind, and list_rbind. #1144

Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# purrr (development version)

* Added a new keep_empty argument to list_c(), list_cbind() and list_rbind() which will keep empty elements as NA in the returned dataframe. (#1096)
hadley marked this conversation as resolved.
Show resolved Hide resolved

# purrr 1.0.2

* Fixed valgrind issue.
Expand Down
23 changes: 18 additions & 5 deletions R/list-combine.R
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
#' same size (i.e. number of rows).
#' @param name_repair One of `"unique"`, `"universal"`, or `"check_unique"`.
#' See [vctrs::vec_as_names()] for the meaning of these options.
#' @param keep_empty An optional Logical to keep empty elements of a list as NA.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you try rewriting this using the formula "If FALSE (the default), then ...; if TRUE, then ...`.

#' @inheritParams rlang::args_dots_empty
#' @export
#' @examples
Expand All @@ -30,17 +31,21 @@
#'
#' x2 <- list(
#' a = data.frame(x = 1:2),
#' b = data.frame(y = "a")
#' b = data.frame(y = "a"),
#' c = data.frame(z = NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is equivalent to just data.frame()?

#' )
#' list_rbind(x2)
#' list_rbind(x2, names_to = "id")
#' list_rbind(x2, names_to = "id", keep_empty = TRUE)
#' list_rbind(unname(x2), names_to = "id")
#'
#' list_cbind(x2)
list_c <- function(x, ..., ptype = NULL) {
#'
list_c <- function(x, ..., ptype = NULL, keep_empty = FALSE) {
vec_check_list(x)
check_dots_empty()

if(keep_empty) x <- convert_empty_element_to_NA(x)

# For `list_c()`, we don't expose `list_unchop()`'s `name_spec` arg,
# and instead strip outer names to avoid collisions with inner names
x <- unname(x)
Expand All @@ -58,19 +63,22 @@ list_cbind <- function(
x,
...,
name_repair = c("unique", "universal", "check_unique"),
size = NULL
size = NULL,
keep_empty = FALSE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DavisVaughan my intuition is that keep_empty doesn't make sense for list_cbind(), but I don't have anything concrete to back that up. Do you have any thoughts?

) {
check_list_of_data_frames(x)
check_dots_empty()
if(keep_empty) x <- convert_empty_element_to_NA(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this applies to list_cbind()? I don't know why but I think it shouldn't?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, it's unclear what type the empty column should have.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also weird that for list_cbind() the name of empty columns comes from the outer name rather than the inner name.


vec_cbind(!!!x, .name_repair = name_repair, .size = size, .error_call = current_env())
}

#' @export
#' @rdname list_c
list_rbind <- function(x, ..., names_to = rlang::zap(), ptype = NULL) {
list_rbind <- function(x, ..., names_to = rlang::zap(), ptype = NULL, keep_empty = FALSE) {
check_list_of_data_frames(x)
check_dots_empty()
if(keep_empty) x <- convert_empty_element_to_NA(x)

vec_rbind(!!!x, .names_to = names_to, .ptype = ptype, .error_call = current_env())
}
Expand All @@ -95,3 +103,8 @@ check_list_of_data_frames <- function(x, error_call = caller_env()) {
call = error_call
)
}

## used to convert empty elements into NA for list_binding functions
convert_empty_element_to_NA = function(x) {
map(x, \(x) if(vctrs::vec_is_empty(x)) NA else x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this anonymous function syntax requires R 4.1? Otherwise, this implementation looks good to me!

}
15 changes: 10 additions & 5 deletions man/list_c.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 9 additions & 0 deletions tests/testthat/test-list-combine.R
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,15 @@ test_that("NULLs are ignored", {
expect_equal(list_cbind(list(df1, NULL, df2)), vec_cbind(df1, df2))
})

test_that("NULLs are converted to NA when keep_empty = TRUE", {
df1 <- data.frame(x = 1)
df2 <- data.frame(y = 1)

expect_equal(list_c(list(1, NULL, 2), keep_empty = TRUE), c(1, NA, 2))
expect_equal(list_rbind(list(df1, NULL, df1), keep_empty = TRUE), vec_rbind(df1, NA, df1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd find it a little easier to verify that this is correct if the expected value looked like data.frame(x = c(1, NA, 1)

expect_equal(list_cbind(list(df1, z = NULL, df2), keep_empty = TRUE), vec_cbind(df1, z = NA, df2))
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to implement this then I'd like the chance to come through and write a lot of tests for the 3 cases individually. The keep_empty logic in tidyr is quite hard to get 100% right, and required a lot of edge case tests.


test_that("empty inputs return expected output", {
expect_equal(list_c(list()), NULL)
expect_equal(list_c(list(NULL)), NULL)
Expand Down