Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding keep_empty argument to list_c, list_cbind, and list_rbind. #1144

Conversation

SokolovAnatoliy
Copy link

Fixes #1096.

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

NEWS.md Outdated Show resolved Hide resolved
R/list-combine.R Outdated
@@ -22,6 +22,7 @@
#' same size (i.e. number of rows).
#' @param name_repair One of `"unique"`, `"universal"`, or `"check_unique"`.
#' See [vctrs::vec_as_names()] for the meaning of these options.
#' @param keep_empty An optional Logical to keep empty elements of a list as NA.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you try rewriting this using the formula "If FALSE (the default), then ...; if TRUE, then ...`.

R/list-combine.R Outdated
@@ -30,17 +31,21 @@
#'
#' x2 <- list(
#' a = data.frame(x = 1:2),
#' b = data.frame(y = "a")
#' b = data.frame(y = "a"),
#' c = data.frame(z = NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is equivalent to just data.frame()?

R/list-combine.R Outdated
) {
check_list_of_data_frames(x)
check_dots_empty()
if(keep_empty) x <- convert_empty_element_to_NA(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this applies to list_cbind()? I don't know why but I think it shouldn't?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, it's unclear what type the empty column should have.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also weird that for list_cbind() the name of empty columns comes from the outer name rather than the inner name.

R/list-combine.R Outdated

## used to convert empty elements into NA for list_binding functions
convert_empty_element_to_NA = function(x) {
map(x, \(x) if(vctrs::vec_is_empty(x)) NA else x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this anonymous function syntax requires R 4.1? Otherwise, this implementation looks good to me!

df2 <- data.frame(y = 1)

expect_equal(list_c(list(1, NULL, 2), keep_empty = TRUE), c(1, NA, 2))
expect_equal(list_rbind(list(df1, NULL, df1), keep_empty = TRUE), vec_rbind(df1, NA, df1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd find it a little easier to verify that this is correct if the expected value looked like data.frame(x = c(1, NA, 1)

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on!

R/list-combine.R Outdated Show resolved Hide resolved
R/list-combine.R Outdated
) {
check_list_of_data_frames(x)
check_dots_empty()
if(keep_empty) x <- convert_empty_element_to_NA(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, it's unclear what type the empty column should have.

NEWS.md Outdated Show resolved Hide resolved
R/list-combine.R Outdated Show resolved Hide resolved
@@ -58,19 +66,26 @@ list_cbind <- function(
x,
...,
name_repair = c("unique", "universal", "check_unique"),
size = NULL
size = NULL,
keep_empty = FALSE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DavisVaughan my intuition is that keep_empty doesn't make sense for list_cbind(), but I don't have anything concrete to back that up. Do you have any thoughts?

Copy link
Member

@DavisVaughan DavisVaughan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping in some initial comments while looking at this. No need to make any changes yet though.

But also see #1096 (comment). I think it is possible we made a small mistake by marking this as a TDD issue, since it can already be solved through tidyr::unchop() in a way that I feel is much more robust for the motivating example given there (i.e. the size invariants are nicer when you use unchop()).

@@ -22,6 +22,9 @@
#' same size (i.e. number of rows).
#' @param name_repair One of `"unique"`, `"universal"`, or `"check_unique"`.
#' See [vctrs::vec_as_names()] for the meaning of these options.
#' @param keep_empty An optional logical. If `FALSE` (the default), then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keep_empty is the wrong name if double() isn't promoted to NA_real_ and kept. It's more like keep_null.

> list_c(list(1, double(), 2))
[1] 1 2
> list_c(list(1, double(), 2), keep_empty = T)
[1] 1 2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compare with

> tidyr::unchop(tibble::tibble(x = list(1, double(), 2)), x)
# A tibble: 2 × 1
      x
  <dbl>
1     1
2     2
> tidyr::unchop(tibble::tibble(x = list(1, double(), 2)), x, keep_empty = TRUE)
# A tibble: 3 × 1
      x
  <dbl>
1     1
2    NA
3     2

Comment on lines +45 to +48
list_c <- function(x, ..., ptype = NULL, keep_empty = FALSE) {
vec_check_list(x)
check_dots_empty()
if (keep_empty) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd want to use check_bool(keep_empty) everywhere

Comment on lines +115 to +116
is_null <- map_lgl(x, is.null)
x[is_null] <- list(NA)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vctrs::vec_detect_missing() would be much faster at detecting NULLs since we know x is a list.

See tidyr:::list_replace_null() for a robust and very fast version of this operation

list_cbind(list(df1, z = NULL, df2), keep_empty = TRUE),
data.frame(df1, z = NA, df2)
)
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to implement this then I'd like the chance to come through and write a lot of tests for the 3 cases individually. The keep_empty logic in tidyr is quite hard to get 100% right, and required a lot of edge case tests.

@DavisVaughan
Copy link
Member

We have decided not to implement this feature for now since:

@SokolovAnatoliy we apologize about that! We hope you had fun at TDD anyways, and it seems like you got to do some other PRs in other repos, which is awesome! Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FR: New keep_empty = FALSE argument to list_c() and list_rbind()
5 participants