Better document downside of named elements in list_cbind() #1080

Open
rressler opened this issue May 22, 2023 · 4 comments
Comments

@rressler

list_cbind() calls vctrs::vec_cbind(), which produces "packed data frame columns" when its inputs are named.

I am not familiar with packed data frame columns, so I was expecting a data frame whose columns were atomic vectors. The example from the list_cbind() help (below) shows a different structure. Because of that structure, the result of list_cbind() also displays differently in an R script than in a qmd/rmd code chunk.

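library(purrr)
library(tidyr)  # provides unpack()

x2 <- list(
  a = data.frame(x = 1:2),
  b = data.frame(y = "a")
)
list_cbind(x2)
str(list_cbind(x2))  # a and b are each a data frame
# Unpacking gives the flat data frame I expected
list_cbind(x2) |> unpack(cols = everything()) |> str()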

I suggest adding information to the help (or a vignette) on how to use unpack() to convert the result of list_cbind() into an unpacked data frame.
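A minimal sketch of what "packed" means here, and of what unpack() does to it. It assumes purrr (1.0.0 or later) and tidyr are installed; the object names `packed` and `flat` are only illustrative.

```r
library(purrr)
library(tidyr)

x2 <- list(
  a = data.frame(x = 1:2),
  b = data.frame(y = "a")
)

packed <- list_cbind(x2)

# The outer names of the list become the column names ...
names(packed)
# ... and each of those columns is itself a data frame, not an atomic vector.
is.data.frame(packed$a)
# An inner column is reached through the outer one.
packed$a$x

# unpack() lifts the inner columns up to the top level.
flat <- unpack(packed, cols = everything())
names(flat)
```

So the packed result has two columns (a and b), each holding a one-column data frame, while the unpacked result has the inner columns directly.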

An alternative might be to add an unpack argument to list_cbind() so that users can unpack the result as part of the same call.
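To make the idea concrete, here is a rough sketch of such an option written as a user-side wrapper. The function `list_cbind_unpacked()` and its `unpack` argument are hypothetical and not part of purrr; the sketch only combines the existing list_cbind() and tidyr::unpack().

```r
library(purrr)
library(tidyr)

# Hypothetical helper, not part of purrr: column-bind a list and
# optionally unpack any packed data frame columns in the result.
list_cbind_unpacked <- function(x, unpack = TRUE) {
  out <- list_cbind(x)
  if (unpack) {
    # Call tidyr::unpack() explicitly because the argument shares its name.
    out <- tidyr::unpack(out, cols = everything())
  }
  out
}

x2 <- list(
  a = data.frame(x = 1:2),
  b = data.frame(y = "a")
)
str(list_cbind_unpacked(x2))
```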

@hadley
Member

hadley commented Jul 26, 2023

What are you trying to do? It's most likely that you should avoid the packing step in the first place.

@rressler
Author

Thanks for responding (and everything else you do)!

I am not sure how to avoid the packing, because:

  1. list_cbind() combines elements into a data frame by column-binding them together with vctrs::vec_cbind(), and

  2. vec_cbind() creates packed data frame columns when its inputs are named.
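The two points above can be seen by calling vec_cbind() directly. This is a small sketch assuming the vctrs package is installed; `df1`, `df2`, `packed`, and `flat` are illustrative names.

```r
library(vctrs)

df1 <- data.frame(x = 1:2)
df2 <- data.frame(y = "a")

# Named inputs: each data frame is kept whole as a packed column.
packed <- vec_cbind(a = df1, b = df2)
names(packed)

# Unnamed inputs: the data frames are spliced in column by column.
flat <- vec_cbind(df1, df2)
names(flat)
```

The only difference between the two calls is whether the inputs carry names, which is the same difference a named versus unnamed list makes when passed to list_cbind().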

I have not found a source that explains packed data frames, or how to pass arguments to vec_cbind() so that the result is unpacked as part of the call to list_cbind(). Perhaps a vignette could be added to tidyr.

We can just pipe the result on to flatten it, I guess, but that seems to be an extra step of the kind that newer functions such as pivot_wider() try to avoid by using arguments.

Appreciate any recommendations.
Thanks,
Richard

@hadley
Member

hadley commented Jul 27, 2023

Oooh sorry, I think I missed the underlying issue here. You're getting this behaviour because list_cbind() is attempting to preserve the internal and external names. You can get the behaviour you want by stripping the names:

library(purrr)

x2 <- list(
  a = data.frame(x = 1:2),
  b = data.frame(y = "a")
)
str(list_cbind(unname(x2)))
#> 'data.frame':    2 obs. of  2 variables:
#>  $ x: int  1 2
#>  $ y: chr  "a" "a"

Created on 2023-07-27 with reprex v2.0.2

I'll think about how to point this out in the docs.
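When the outer names carry information worth keeping, one possible alternative to unname() is to leave the packing in place and unpack with a separator afterwards. This is a sketch, assuming purrr and tidyr are installed; it relies on the existing `names_sep` argument of tidyr::unpack(), and `res` is an illustrative name.

```r
library(purrr)
library(tidyr)

x2 <- list(
  a = data.frame(x = 1:2),
  b = data.frame(y = "a")
)

# Keep the packed result, then paste outer and inner names together
# while unpacking.
res <- list_cbind(x2) |> unpack(cols = everything(), names_sep = "_")
names(res)
```

This keeps both levels of naming in a flat data frame, at the cost of the extra unpack() step.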

@hadley hadley changed the title Request for additional help/vingette on list_cbind and packed columns Better document downside of named elements in list_cbind() Jul 27, 2023
@rressler
Author

Thanks for the explanation and the solution approach!

P.S. FYI, even ChatGPT can't explain packed data frames. :)

What is a good reference to explain the "packed data.frame" in R?
ChatGPT
As of my last update in September 2021, there isn't a native data structure called "packed data.frame" in R. ...
