[R] When writing a data.frame to Parquet, we lose information about its class. #44524
Comments
We have some legacy code that isn't compatible with … While I understand the intent to remove unnecessary attributes, the assumption that … If the goal is to remove attributes for the default table type in R, we should remove the class only when it is … As I mentioned in the description, an extra workaround is now required to avoid the class modification. Unfortunately, the … The example on which the assumption was based is probably incorrect:
> class(arrow::arrow_table(name = "1", mtcars)$to_data_frame())
[1] "tbl_df" "tbl" "data.frame"
> class(arrow::arrow_table(mtcars, name = "1")$to_data_frame())
[1] "tbl_df" "tbl" "data.frame"
In this case, … In my opinion, this bug is small and can be fixed simply by removing Lines 25 to 31 in 7ef5437.
It's not a bug; it was a deliberate design decision to make tibble the default type. The discussion is in those linked PRs, though it is spread across several of them. Do you still have the same problems if you were to call
I'm just pointing out that if arrow's default type is … As I mentioned before, when we save … It could be changed to tibble: remove the class attribute only when the object is just a tibble.
You can always convert … Anyway, if you don't agree that removing class information from an object of a non-default type is a bug, then we should close the issue.
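The proposal in this thread, dropping `class` only when it matches arrow's default tibble type instead of when it is a plain "data.frame", could look roughly like the sketch below. This is an illustration under assumptions: `strip_default_class` is a hypothetical helper, not arrow's actual `.serialize_arrow_r_metadata` code, and it works on a plain list of attributes rather than arrow's real metadata structure.

```r
# Hypothetical helper (not part of arrow): decide which R attributes to keep
# in the saved metadata. The class is dropped only when it equals arrow's
# default return type (a tibble), so a plain data.frame keeps its class.
strip_default_class <- function(attrs) {
  default_class <- c("tbl_df", "tbl", "data.frame")
  if (identical(attrs[["class"]], default_class)) {
    attrs[["class"]] <- NULL  # redundant: arrow returns a tibble anyway
  }
  attrs
}

# A plain data.frame keeps its class, so it could be restored on read
df_attrs <- strip_default_class(list(class = "data.frame"))

# A tibble loses its class, because the default type already matches
tbl_attrs <- strip_default_class(list(class = c("tbl_df", "tbl", "data.frame")))
```

With this rule, the metadata stays minimal for the default case while a data.frame written by the user would still carry the information needed to read it back as a data.frame.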
Describe the bug, including details regarding any error messages, version, and platform.
When we write Parquet using arrow, we lose the information that the data frame is a data.frame. When we read iris.parquet, it is read as a tbl by default. This bug was introduced in #34775.
The class data.frame is removed in the .serialize_arrow_r_metadata function: https://github.com/apache/arrow/blame/7ef5437e23bd7d7571a0c7a7fc0c5d3634816802/r/R/metadata.R#L25
When the Parquet file is saved, this attribute is removed.
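A minimal way to reproduce the round trip described above, assuming the arrow package is installed. It uses arrow's `write_parquet()` and `read_parquet()`; the wrapper name `roundtrip_class` is hypothetical and only reports the class of the object read back, so the result can be compared with `class(iris)`.

```r
# Hypothetical helper: write a data frame to a temporary Parquet file,
# read it back with arrow's defaults, and return the class of the result.
roundtrip_class <- function(df) {
  path <- tempfile(fileext = ".parquet")
  on.exit(unlink(path))             # clean up the temporary file
  arrow::write_parquet(df, path)    # writes data plus R metadata
  class(arrow::read_parquet(path))  # as_data_frame = TRUE by default
}

# Usage: compare the class before and after the round trip
# class(iris)
# roundtrip_class(iris)
```

According to this report, the second call returns the tibble classes rather than plain "data.frame".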
Workaround:
data.frame
Bug description:
When we remove the data.frame class attribute, the Parquet file is read back as a tibble by default. In my opinion this is not the expected behavior: when we write a data.frame, we should read back a data.frame. What was the reason to "remove the class if it's just data.frame"?
Component(s)
R