JSON to CSV Error #1534
Comments
Could you share a sample file and a sample command? |
given this sample file: if I run
even though each object element has slightly different fields, where I would expect a new CSV header row each time. If I change the source file so that the nested array names differ from element to element of the outer array:
in version 6.10, this works as expected:
and
Also, if I use --j2p (instead of --j2c) in 6.12, it seems to work fine. Thanks. |
Hi @osevill it's probably related to flatten/unflatten, but I do not have a solution for you. @johnkerl will be able to help you. |
@osevill the 6.11.0 release (https://github.com/johnkerl/miller/releases/tag/v6.11.0) contains PR #1479 which addresses issue #1418. Before this, Miller was in some cases producing non-compliant CSV output:
After this, Miller now produces compliant CSV output, or says that it can't:
If one row's list of column names is a strict subset of the others, it can auto-unsparsify:
The concern raised by issue #1418, and addressed by PR #1479, is that Miller should stop producing "CSV" with non-compliant blank lines in it. @aborruso was right to raise #1418. For the data files in this issue, the records are truly non-homogeneous and are truly not representable as compliant CSV. Two options I can suggest:
|
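For readers unfamiliar with the term, the following is a minimal Python sketch of what "unsparsifying" means: every record is given the union of all keys seen, with missing values filled by an empty string, so that a single header row describes every data row. This is an illustration only and not Miller's implementation; the function names `unsparsify` and `to_csv` and the sample records are hypothetical and do not come from this issue.

```python
import csv
import io


def unsparsify(records, fill=""):
    """Give every record the union of all keys, in first-seen order."""
    keys = []
    for rec in records:
        for k in rec:
            if k not in keys:
                keys.append(k)
    # Missing keys are filled with `fill` so all records share one schema.
    return [{k: rec.get(k, fill) for k in keys} for rec in records]


def to_csv(records):
    """Write homogeneous records as one header row plus data rows."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


# Hypothetical sample data, not taken from the issue.
records = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 4, "b": 5},
    {"a": 6, "d": 7},
]
print(to_csv(unsparsify(records)), end="")
```

The trade-off is that the output is a single compliant CSV table, but it no longer shows which fields were absent as opposed to empty.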
I was looking for this, but each time I am unable to reconstruct it. Thank you @johnkerl |
@johnkerl The feedback above is a great help, particularly Since my JSON data does have commas in the values, here's what I came up with... Given this sample file (this time with commas in the values): ...if I convert from JSON to tsvlite when doing the
Is there a simpler way to change the delimiter than calling mlr again and changing the output field separator? tsvlite doesn't seem to support changing the output field separator. I don't know your thoughts on this, but would it be worthwhile to have a new file format that sits "in between" csvlite and csv? It would be csvlite plus support for commas or newlines embedded in double quotes, and because it wouldn't adhere to the RFC 4180 spec, it could allow blank rows in the output. In this particular instance, it would help me accomplish my JSON to row-based transform in just one mlr group-like. Thanks again for the feedback. |
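To make the proposed "in between" format concrete, here is a rough Python sketch of the requested behaviour: records are bucketed by their key list (similar in spirit to group-like), each bucket is written as its own header-plus-rows block with double-quote quoting, and blocks are separated by a blank line. It is not Miller code and not an existing Miller format; `group_like`, `write_blocks`, and the sample records are hypothetical names and data chosen for illustration.

```python
import csv
import io


def group_like(records):
    """Bucket records by their exact key list, keeping first-seen order."""
    groups = {}
    for rec in records:
        groups.setdefault(tuple(rec), []).append(rec)
    return groups


def write_blocks(records, delimiter=","):
    """Write one quoted block per schema, separated by a blank line."""
    out = io.StringIO()
    for i, (keys, recs) in enumerate(group_like(records).items()):
        if i > 0:
            out.write("\n")  # blank line between blocks (not RFC 4180 compliant)
        writer = csv.DictWriter(
            out, fieldnames=list(keys), delimiter=delimiter, lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(recs)
    return out.getvalue()


# Hypothetical sample data with commas inside values.
records = [
    {"id": 1, "name": "Smith, Jane"},
    {"id": 2, "city": "Paris"},
    {"id": 3, "name": "Doe, John"},
]
print(write_blocks(records), end="")
```

Values containing the delimiter are quoted by the `csv` module's default minimal quoting, and passing a different `delimiter` changes the separator without a second pass over the data.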
Thanks! |
Wouldn't it be better to allow setting output separator for tsvlite instead? Sounds simpler, more predictable, flexible, and easier to explain to users, doesn't it? |
In previous versions
mlr --j2c group-like
worked fine. In version 6.12, the same script with the same JSON file gives an error:
mlr: CSV schema change: first keys "..."; current keys "..."
The fields/properties do change from one object element of the json array to the next, but that's why I use group-like.
Thanks