Notebook file format that makes version control and merging easier #78

Open
riedelcastro opened this issue Aug 22, 2015 · 5 comments

@riedelcastro
Contributor

Notebooks are often edited collaboratively (e.g. the tutorial, or the book project). Conflicting changes to a notebook are very hard to resolve because of the JSON file format: newlines in the raw text are escaped, so the whole source of a cell ends up on a single line. Conflicts within the same cell (which can be quite large in my case) are therefore almost impossible to resolve by hand.

Possible remedies: allow newlines in JSON (not sure if that's possible though), or use XML?
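
To make the problem concrete, here is a minimal sketch in Python (standard library only). The `source` field name and the cell content are illustrative assumptions, not the project's actual notebook schema. Two people edit different lines of the same cell: on the raw text their changes touch different lines, but in the JSON form both touch the one line that holds the escaped source.

```python
import difflib
import json

def changed_line_numbers(base: str, edited: str) -> set:
    """Return the 0-based line numbers of `base` that `edited` replaces or deletes.

    Pure insertions touch no base line and are ignored; that is enough for
    this illustration, which only replaces lines.
    """
    matcher = difflib.SequenceMatcher(
        a=base.splitlines(), b=edited.splitlines(), autojunk=False
    )
    changed = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "equal":
            changed.update(range(i1, i2))
    return changed

def as_json(source: str) -> str:
    # Hypothetical one-field cell; a real notebook stores more properties.
    return json.dumps({"source": source}, indent=2)

base = "val x = 1\nval y = 2\nprintln(x + y)"
ours = "val x = 10\nval y = 2\nprintln(x + y)"    # edits the first line
theirs = "val x = 1\nval y = 2\nprintln(x * y)"   # edits the last line

# Raw text: the two edits touch different lines, so a merge tool can combine them.
raw_overlap = changed_line_numbers(base, ours) & changed_line_numbers(base, theirs)

# JSON: newlines are escaped as \n, so both edits hit the same physical line.
json_overlap = (
    changed_line_numbers(as_json(base), as_json(ours))
    & changed_line_numbers(as_json(base), as_json(theirs))
)

print("raw overlap:", raw_overlap)
print("json overlap:", json_overlap)
```

Any format that keeps one source line per file line (XML element text, HOCON multi-line strings, or plain markdown) avoids the overlap in the second case.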

@sameersingh
Member

Is there code for such a format for serializing simple case classes? If so, it's all plug and play.

If not, we'll have to write our own serialization and deserialization, which, well, I don't wanna!

I think all the relevant code is in Document.scala.

@riedelcastro
Contributor Author

I think we can reduce the number of properties that need to be stored, so that writing our own serialization would be super simple. Still, json4s has XML support!

(My dream data format would still be text/markdown only, with Scala cells as markdown code blocks, plus some extra properties where needed, maybe specified through HTML comments.)
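
A rough sketch of what reading such a markdown-only format could look like, written in Python for brevity. Everything about the syntax here is an assumption made for illustration: the `<!-- cell: key=value -->` comment convention, the `kind`/`source`/`props` cell fields, and the function name `parse_notebook` are all hypothetical, and nothing like this exists in the project.

```python
import re

FENCE = "`" * 3  # built this way so the sample below cannot close this code block
PROPS = re.compile(r"<!--\s*cell:\s*(.*?)\s*-->")

def parse_notebook(text: str) -> list:
    """Split a markdown document into markdown cells and scala code cells."""
    cells = []
    prose = []    # markdown lines collected since the last code cell
    props = {}    # properties from the most recent cell comment
    code = None   # list of code lines while inside a scala fence, else None

    def flush_prose():
        body = "\n".join(prose).strip()
        if body:
            cells.append({"kind": "markdown", "source": body, "props": {}})
        prose.clear()

    for line in text.splitlines():
        stripped = line.strip()
        match = PROPS.fullmatch(stripped)
        if code is not None:
            if stripped == FENCE:
                # Closing fence: emit the code cell with any pending properties.
                cells.append({"kind": "scala", "source": "\n".join(code), "props": props})
                code, props = None, {}
            else:
                code.append(line)
        elif stripped == FENCE + "scala":
            flush_prose()
            code = []
        elif match:
            # Assumes space-separated key=value pairs, each containing "=".
            props = dict(pair.split("=", 1) for pair in match.group(1).split())
        else:
            prose.append(line)
    flush_prose()
    return cells

sample = "\n".join([
    "# Intro",
    "",
    "Some explanation.",
    "",
    "<!-- cell: hide_output=true -->",
    FENCE + "scala",
    "val x = 1",
    "val y = 2",
    FENCE,
])

cells = parse_notebook(sample)
for cell in cells:
    print(cell["kind"], cell["props"])
```

Because every line of a cell stays a line in the file, ordinary diff and merge tools work on it directly, and the file remains readable without the notebook UI.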

@sameersingh
Member

I can try out the json4s XML serialization, if that's good enough. XML is not very readable though; a better option might be HOCON, but we'd have to write our own converters for the objects, which would be a pain.

I would be okay with a direct text/markdown format as well.


@riedelcastro
Contributor Author

I don't think XML will be a problem here, because in most cases the cell content will dominate the file. The only thing that could be improved is using XML attributes for cell attributes, as opposed to child elements, but that may not be possible with json4s.

HOCON would be great. Is there an automatic serializer for it?
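
For comparison, a small sketch of the two XML layouts (child elements versus attributes), using Python's standard `xml.etree.ElementTree`. The element and attribute names (`cell`, `id`, `kind`, `source`) are made up for illustration, and this is not what json4s produces.

```python
import xml.etree.ElementTree as ET

def cell_with_children(cell_id: int, kind: str, source: str) -> ET.Element:
    """Every property, including the source, is a child element."""
    cell = ET.Element("cell")
    ET.SubElement(cell, "id").text = str(cell_id)
    ET.SubElement(cell, "kind").text = kind
    ET.SubElement(cell, "source").text = source
    return cell

def cell_with_attributes(cell_id: int, kind: str, source: str) -> ET.Element:
    """Small properties become attributes; the element text is the source."""
    cell = ET.Element("cell", {"id": str(cell_id), "kind": kind})
    cell.text = source
    return cell

source = "val x = 1\nprintln(x)"

children_xml = ET.tostring(cell_with_children(1, "scala", source), encoding="unicode")
attributes_xml = ET.tostring(cell_with_attributes(1, "scala", source), encoding="unicode")

print(children_xml)
print(attributes_xml)

# Element text keeps real newlines, so the source round-trips line for line.
round_tripped = ET.fromstring(attributes_xml).text
```

In both layouts the cell source keeps its line breaks in the file, which is the property that matters for merging; the attribute form only trims the markup around it.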

@sameersingh
Member

No, there is no automated I/O for HOCON, but it should be easier than writing everything from scratch.

