Extend Coopy Highlighter Diff format with column type changes #3

danfowler · 2016-07-01T15:22:23Z

From @edwindj on February 25, 2015 14:10

A useful addition to the coopy highlighter diff format would be column type changes.

For example:

dataset a
A,B
1.1,1

and

dataset b
A,B,C
1,"1",2.1

The Coopy Diff is:

!,,,+++
@@,A,B,C
->,1.1->1,1,2.1

A typed version of the format could be:

!,{number->integer},{integer->string},+++{number}
@@,A,B,C
->,1.1->1,1,2.1

In this form, the schema row can contain a column type change. IMHO type information should not be obligatory, and an implementation should interpret it as a type suggestion, since types differ across programming languages. The types of JSON Table Schema seem like a good candidate for denoting common types.
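A minimal Python sketch of how a consumer might read the typed schema row proposed above. The `{old->new}` and `+++{type}` cell syntax comes from the proposal; the function name `parse_typed_schema_cell` and the returned tuple shape are hypothetical, and a cell without braces is treated as carrying no type information (types are only suggestions).

```python
import csv
import io
import re

# An optional action prefix (e.g. "+++"), then an optional {type} or {old->new} part.
# The cell syntax follows the proposal above; this parser itself is hypothetical.
_CELL = re.compile(r"^(?P<action>[^{]*)(?:\{(?P<types>[^}]*)\})?$")


def parse_typed_schema_cell(cell):
    """Split a schema-row cell into (action, old_type, new_type).

    A cell without braces yields (action, None, None): no type suggestion.
    """
    m = _CELL.match(cell)
    if m is None:
        raise ValueError(f"unrecognized schema cell: {cell!r}")
    action, types = m.group("action"), m.group("types")
    if types is None:
        return action, None, None
    if "->" in types:
        old, new = types.split("->", 1)
        return action, old, new
    # A single type, e.g. "+++{number}" for a newly added column.
    return action, None, types


row = next(csv.reader(io.StringIO("!,{number->integer},{integer->string},+++{number}\n")))
cells = [parse_typed_schema_cell(c) for c in row[1:]]  # skip the leading "!" marker
print(cells)
# [('', 'number', 'integer'), ('', 'integer', 'string'), ('+++', None, 'number')]
```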

Copied from original issue: frictionlessdata/datapackage#164


danfowler · 2016-07-01T15:22:24Z

From @paulfitz on February 26, 2015 22:46

Thanks @edwindj. There's also work on refining the types in JSON Table Schema in #159.

What do you think about leaving types in a separate optional row, like:

!,,,+++
@type,number->integer,integer->string,number
@@,A,B,C
->,1.1->1,1,2.1

I'm thinking that the spec could leave space for metadata associated with columns via a series of @foo lines (say @type, @precision, @special_stuff_for_R, etc.). Conforming consumers of diffs can ignore all that stuff, or try to use it. Conforming producers of diffs can add some of that stuff, or none of it.

The advantage of the separate rows is that the cells can behave exactly as in ordinary rows and be parsed in just the same way.
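To illustrate the point about separate rows, here is a minimal Python sketch of a consumer that parses every row with the same CSV reader and sets aside rows whose first cell starts with `@` (other than the `@@` header) as per-column metadata. The function name `read_highlighter_diff` and the returned layout are hypothetical; a consumer that does not care about metadata can simply ignore `meta`.

```python
import csv
import io


def read_highlighter_diff(text):
    """Parse a highlighter diff, separating optional @foo metadata rows.

    Returns (meta, header, body): meta maps a tag such as "type" to a
    {column_name: cell} dict, header is the list of column names from the
    @@ row, and body holds all other rows (including the "!" row) as-is.
    """
    rows = list(csv.reader(io.StringIO(text)))
    # Locate the @@ header first; metadata rows share its cell layout.
    header_index = next(i for i, r in enumerate(rows) if r and r[0] == "@@")
    header = rows[header_index][1:]
    meta, body = {}, []
    for i, row in enumerate(rows):
        if i == header_index or not row:
            continue
        tag = row[0]
        if tag.startswith("@") and tag != "@@":
            # e.g. "@type" -> "type"; cells line up with the @@ columns.
            meta[tag[1:]] = dict(zip(header, row[1:]))
        else:
            body.append(row)
    return meta, header, body


diff = """!,,,+++
@type,number->integer,integer->string,number
@@,A,B,C
->,1.1->1,1,2.1
"""
meta, header, body = read_highlighter_diff(diff)
print(meta["type"])  # {'A': 'number->integer', 'B': 'integer->string', 'C': 'number'}
print(body)          # [['!', '', '', '+++'], ['->', '1.1->1', '1', '2.1']]
```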

danfowler · 2016-07-01T15:22:24Z

From @paulfitz on February 26, 2015 23:09

Also, I understand from edwindj/daff#6 that you like having a single file for expressing diffs, and that may be the way to go. But just as the Tabular Data Package spec proposes data in CSV and schema in JSON, there may be something to be said for expressing schema differences in a hierarchical format like JSON rather than trying to flatten types out.
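As a rough illustration of keeping schema differences hierarchical, the Python sketch below compares two JSON Table Schema-style descriptors (a `fields` list of objects with `name` and `type`) and emits a nested JSON object of per-column changes, using the types from the dataset a / dataset b example. The function name `diff_table_schemas` and the `added` / `removed` / `changed` layout are hypothetical choices for this example, not something the spec defines.

```python
import json


def diff_table_schemas(old, new):
    """Compare two Table Schema-style descriptors field by field.

    Only the "fields" list is inspected; every property other than "name"
    (type, format, ...) is compared. The result layout is hypothetical.
    """
    old_fields = {f["name"]: f for f in old.get("fields", [])}
    new_fields = {f["name"]: f for f in new.get("fields", [])}
    result = {"added": {}, "removed": [], "changed": {}}
    for name, field in new_fields.items():
        if name not in old_fields:
            result["added"][name] = {k: v for k, v in field.items() if k != "name"}
            continue
        before = old_fields[name]
        # Record [old, new] for each property whose value differs.
        changes = {
            key: [before.get(key), field.get(key)]
            for key in sorted((set(before) | set(field)) - {"name"})
            if before.get(key) != field.get(key)
        }
        if changes:
            result["changed"][name] = changes
    result["removed"] = [name for name in old_fields if name not in new_fields]
    return result


schema_a = {"fields": [{"name": "A", "type": "number"}, {"name": "B", "type": "integer"}]}
schema_b = {"fields": [{"name": "A", "type": "integer"}, {"name": "B", "type": "string"},
                       {"name": "C", "type": "number"}]}
print(json.dumps(diff_table_schemas(schema_a, schema_b), indent=2))
```

Here A and B are reported under `changed` with their old and new types, and C under `added`, so the type information stays structured instead of being flattened into cells.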

danfowler · 2016-07-01T15:22:25Z

From @edwindj on February 27, 2015 7:08

@paulfitz I like your syntax for extra lines that may be ignored by consumers.
Maybe we can add this to the spec of Coopy Highlighter Diff nonetheless.

Regarding type changes in one file or two: should we follow the diff paradigm of storing all changes in one text, or should we follow the JSON Table Schema paradigm of describing metadata (changes) in a JSON file? The latter option would force all users to use JSON Table Schema, which I find too strict. Maybe we should support both, with a preference for JSON Table Schema: when a schema is available it should be used; otherwise a less expressive form can be used with the @type syntax.

Note that a solution in the spirit of datapackage probably would not calculate a diff, but just reference two resources: table remote and table local.

danfowler · 2016-07-01T15:22:25Z

From @paulfitz on February 28, 2015 4:00

I agree it would make sense to stick the new syntax in. I could also take a shot at adding support for it in daff. What I'd do is just ask the source of the tables if there's any metadata, diff that, and pass it along. For patching, I'm not 100% clear what would happen, but basically daff should tell you what metadata changes happened and let you take action based on them.

This feature should make diffs more useful within an environment with a single kind of data source, even if it wouldn't be very useful for interchange between different kinds of data sources.

danfowler · 2016-07-01T15:22:25Z

From @edwindj on March 1, 2015 9:59

Great! I will follow your changes and implement them in daff for R.

danfowler · 2016-07-01T15:22:26Z

From @rgrp on May 26, 2015 16:08

@paulfitz should this remain open? Are there pending changes? Otherwise let's close with a summary.

danfowler · 2016-07-01T15:22:26Z

From @paulfitz on May 31, 2015 3:56

@rgrp can we keep it open a while longer? I've been plugging away on this, close to maturing.

danfowler · 2016-07-01T15:22:27Z

From @rgrp on May 31, 2015 8:04

@paulfitz fantastic!

danfowler · 2016-07-01T15:22:27Z

From @edwindj on May 31, 2015 10:03

@paulfitz Great!

danfowler · 2016-07-01T15:22:28Z

From @paulfitz on October 10, 2015 15:50

I implemented a version of this some time back, and then got distracted working on a demo for it with sqlite. Suppose we have a birds table as follows:

# schema: id INTEGER PRIMARY KEY, name TEXT, count TEXT
id,name,count
-------------
1,robin,251
2,eagle,10
3,pigeon,140

And we modify the type of a column, add another column, and add a row:

# schema: id INTEGER PRIMARY KEY, name TEXT, count INTEGER, weather TEXT
id,name,count,weather
---------------------
1,robin,251,warm
2,eagle,10,
3,pigeon,140,
4,penguin,5,cold

Then daff would report this diff:

To use this in R, you'd need to implement some code that reports the properties of each column that you care about. That is sufficient for diffing. For patching, you'd need to be able to accept a description of the changes in a particular format and make them happen. I'll need to document this better if you're still interested in pursuing this @edwindj.
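For the SQLite demo above, one way to report the properties of each column is SQLite's `PRAGMA table_info`, which returns one row per column with its declared type and primary-key position. The Python sketch below uses the standard `sqlite3` module to build the two `birds` schemas in memory and lines their declared types up into an `@type`-style row, in the syntax suggested earlier in this thread. It only illustrates the diffing side: the table names `birds_v1` / `birds_v2` and the helpers `column_types` / `type_row` are hypothetical, and this is not daff's actual output or patching logic.

```python
import sqlite3


def column_types(conn, table):
    """Return {column_name: declared_type} using PRAGMA table_info."""
    # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so only pass trusted names.
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}


def type_row(old_types, new_types, columns):
    """Build an @type row: "OLD->NEW" when a type changed, else the current type."""
    cells = []
    for col in columns:
        before, after = old_types.get(col), new_types.get(col)
        if before is not None and before != after:
            cells.append(f"{before}->{after}")
        else:
            cells.append(after)  # unchanged, or a newly added column
    return ["@type"] + cells


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE birds_v1 (id INTEGER PRIMARY KEY, name TEXT, count TEXT)")
conn.execute("CREATE TABLE birds_v2 (id INTEGER PRIMARY KEY, name TEXT, count INTEGER, weather TEXT)")

old = column_types(conn, "birds_v1")
new = column_types(conn, "birds_v2")
columns = list(new)  # column order of the newer table
print(",".join(["@@"] + columns))
print(",".join(type_row(old, new, columns)))
# @@,id,name,count,weather
# @type,INTEGER,TEXT,TEXT->INTEGER,TEXT
```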

danfowler · 2016-07-01T15:22:28Z

From @edwindj on October 11, 2015 11:56

@paulfitz, I'm still interested :-). Documentation helps, but I will update my R code so this example works. It won't be until the end of this week.

danfowler · 2016-07-01T15:22:29Z

From @rgrp on March 7, 2016 18:59

@edwindj @paulfitz can this be closed?

danfowler added the spec-coopy-diff label Jul 1, 2016

danfowler mentioned this issue Jul 1, 2016

Extend Coopy Highlighter Diff format with column type changes frictionlessdata/datapackage#164

Closed
