Skip to content

Structural Diffs

Wilfred Hughes edited this page Aug 26, 2024 · 12 revisions

https://www.monperrus.net/martin/tree-differencing also has a good overview of tools available in this space.

Tree Diffs

Most tree diff implementations focus on XML, and there's a great overview of techniques in this blog post. Daniel Ehrenberg also has an expanded 7-page PDF of his blog post, which I've mirrored here (the storage service says the PDF is under a "Attribution Non-Commercial (BY-NC)" license).

A Jane Street intern built a tree differ using an A* algorithm. This compares two s-expressions and builds a minimal new one with the new section marked with :date-switch.

(Jane Street also has patdiff, but that seems to be a line-oriented diff with some whitespace/integer display polish. It doesn't understand that e.g. whitespace in "foo " is meaningful).

JSON diff

json-diff provides a proper structural diff for JSON files.

graphtage

graphtage compares structured data by parsing into a generic file format, then displaying a diff. It finds the optimal edit sequence, and even allows things like diffing JSON against YAML.

Lisp diffs

sdiff and diff-sexp explore s-expression oriented diffs.

Autochrome is a structural diff for Clojure, using Dijkstra pathfinding. It does not track moves.

psydiff/ydiff

psydiff and ydiff apply structural diffing to Python and Lisp respectively, and output an HTML page of the result. They consider ASTs excluding comments.

diffsitter

diffsitter uses tree sitter for parsing, then runs longest-common-subsequence on the leaves of the tree.

Type Safe Generic Differencing

https://victorcmiraldo.github.io/data/MiraldoPhD.pdf (handles merging as well as diffing). Source referenced in the paper is https://github.com/VictorCMiraldo/hdiff/tree/v0.0.5 .

Victor has also published An Efficient Algorithm for Type-Safe Structural Diffing in ICFP 2019.

Concise, Type-Safe, and Efficient Structural Diffing

PLDI 2021: https://dl.acm.org/doi/10.1145/3453483.3454052

GumTree

Builds a tree structure and diffs it for several programming languages: https://github.com/GumTreeDiff/gumtree