Diffing Options
The library comes with a bunch of options (internally referred to as strategies), for the following three main steps in the diffing process:
- Filtering out irrelevant nodes and attributes
- Matching up nodes and attributes for comparison
- Comparing matched up nodes and attributes
To make it easier to configure the diffing engine, the library comes with a DiffBuilder class, which handles the relatively complex task of setting up the HtmlDifferenceEngine.
The following section documents the current built-in strategies that are available.
Contents:
- Default options
- Filter strategies
- Matching strategies
- Compare strategies
To learn how to create your own strategies, visit the Creating Custom Diffing Strategies page.
Default options
In most cases, calling DiffBuilder.Compare(...).WithTest(...).Build() will give you a good set of default options for comparison, e.g.
var controlHtml = "<p>Hello World</p>";
var testHtml = "<p>World, I say hello</p>";
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.Build();
If you want to be more explicit, the following is equivalent to the code above:
var controlHtml = "<p>Hello World</p>";
var testHtml = "<p>World, I say hello</p>";
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions((IDiffingStrategyCollection options) => options.AddDefaultOptions())
.Build();
Calling the AddDefaultOptions() method is equivalent to specifying the following options explicitly:
var diffs = DiffBuilder
    .Compare(controlHtml)
    .WithTest(testHtml)
    .WithOptions((IDiffingStrategyCollection options) => options
        .IgnoreDiffAttributes()
        .IgnoreComments()
        .AddSearchingNodeMatcher()
        .AddCssSelectorMatcher()
        .AddAttributeNameMatcher()
        .AddElementComparer()
        .AddIgnoreElementSupport()
        .AddStyleSheetComparer()
        .AddTextComparer(WhitespaceOption.Normalize, ignoreCase: false)
        .AddAttributeComparer()
        .AddClassAttributeComparer()
        .AddBooleanAttributeComparer(BooleanAttributeComparision.Strict)
        .AddStyleAttributeComparer()
    )
    .Build();
Read more about each of the strategies below, including some that are not part of the default setting.
Filter strategies
These are the built-in filter strategies.
The ignore-comments strategy ignores all comment nodes during comparison. Activate it by calling the IgnoreComments() method on an IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.IgnoreComments())
.Build();
NOTE: Currently, the ignore comment strategy does NOT remove comments from CSS or JavaScript embedded in <style> or <script> tags.
If the diff:ignore="true" attribute is used on a control element (the ="true" part is implicit and optional), all of its attributes and child nodes are ignored during comparison, including those of the test element that the control element is matched with.
In this example, the <h1> tag, its attributes, and its children are considered the same as those of the element it is matched with:
<header>
<h1 class="heading-1" diff:ignore>Hello world</h1>
</header>
Activate this strategy by calling the AddIgnoreElementSupport() method on the IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.AddIgnoreElementSupport())
.Build();
If the diff:ignoreChildren="true" attribute is used on a control element (the ="true" part is implicit and optional), all of its child nodes are ignored during comparison, along with the child nodes of the test element that the control element is matched with.
In this example, the children of the <h1> tag are considered the same as the children of the element it is matched with:
<header>
<h1 class="heading-1" diff:ignoreChildren>Hello world</h1>
</header>
Any attributes that start with diff: are automatically filtered out before matching/comparing happens. E.g. diff:whitespace="..." does not show up as a missing diff when added to a control element.
To enable this option, use the IgnoreDiffAttributes() method on the IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.IgnoreDiffAttributes())
.Build();
Matching strategies
These are the built-in matching strategies. We have two different types, one for nodes and one for attributes.
These are the built-in node matching strategies. They cover elements, text nodes, comments, and other types that inherit from INode.
The one-to-one node-matching strategy simply matches two node lists with each other, based on the index of each node. So, if you have two equal length control and test node lists, controlNodes[0] will be matched with testNodes[0], controlNodes[1] with testNodes[1], and so on.
If either of the lists is shorter than the other, the remaining items will be reported as missing (for control nodes) or unexpected (for test nodes).
If a node has been marked as matched by a previously executed matcher, the One-to-one matcher will not use that node in its matching and skip over it.
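The index-based pairing described above can be sketched in a few lines of C#. This is an illustration only, not the library's implementation: nodes are represented by their names as plain strings, MatchOneToOne is a hypothetical helper, and the handling of nodes already matched by an earlier matcher is left out. It is written as a C# 9+ top-level program.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: pairs nodes by index and reports the leftovers.
static (List<(string Control, string Test)> Matches, List<string> Missing, List<string> Unexpected)
    MatchOneToOne(IReadOnlyList<string> controlNodes, IReadOnlyList<string> testNodes)
{
    var matches = new List<(string Control, string Test)>();
    var missing = new List<string>();
    var unexpected = new List<string>();
    var shared = Math.Min(controlNodes.Count, testNodes.Count);

    // Nodes at the same index are paired, regardless of their names.
    for (var i = 0; i < shared; i++)
        matches.Add((controlNodes[i], testNodes[i]));
    // Leftover control nodes are missing; leftover test nodes are unexpected.
    for (var i = shared; i < controlNodes.Count; i++)
        missing.Add(controlNodes[i]);
    for (var i = shared; i < testNodes.Count; i++)
        unexpected.Add(testNodes[i]);

    return (matches, missing, unexpected);
}

var result = MatchOneToOne(new[] { "h1", "p", "footer" }, new[] { "h1", "div" });
Console.WriteLine(string.Join(", ", result.Matches)); // (h1, h1), (p, div)
Console.WriteLine(string.Join(", ", result.Missing)); // footer
```

Note that p is paired with div here. Deciding whether a matched pair is actually the same is the job of the compare strategies, not the matcher.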
To choose this matcher, use the AddOneToOneNodeMatcher() method on the IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.AddOneToOneNodeMatcher())
.Build();
The forward-searching node-matcher strategy will only match control nodes with test nodes if their NodeName values match. It does this by taking one control node at a time, and searching forward from the position after the previously matched test node until it finds a match. If it does not find one, the unmatched control node is marked as missing and the search continues with the next control node. Afterwards, any unmatched test nodes are marked as unexpected.
The following JavaScript code illustrates how the matching part of the algorithm works:
function forwardSearchingMatcher(controlNodes, testNodes) {
  const matches = [];
  let lastMatchedTestNode = -1;
  for (const controlNode of controlNodes) {
    // Only search after the previously matched test node.
    for (let index = lastMatchedTestNode + 1; index < testNodes.length; index++) {
      if (controlNode.NodeName === testNodes[index].NodeName) {
        matches.push([controlNode, testNodes[index]]);
        lastMatchedTestNode = index;
        break; // stop at the first match for this control node
      }
    }
  }
  return matches;
}
To choose this matcher, use the AddSearchingNodeMatcher() method on the IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.AddSearchingNodeMatcher())
.Build();
The CSS-selector matcher can be used to match any test element from the test node tree with a given control element. On the control element, add the diff:match="CSS selector" attribute. The specified CSS selector should match zero or one test element.
For example, if the test nodes look like this:
<header>
<h1>hello world</h1>
</header>
<main>
...
</main>
<footer>
...
</footer>
The following control node will be compared against the <h1> in the <header> tag:
<h1 diff:match="header > h1">hello world</h1>
One use case of the CSS-selector element matcher is where you only want to test one part of a sub-tree, and ignore the rest. The example above will report the unmatched test nodes as unexpected, but those "diffs" can be ignored since that is expected. This approach can save you from specifying all the needed control nodes if only part of a subtree needs to be compared.
To choose this matcher, use the AddCssSelectorMatcher() method on the IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.AddCssSelectorMatcher())
.Build();
These are the built-in attribute matching strategies.
The attribute name matcher matches attributes on a control element with attributes on a test element by the attribute's name. If a control attribute is not matched, it is reported as missing, and if a test attribute is not matched, it is reported as unexpected.
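The name-based matching can be pictured as simple set operations over attribute names. The sketch below is only an illustration under that assumption: MatchAttributesByName is a hypothetical helper, attributes are reduced to their names, and names are compared with ordinal string equality. It is not the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits attribute names into matched, missing, and unexpected.
static (List<string> Matched, List<string> Missing, List<string> Unexpected)
    MatchAttributesByName(IEnumerable<string> controlNames, IEnumerable<string> testNames)
{
    var control = controlNames.ToList();
    var test = testNames.ToList();
    return (
        control.Intersect(test).ToList(), // present on both elements
        control.Except(test).ToList(),    // only on the control element: missing
        test.Except(control).ToList());   // only on the test element: unexpected
}

var attrs = MatchAttributesByName(new[] { "id", "class" }, new[] { "class", "title" });
Console.WriteLine(string.Join(", ", attrs.Matched));    // class
Console.WriteLine(string.Join(", ", attrs.Missing));    // id
Console.WriteLine(string.Join(", ", attrs.Unexpected)); // title
```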
To choose this matcher, use the AddAttributeNameMatcher() method on the IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.AddAttributeNameMatcher())
.Build();
Compare strategies
These are the built-in comparing strategies.
The basic element compare strategy simply checks that both nodes are elements and that their element names are the same.
To choose this comparer, use the AddElementComparer() method on the IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.AddElementComparer())
.Build();
The basic comment compare strategy will simply check if both nodes are comments.
To choose this comparer, use the AddCommentComparer() method on the IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.AddCommentComparer())
.Build();
The built-in text strategies offer a bunch of ways to control how text (text nodes) is handled during the diffing process.
NOTE: It is on the issues list to enable a more intelligent, e.g. whitespace-aware, comparison of JavaScript (text) inside <script> tags and event attributes.
Whitespace can be a source of false positives when comparing two HTML fragments. Thus, the whitespace handling strategy offers different ways to deal with it during a comparison.
- Preserve (default): Does not change or filter out any whitespace in text nodes in the control and test HTML.
- RemoveWhitespaceNodes: Filters out all text nodes that only consist of whitespace characters.
- Normalize: Trims all text nodes and replaces two or more whitespace characters with a single space character. This option implicitly includes the RemoveWhitespaceNodes option.
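To make the difference between the options concrete, here is a small C# sketch that applies the rules as described in the list above to a list of text-node contents. The helper names mirror the option names, but they are hypothetical and not part of the library; Preserve corresponds to leaving the input untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// RemoveWhitespaceNodes: drop text nodes that consist only of whitespace.
static List<string> RemoveWhitespaceNodes(IEnumerable<string> textNodes) =>
    textNodes.Where(text => !string.IsNullOrWhiteSpace(text)).ToList();

// Normalize: additionally trim each node and collapse runs of two or more
// whitespace characters into a single space.
static List<string> Normalize(IEnumerable<string> textNodes) =>
    RemoveWhitespaceNodes(textNodes)
        .Select(text => Regex.Replace(text.Trim(), @"\s{2,}", " "))
        .ToList();

var nodes = new[] { "  Hello \n   world ", "\n\t  ", "bye" };
Console.WriteLine(RemoveWhitespaceNodes(nodes).Count); // 2
Console.WriteLine(string.Join("|", Normalize(nodes))); // Hello world|bye
```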
These options can be set either globally for the entire comparison, or inline on specific subtrees in the comparison.
To set a global default, call the method AddTextComparer(WhitespaceOption) on the IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.AddTextComparer(WhitespaceOption.Normalize))
.Build();
To configure/override whitespace rules on a specific subtree in the comparison, use the diff:whitespace="WhitespaceOption" attribute inline on a control element. That element and all text nodes below it will use that whitespace option, unless it is overridden on a child element. In the example below, all whitespace inside the <h1> element is preserved:
<header>
<h1 diff:whitespace="preserve">Hello <em> woooorld</em></h1>
</header>
Special case for <pre>, <script>, and <style> elements: The content of <pre>, <script>, and <style> elements will always be treated as if the Preserve option is set, even if the whitespace option is globally set to RemoveWhitespaceNodes or Normalize. To override this, add an inline diff:whitespace attribute to the tags, e.g.:
<pre diff:whitespace="RemoveWhitespaceNodes">...</pre>
This should ensure that the meaning of the content in those tags doesn't change by default. To deal correctly with whitespace in <style> tags, use the Style sheet text comparer.
To compare the text in two text nodes to each other using a case-insensitive comparison, call the AddTextComparer(ignoreCase: true) method on the IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.AddTextComparer(ignoreCase: true))
.Build();
To configure/override ignore case rules on a specific subtree in the comparison, use the diff:ignoreCase="true|false" attribute inline on a control element. That element and all text nodes below it will use that ignore case setting, unless it is overridden on a child element. In the example below, ignore case is active for all text inside the <h1> element:
<header>
<h1 diff:ignoreCase="true">Hello <em> woooorld</em></h1>
</header>
Note, as with all HTML5 boolean attributes, the ="true" or ="false" parts are optional.
By using the inline attribute diff:regex on the element containing the text node being compared, the comparer will consider the control text to be a regular expression, and will use that to test whether the test text node is as expected. This can be combined with the inline diff:ignoreCase attribute, to make the regular expression case-insensitive. E.g.:
<header>
<h1 diff:regex diff:ignoreCase>Hello World \d{4}</h1>
</header>
The above control text would use a case-insensitive regular expression to match against a test text string (e.g. "HELLO WORLD 2020").
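The effect of combining the two attributes can be reproduced with .NET's Regex class. The following is a minimal sketch, assuming the control text is used as the pattern and the test text as the input; TextMatches is a hypothetical helper, not a library method, and the library may anchor or preprocess the pattern differently.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: compares test text against control text, optionally
// treating the control text as a regular expression and/or ignoring case.
static bool TextMatches(string controlText, string testText, bool useRegex, bool ignoreCase)
{
    if (useRegex)
    {
        var options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        // Regex.IsMatch is unanchored: the pattern may match anywhere in the test text.
        return Regex.IsMatch(testText, controlText, options);
    }

    var comparison = ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal;
    return string.Equals(controlText, testText, comparison);
}

var pattern = @"Hello World \d{4}";
Console.WriteLine(TextMatches(pattern, "HELLO WORLD 2020", useRegex: true, ignoreCase: true));  // True
Console.WriteLine(TextMatches(pattern, "HELLO WORLD 2020", useRegex: true, ignoreCase: false)); // False
```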
Different whitespace rules apply to style sheets (style information) inside <style> tags than to HTML5. This comparer will parse the style information inside <style> tags and compare the result of the parsing, instead of doing a direct string comparison. This should remove false positives where e.g. insignificant whitespace makes two otherwise equal sets of style information result in a diff.
To add this comparer, use the AddStyleSheetComparer() method on the IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.AddStyleSheetComparer())
.Build();
The library supports various ways to perform attribute comparison.
The "name and value comparison" is the base comparison option, and that will test if both the names and the values of the control and test attributes are equal. E.g.:
- attr="foo" is the same as attr="foo"
- attr="foo" is NOT the same as attr="bar"
- foo="attr" is NOT the same as bar="attr"
To choose this comparer, use the AddAttributeComparer() method on the IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.AddAttributeComparer())
.Build();
It is possible to specify a regular expression in the control attribute's value, and add the :regex postfix to the control attribute's name, to have the comparison performed using a Regex match test. E.g.
- attr:regex="foo-\d{4}" is the same as attr="foo-2019"
To get the comparer to perform a case-insensitive comparison of the values of the control and test attributes, add the :ignoreCase postfix to the control attribute's name. E.g.
- attr:ignoreCase="FOO" is the same as attr="foo"
To perform a case-insensitive regular expression match, combine :ignoreCase and :regex as postfixes to the control attribute's name. The order in which you combine them does not matter. E.g.
- attr:ignoreCase:regex="FOO-\d{4}" is the same as attr="foo-2019"
- attr:regex:ignoreCase="FOO-\d{4}" is the same as attr="foo-2019"
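One way to picture how the postfixes work is to split the control attribute's name on colons and treat everything after the first part as flags. The sketch below does exactly that. AttributeMatches is a hypothetical helper written for this illustration, and the case-insensitive lookup of the postfixes is an assumption (HTML parsers lowercase attribute names); the library's own comparer may differ in detail.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: compares a control attribute (whose name may carry
// :regex and :ignoreCase postfixes) with a test attribute.
static bool AttributeMatches(string controlName, string controlValue, string testName, string testValue)
{
    var parts = controlName.Split(':');
    var name = parts[0];
    var postfixes = parts.Skip(1).ToArray();
    var ignoreCase = postfixes.Contains("ignoreCase", StringComparer.OrdinalIgnoreCase);
    var useRegex = postfixes.Contains("regex", StringComparer.OrdinalIgnoreCase);

    // The names (without postfixes) must be equal before values are compared.
    if (!string.Equals(name, testName, StringComparison.Ordinal))
        return false;

    if (useRegex)
        return Regex.IsMatch(testValue, controlValue,
            ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None);

    return string.Equals(controlValue, testValue,
        ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal);
}

Console.WriteLine(AttributeMatches("attr:regex", @"foo-\d{4}", "attr", "foo-2019"));            // True
Console.WriteLine(AttributeMatches("attr:ignoreCase", "FOO", "attr", "foo"));                   // True
Console.WriteLine(AttributeMatches("attr:regex:ignoreCase", @"FOO-\d{4}", "attr", "foo-2019")); // True
Console.WriteLine(AttributeMatches("attr", "foo", "attr", "bar"));                              // False
```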
The class attribute is special in HTML. It can contain a space-separated list of CSS classes, whose order does not matter. Therefore, the library ignores the order in which the CSS classes are specified in the class attributes of the control and test elements, and instead just ensures that both have the same CSS classes. E.g.
- class="foo bar" is the same as class="bar foo"
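An order-insensitive comparison like this amounts to comparing two sets. A minimal C# sketch, assuming class names are split on ASCII whitespace and that duplicate class names are collapsed (the library may treat duplicates differently); ClassListsEqual is a hypothetical helper:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: compares two class attribute values as sets of class names.
static bool ClassListsEqual(string controlClass, string testClass)
{
    var separators = new[] { ' ', '\t', '\n', '\r', '\f' };
    var control = new HashSet<string>(
        controlClass.Split(separators, StringSplitOptions.RemoveEmptyEntries));
    var test = new HashSet<string>(
        testClass.Split(separators, StringSplitOptions.RemoveEmptyEntries));

    // SetEquals ignores order; duplicates were already collapsed by the sets.
    return control.SetEquals(test);
}

Console.WriteLine(ClassListsEqual("foo bar", "bar foo")); // True
Console.WriteLine(ClassListsEqual("foo bar", "foo"));     // False
```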
To enable the special handling of the class attribute, call the AddClassAttributeComparer() method on the IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.AddClassAttributeComparer())
.Build();
Other special types of attributes are the boolean attributes. To make comparing these more forgiving, the boolean attribute comparer will consider two boolean attributes equal, according to these rules:
- In strict mode, a boolean attribute's value is considered truthy if the value is missing, empty, or is the name of the attribute.
- In loose mode, a boolean attribute's value is considered truthy if the attribute is present on an element.
For example, in strict mode, the following are considered equal:
- required is the same as required=""
- required="" is the same as required="required"
- required="required" is the same as required="required"
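The two modes can be sketched as a small truthiness check. This is an interpretation of the rules above rather than the library's code: it assumes both attributes are present under the same name, that a missing value is represented by null, that two attributes are equal when both are truthy, and that the attribute name is compared to the value case-insensitively. IsTruthy and BooleanAttributesEqual are hypothetical helpers.

```csharp
#nullable enable
using System;

// Strict: truthy if the value is missing (null), empty, or equal to the attribute name.
// Loose: truthy as soon as the attribute is present, whatever its value.
static bool IsTruthy(string name, string? value, bool strict) =>
    !strict
    || string.IsNullOrEmpty(value)
    || string.Equals(value, name, StringComparison.OrdinalIgnoreCase); // case-insensitivity is an assumption

// Assumption: two present boolean attributes are equal when both are truthy.
static bool BooleanAttributesEqual(string name, string? controlValue, string? testValue, bool strict) =>
    IsTruthy(name, controlValue, strict) && IsTruthy(name, testValue, strict);

Console.WriteLine(BooleanAttributesEqual("required", null, "", strict: true));            // True
Console.WriteLine(BooleanAttributesEqual("required", "", "required", strict: true));      // True
Console.WriteLine(BooleanAttributesEqual("required", "required", "true", strict: true));  // False
Console.WriteLine(BooleanAttributesEqual("required", "required", "true", strict: false)); // True
```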
To enable the special handling of boolean attributes, call the AddBooleanAttributeComparer(BooleanAttributeComparision.Strict) or AddBooleanAttributeComparer(BooleanAttributeComparision.Loose) method on the IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.AddBooleanAttributeComparer(BooleanAttributeComparision.Strict))
.Build();
Different whitespace rules apply to style information inside style="..." attributes than to HTML5. This comparer will parse the style information inside style="..." attributes and compare the result of the parsing, instead of doing a direct string comparison. This should remove false positives where e.g. insignificant whitespace makes two otherwise equal sets of style information result in a diff.
To add this comparer, use the AddStyleAttributeComparer() method on the IDiffingStrategyCollection type, e.g.:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.AddStyleAttributeComparer())
.Build();
When styles are parsed they are also normalized. This means that the following styles would be identical:
style="border: 1px solid red;"
style="border: solid 1px red;"
But if you have multiple styles the order matters and is therefore not changed. The following styles are different:
style="color: red; border: 0"
style="border: 0; color: red"
To add a style comparer where the order does not matter, you can register the style comparer with the optional parameter ignoreOrder: true:
var diffs = DiffBuilder
.Compare(controlHtml)
.WithTest(testHtml)
.WithOptions(options => options.AddStyleAttributeComparer(ignoreOrder: true))
.Build();
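The difference between ordered and unordered comparison can be illustrated with a deliberately naive C# sketch. It splits a style attribute into declarations on semicolons and compares them either as a sequence or as a sorted list. ParseDeclarations and StylesEqual are hypothetical helpers. Unlike the library, this sketch uses no real CSS parser, so it does not normalize values (for example, 1px solid red versus solid 1px red) and it would mis-split values that contain semicolons.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Naive parsing: "color: red; border: 0" becomes ["color:red", "border:0"].
static List<string> ParseDeclarations(string style) =>
    style.Split(';', StringSplitOptions.RemoveEmptyEntries)
         .Select(declaration => declaration.Split(':', 2))
         .Where(parts => parts.Length == 2)
         .Select(parts => parts[0].Trim().ToLowerInvariant() + ":" + parts[1].Trim())
         .ToList();

// Hypothetical helper: compares declarations in order, or sorted when ignoreOrder is set.
static bool StylesEqual(string controlStyle, string testStyle, bool ignoreOrder)
{
    var control = ParseDeclarations(controlStyle);
    var test = ParseDeclarations(testStyle);
    return ignoreOrder
        ? control.OrderBy(d => d, StringComparer.Ordinal)
                 .SequenceEqual(test.OrderBy(d => d, StringComparer.Ordinal))
        : control.SequenceEqual(test);
}

Console.WriteLine(StylesEqual("color: red; border: 0", "border: 0; color: red", ignoreOrder: false)); // False
Console.WriteLine(StylesEqual("color: red; border: 0", "border: 0; color: red", ignoreOrder: true));  // True
```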
To ignore a specific attribute during comparison, add the :ignore postfix to the attribute on the control element. This will simply skip comparing the two attributes and not report any differences between them. E.g. to ignore the class attribute, do:
<header>
<h1 class:ignore>Hello world</h1>
</header>
To ignore all attributes during comparison, add the diff:ignoreAttributes attribute on the control element. This will skip comparing all attributes and not report any differences between them. E.g. to ignore all attributes, do:
<header>
<h1 diff:ignoreAttributes>Hello world</h1>
</header>