Add an edge case test that `.` matches \\u2029 and \\u2028 #35

f3ath · 2023-04-26T03:58:09Z

I-Regexp follows XSD-2, which states that the character class equivalent to `.` is [^\r\n]. Some programming languages (e.g. JavaScript, Dart) treat `.` differently; in particular, their `.` won't match the Unicode characters \u2029 and \u2028. This PR introduces a corresponding edge case test.
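As a rough illustration of the semantics described above, here is a minimal Python sketch that spells out I-Regexp's `.` as the explicit class `[^\n\r]` and checks it against the characters in question. The names `IREGEXP_DOT` and `iregexp_dot_matches` are made up for this example and are not part of the test suite or of any implementation. Python's own `.` is not equivalent either: by default it excludes only `\n`, so it would accept `\r`.

```python
import re

# I-Regexp / XSD-2 semantics for `.`, written as an explicit character class.
# (Hypothetical helper names; not taken from the compliance test suite.)
IREGEXP_DOT = re.compile(r"[^\n\r]")


def iregexp_dot_matches(ch: str) -> bool:
    """Return True if the single character `ch` is matched by I-Regexp's `.`."""
    return IREGEXP_DOT.fullmatch(ch) is not None


# U+2028 (line separator) and U+2029 (paragraph separator) must match;
# carriage return and line feed must not.
for ch in ("\u2028", "\u2029", "\r", "\n", "a"):
    print(f"U+{ord(ch):04X}: {iregexp_dot_matches(ch)}")

# Python's native dot accepts "\r", so it cannot be used unchanged.
python_dot_accepts_cr = re.fullmatch(".", "\r") is not None
```

An engine whose `.` excludes more than `\r` and `\n`, or less, needs some form of pattern translation before it can pass the new test.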

glyn · 2023-04-26T13:56:19Z

cts.json

@@ -3454,6 +3454,36 @@
        "a𐄁b"
      ]
    },
+    {
+      "name": "functions, match, dot matcher on \\u2028",


Would it be possible to use \u2028 in the JSON document?

Not sure I understand the question. Do you mean removing the second \? That would make the test name indistinguishable from the other one.

No, I was talking about using \u2028 in the "document" member.

In the document it is defined as \u2028; you can see it in the source file. But it gets replaced with the actual character when cts.json gets compiled. I'm not sure whether it would be possible to keep it as \uXXXX in the compiled cts.json.

I don't think it's terribly important to have the character escaped in the doc. The only problem I can see is that a particularly strict JSON parser might not be able to read it.
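To make the escaping question concrete, here is a small Python sketch using only the standard `json` module. It is not the tool that compiles cts.json (the thread doesn't show that), just an illustration that one serializer can emit U+2028 either as the `\u2028` escape or as the raw character, and that both forms parse back to the same value. JSON itself only requires escaping of the quotation mark, the backslash, and control characters below U+0020, so the raw form is valid JSON.

```python
import json

# The "document" member of the test case under discussion.
document = ["\u2028", "\r", "\n", True, [], {}]

# ensure_ascii=True is the default: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(document)

# ensure_ascii=False writes U+2028 as the raw character instead.
raw = json.dumps(document, ensure_ascii=False)

# Both serializations decode to the same Python value.
round_trips = json.loads(escaped) == json.loads(raw) == document
print(round_trips)
```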

gregsdennis · 2024-05-08T21:51:01Z

My implementation (which uses the .NET regex engine in an "ECMAScript" configuration) is returning \r and \n as well.

Name:     functions, match, dot matcher on \u2028
Selector: $[?match(@, '.')]
Document: ["\u2028","\r","\n",true,[],{}]
Result:   ["\u2028"]
Results:   null
IsValid:  True

Actual (values): ["\u2028","\r","\n"]

Actual (serialized):
{
  "Matches": [
    {
      "Value": "\u2028",
      "Location": "$[0]"
    },
    {
      "Value": "\r",
      "Location": "$[1]"
    },
    {
      "Value": "\n",
      "Location": "$[2]"
    }
  ],
  "Error": null
}

Probably related to this. I have code that does some translation, but I don't think I did the "little bit of lookahead assertion added to remove \r and \n" part.

gregsdennis

After some googling I figured out how to add the lookahead exclusions, and the tests pass for me now.
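For readers wondering what a lookahead exclusion could look like, below is a minimal sketch in Python's `re` rather than the .NET engine. It is an assumption about the general technique, not the code that was merged into any of the linked projects. The helper `translate_dots`, the constant `DOT_REPLACEMENT`, and the `match` wrapper are hypothetical names. The scanner assumes I-Regexp input, where `[` and `]` inside a character class must be escaped.

```python
import re

# Hypothetical replacement for `.`: any character, guarded by a negative
# lookahead that rejects carriage return and line feed.
DOT_REPLACEMENT = r"(?:(?![\r\n])[\s\S])"


def translate_dots(pattern: str) -> str:
    """Replace each unescaped `.` outside a character class."""
    out = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # copy escape pairs such as \. verbatim
            i += 2
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        if ch == "." and not in_class:
            out.append(DOT_REPLACEMENT)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def match(value, pattern: str) -> bool:
    """Sketch of JSONPath match(): whole-string match, strings only."""
    if not isinstance(value, str):
        return False
    return re.fullmatch(translate_dots(pattern), value) is not None


# The failing test case from the comment above: $[?match(@, '.')]
document = ["\u2028", "\r", "\n", True, [], {}]
result = [v for v in document if match(v, ".")]
```

With this translation, `result` contains only the U+2028 string, which is what the test expects.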

hiltontj · 2024-05-12T21:45:54Z

Was able to fix this to get things passing in serde_json_path again with hiltontj/serde_json_path#92. Thanks for surfacing this one @f3ath!

Add an edge case test that . matches \\u2029 and \\u2028

6878638

glyn reviewed Apr 26, 2023

gregsdennis approved these changes May 8, 2024

gregsdennis merged commit 7c8e9bc into jsonpath-standard:main May 8, 2024

gregsdennis mentioned this pull request May 8, 2024

Fix matching on unicode to ignore newlines json-everything/json-everything#725

Merged

jg-rp mentioned this pull request May 9, 2024

Map JS RegExp to I-Regexp jg-rp/json-p3#19

Merged

hiltontj mentioned this pull request May 9, 2024

Fix failures from latest compliance test suite hiltontj/serde_json_path#90

Closed

f3ath commented Apr 26, 2023

glyn Apr 26, 2023

f3ath Apr 29, 2023

glyn Apr 30, 2023

f3ath Aug 26, 2023 (edited)

gregsdennis May 8, 2024

gregsdennis commented May 8, 2024 (edited)

gregsdennis left a comment

hiltontj commented May 12, 2024
