-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update delpaths function to use sort with slicing #3040
base: master
Are you sure you want to change the base?
Conversation
Can someone give an example when this happens? also some test for it would be good |
@wader Well, I checked my 2023-08-29 chat logs with @nicowilliams and that issue was about the following [6,5,1,2,5,3,4,6,7,6,5] |
del(
. as $out |
range(length) as $i
.[$i+1:][] |
select(. == $dot[$i])
) Not working as intended because of its use of slices; output: $ jq -cn '[6,5,1,2,5,3,4,6,7,6,5] | del(. as $dot | range(length) as $i | .[$i+1:][] | select(. == $dot[$i]))'
[6,5,1,2,3,4,7,5]
$ jq -cn '[6,5,1,2,5,3,4,6,7,6,5] | delpaths([path(. as $dot | range(length) as $i | .[$i+1:][] | select(. == $dot[$i])) | debug])'
["DEBUG:",[{"start":1,"end":null},6]]
["DEBUG:",[{"start":1,"end":null},8]]
["DEBUG:",[{"start":2,"end":null},2]]
["DEBUG:",[{"start":2,"end":null},8]]
["DEBUG:",[{"start":5,"end":null},5]]
["DEBUG:",[{"start":8,"end":null},1]]
[6,5,1,2,3,4,7,5] (The last duplicate This is the usual problem that afflicts path expressions with negative indices and slices; the slice paths do not get canonicalised (or they are not sorted in an order that would make the deletions work correctly), so they end up pointing the wrong elements. A version of that same script that uses a range to loop the array instead of a slice works correctly: [6,5,1,2,5,3,4,6,7,6,5] |
del(
. as $dot |
range(length) as $i |
.[range($i + 1; length)] |
select(. == $dot[$i])
) output: $ jq -cn '[6,5,1,2,5,3,4,6,7,6,5] | del(. as $dot | range(length) as $i | .[range($i + 1; length)] | select(. == $dot[$i]))'
[6,5,1,2,3,4,7]
$ jq -cn '[6,5,1,2,5,3,4,6,7,6,5] | delpaths([ path(. as $dot | range(length) as $i | .[range($i + 1; length)] | select(. == $dot[$i])) | debug ])'
["DEBUG:",[7]]
["DEBUG:",[9]]
["DEBUG:",[4]]
["DEBUG:",[10]]
["DEBUG:",[10]]
["DEBUG:",[9]]
[6,5,1,2,3,4,7] I don't really understand this patch, so I am not sure what @capak07 had in mind with it. This does not solve the issue #2868 was created for, but maybe they are trying to solve a different bug? |
I don't understand what you mean |
paths = jv_sort(paths, jv_copy(paths)); | ||
jv_array_foreach(paths, i, elem) { | ||
int paths_length = jv_array_length(jv_copy(paths)); | ||
paths = jv_sort(jv_array_slice(paths, 0, paths_length), jv_copy(paths)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can't be the fix, right? I mean, the problem is that we need to canonicalize array indices in paths
first, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jv_array_slice(a, 0, length_of_a)
should be a no-op, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this patch is trying to do either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nicowilliams
is this what you're suggesting?
jv canonicalize_indices(jv paths) {
int paths_length = jv_array_length(jv_copy(paths));
for (int i = 0; i < paths_length; i++) {
jv path = jv_array_get(jv_copy(paths), i);
int index = jv_number_value(path);
if (index < 0) {
index = paths_length + index;
}
if (index < 0 || index >= paths_length) {
jv_free(paths);
return jv_invalid_with_msg(jv_string("Invalid array index"));
}
paths = jv_array_set(paths, i, jv_number(index));
}
return paths;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To canonicalize the paths we need to refer to the tree they refer to. The idea is to take any negative indices in the paths and normalize them to positive indices, but to do this you need to know the length of the corresponding array in the tree for that negative index in the path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (index < 0) index = paths_length + index;
? Why would you want to do that? What bug did you solve with those patches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay so to convert a negative index into a positive index, we need the length of the subtree, this could work then
jv canonicalize_indices(jv paths, jv tree) {
int paths_length = jv_array_length(jv_copy(paths));
for (int i = 0; i < paths_length; i++) {
jv path = jv_array_get(jv_copy(paths), i);
jv subtree = tree;
int path_length = jv_array_length(jv_copy(path));
for (int j = 0; j < path_length; j++) {
jv step = jv_array_get(jv_copy(path), j);
if (jv_is_number(step)) {
int index = jv_number_value(step);
if (index < 0) {
int subtree_length = jv_array_length(jv_copy(subtree));
index = subtree_length + index;
if (index < 0) {
// Invalid index, handle error
jv_free(paths);
return jv_invalid_with_msg(jv_string("Invalid array index"));
}
path = jv_array_set(path, j, jv_number(index));
}
subtree = jv_array_get(jv_copy(subtree), index);
} else if (jv_is_string(step)) {
subtree = jv_object_get(jv_copy(subtree), step);
} else {
// Invalid path step, handle error
jv_free(paths);
return jv_invalid_with_msg(jv_string("Invalid path step"));
}
}
paths = jv_array_set(paths, i, path);
}
return paths;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With canonicalising a path, we mean converting:
["foo",{"start":10,"end":null},{"start":-200,"end":-2},20]
["foo",-20,"hello"]
To (if foo is an array with 120 elements)
["foo",30]
["foo",100,"hello"]
You need to know the input to do that; if foo were an array with 300 elements you would want this instead:
["foo",120]
["foo",280,"hello"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a bit messy. Because path elements could overlap because of the slice specifiers, we might have to explode {"start":...,"end":...}
into the individual indices that match those slice specifiers.
@capak07 the context here is that internally a "slice" specifier like $something[start:end]
is represented as an object with a "start"
and "end"
keys, and that the start and end indices can be negative to mean "starting from the end of this string or array". Here jv_sort()
just can't possibly do the right thing. We should either have a jv_sort_paths()
that does, or we should normalize the paths
to be given to jv_sort()
, and the latter seems like the better path forward for now because the former will likely require that we use qsort_r()
(which isn't portable, so we might have to import one from a BSD).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not reject object indices (they are slices), and del(.[100])
on a short array should not be an error, so you should not start throwing an invalid path error
Hmm, I am closing this PR since it seems the patch does not solve any issue, and otherwise it does not accomplish anything useful. |
This is a tricky bug to fix. |
Pull Request for #2868
jv_array_slice
function to support slicing of the array