Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update delpaths function to use sort with slicing #3040

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

capak07
Copy link

@capak07 capak07 commented Feb 12, 2024

Pull Request for #2868

  • Added jv_array_slice function to support slicing of the array

src/jv_aux.c Outdated Show resolved Hide resolved
@wader
Copy link
Member

wader commented Feb 19, 2024

Can someone give an example when this happens? also some test for it would be good

@emanuele6
Copy link
Member

@wader Well, I checked my 2023-08-29 chat logs with @nicowilliams and that issue was about the following jq script (that is supposed to delete duplicate elements in the input array without sorting):

[6,5,1,2,5,3,4,6,7,6,5] |
del(
  . as $out |
  range(length) as $i
  .[$i+1:][] |
  select(. == $dot[$i])
)

Not working as intended because of its use of slices; output:

$ jq -cn '[6,5,1,2,5,3,4,6,7,6,5] | del(. as $dot | range(length) as $i | .[$i+1:][] | select(. == $dot[$i]))'
[6,5,1,2,3,4,7,5]
$ jq -cn '[6,5,1,2,5,3,4,6,7,6,5] | delpaths([path(. as $dot | range(length) as $i | .[$i+1:][] | select(. == $dot[$i])) | debug])'
["DEBUG:",[{"start":1,"end":null},6]]
["DEBUG:",[{"start":1,"end":null},8]]
["DEBUG:",[{"start":2,"end":null},2]]
["DEBUG:",[{"start":2,"end":null},8]]
["DEBUG:",[{"start":5,"end":null},5]]
["DEBUG:",[{"start":8,"end":null},1]]
[6,5,1,2,3,4,7,5]

(The last duplicate 5 did not get deleted)

This is the usual problem that afflicts path expressions with negative indices and slices; the slice paths do not get canonicalised (or they are not sorted in an order that would make the deletions work correctly), so they end up pointing the wrong elements.

A version of that same script that uses a range to loop the array instead of a slice works correctly:

[6,5,1,2,5,3,4,6,7,6,5] |
del(
  . as $dot |
  range(length) as $i |
  .[range($i + 1; length)] |
  select(. == $dot[$i])
)

output:

$ jq -cn '[6,5,1,2,5,3,4,6,7,6,5] | del(. as $dot | range(length) as $i | .[range($i + 1; length)] | select(. == $dot[$i]))'
[6,5,1,2,3,4,7]
$ jq -cn '[6,5,1,2,5,3,4,6,7,6,5] | delpaths([ path(. as $dot | range(length) as $i | .[range($i + 1; length)] | select(. == $dot[$i])) | debug ])'
["DEBUG:",[7]]
["DEBUG:",[9]]
["DEBUG:",[4]]
["DEBUG:",[10]]
["DEBUG:",[10]]
["DEBUG:",[9]]
[6,5,1,2,3,4,7]

I don't really understand this patch, so I am not sure what @capak07 had in mind with it. This does not solve the issue #2868 was created for, but maybe they are trying to solve a different bug?

@emanuele6
Copy link
Member

I don't understand what you mean

paths = jv_sort(paths, jv_copy(paths));
jv_array_foreach(paths, i, elem) {
int paths_length = jv_array_length(jv_copy(paths));
paths = jv_sort(jv_array_slice(paths, 0, paths_length), jv_copy(paths));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be the fix, right? I mean, the problem is that we need to canonicalize array indices in paths first, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jv_array_slice(a, 0, length_of_a) should be a no-op, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this patch is trying to do either

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicowilliams
is this what you're suggesting?

jv canonicalize_indices(jv paths) {
  int paths_length = jv_array_length(jv_copy(paths));
  for (int i = 0; i < paths_length; i++) {
    jv path = jv_array_get(jv_copy(paths), i);
    int index = jv_number_value(path);
    if (index < 0) {
      index = paths_length + index;
    }
    if (index < 0 || index >= paths_length) {
      jv_free(paths);
      return jv_invalid_with_msg(jv_string("Invalid array index"));
    }
    paths = jv_array_set(paths, i, jv_number(index));
  }
  return paths;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To canonicalize the paths we need to refer to the tree they refer to. The idea is to take any negative indices in the paths and normalize them to positive indices, but to do this you need to know the length of the corresponding array in the tree for that negative index in the path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (index < 0) index = paths_length + index;? Why would you want to do that? What bug did you solve with those patches?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so to convert a negative index into a positive index, we need the length of the subtree, this could work then

jv canonicalize_indices(jv paths, jv tree) {
  int paths_length = jv_array_length(jv_copy(paths));
  for (int i = 0; i < paths_length; i++) {
    jv path = jv_array_get(jv_copy(paths), i);
    jv subtree = tree;
    int path_length = jv_array_length(jv_copy(path));
    for (int j = 0; j < path_length; j++) {
      jv step = jv_array_get(jv_copy(path), j);
      if (jv_is_number(step)) {
        int index = jv_number_value(step);
        if (index < 0) {
          int subtree_length = jv_array_length(jv_copy(subtree));
          index = subtree_length + index;
          if (index < 0) {
            // Invalid index, handle error
            jv_free(paths);
            return jv_invalid_with_msg(jv_string("Invalid array index"));
          }
          path = jv_array_set(path, j, jv_number(index));
        }
        subtree = jv_array_get(jv_copy(subtree), index);
      } else if (jv_is_string(step)) {
        subtree = jv_object_get(jv_copy(subtree), step);
      } else {
        // Invalid path step, handle error
        jv_free(paths);
        return jv_invalid_with_msg(jv_string("Invalid path step"));
      }
    }
    paths = jv_array_set(paths, i, path);
  }
  return paths;
}

Copy link
Member

@emanuele6 emanuele6 Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With canonicalising a path, we mean converting:

["foo",{"start":10,"end":null},{"start":-200,"end":-2},20]
["foo",-20,"hello"]

To (if foo is an array with 120 elements)

["foo",30]
["foo",100,"hello"]

You need to know the input to do that; if foo were an array with 300 elements you would want this instead:

["foo",120]
["foo",280,"hello"]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a bit messy. Because path elements could overlap because of the slice specifiers, we might have to explode {"start":...,"end":...} into the individual indices that match those slice specifiers.

@capak07 the context here is that internally a "slice" specifier like $something[start:end] is represented as an object with a "start" and "end" keys, and that the start and end indices can be negative to mean "starting from the end of this string or array". Here jv_sort() just can't possibly do the right thing. We should either have a jv_sort_paths() that does, or we should normalize the paths to be given to jv_sort(), and the latter seems like the better path forward for now because the former will likely require that we use qsort_r() (which isn't portable, so we might have to import one from a BSD).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not reject object indices (they are slices), and del(.[100]) on a short array should not be an error, so you should not start throwing an invalid path error

@emanuele6
Copy link
Member

Hmm, I am closing this PR since it seems the patch does not solve any issue, and otherwise it does not accomplish anything useful.

@emanuele6 emanuele6 closed this Feb 29, 2024
@nicowilliams
Copy link
Contributor

This is a tricky bug to fix.

@nicowilliams nicowilliams reopened this Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants