-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow for splitting a Pointer
by Token
position (index)
#42
Comments
Alternatively, we just continue to use the offsets. |
Dealing with I'm still not sure what the right answer is here. Some of the API treats the If we double down on the |
It wasn't horrible to write the iterator below, but I think it could be made a lot cleaner. If enum Component<'a> {
Root,
Component {
/// full path to this component
path: &'a Pointer,
offset: usize,
token: Token<'a>,
token_pointer: &'a Pointer,
}
} Here's the iterator: #[derive(Debug)]
pub struct WalkTo<'p, 'v> {
cursor: &'v Value,
original_value: &'v Value,
full_path: &'p Pointer,
offset: Option<NonZeroUsize>,
}
impl<'p, 'v> WalkTo<'p, 'v> {
pub fn new(value: &'v Value, to: &'p Pointer) -> Self {
Self {
cursor: value,
original_value: value,
full_path: to,
offset: None,
}
}
fn handle_err(
&self,
_err: ResolveError,
path: &'p Pointer,
_resolvable: &'p Pointer,
) -> Option<Result<(&'p Pointer, &'v Value), ResolveError>> {
// TODO: determine appropriate offsets and update error rather than re-resolving
// for now, we are just punting to `Resolve` to recreate the error
Some(Err(self.original_value.resolve(path).unwrap_err()))
}
}
impl<'p, 'v> Iterator for WalkTo<'p, 'v> {
type Item = Result<(&'p Pointer, &'v Value), ResolveError>;
fn next(&mut self) -> Option<Self::Item> {
// we need to get the offset to determine where we are in the pointer
let offset = if let Some(offset) = self.offset {
// the offset has previously been set, so we use that
offset.get()
} else {
// An empty offset means we are at the beginning of the walk.
// this is a special case where the target path is root
// we need to handle this separately because we later account
// for tokens, which an empty path has none of.
if self.full_path.is_root() {
// the target path is root, so we set the offset to 1 ensures we
// do not send repeats due to the bounds check on offset below.
self.offset = NonZeroUsize::new(1);
return Some(Ok((Pointer::root(), self.cursor)));
}
// if the offset was not previously set, we start at 0
0
};
// checking to make sure we are not out of bounds
if offset > self.full_path.len() {
return None;
}
// split the path at the offset, where everything to the left
// is the full path of the current value to be sent and everything
// to the right is the remaining path to be resolved.
let (path, remaining) = self
.full_path
.split_at(offset)
.unwrap_or((self.full_path, Pointer::root()));
if let Some(next) = remaining.first() {
// if there is a next token, we set the offset to the next token's length
// plus 1 to account for the slash.
self.offset = NonZeroUsize::new(offset + next.encoded().len() + 1);
} else {
// otherwise we intentionally push the offset out of bounds
self.offset = NonZeroUsize::new(offset + 1)
}
// we want the last token as a `&Pointer` so that we can use the resolve logic
// the path is either splittable (contains a token) or is empty (root).
//
// If it is splittable, we either use the token's length to determine the token's
// offset and split the path there or we use the root pointer.
let resolvable = path
.last()
.map(|t| path.split_at(path.len() - t.encoded().len() - 1).unwrap().1)
.unwrap_or(Pointer::root());
// we attempt to resolve the value
let result = self.cursor.resolve(resolvable);
let value = match result {
Ok(ok) => ok,
Err(err) => return self.handle_err(err, path, resolvable),
};
// moving the cursor value to the resolved value
self.cursor = value;
Some(Ok((path, value)))
}
} https://github.com/chanced/grill/blob/main/grill-core/src/source/walk/to.rs |
I think a Cursor API, like the experimental one in BTree, would be awesome. I sort of figured indices weren't the right abstraction for this. And I've had to deal with simultaneous traversal of a Value and Pointer too. Cursor feels like the right abstraction for this. 👍 |
Ah, that is interesting. I can't quite get my head around how it'd feel interacting with it though. It may be too complex for some use cases while incredibly useful for others. The ghost elements are definitely unusual though. |
One thing that would have cut down the complexity of the iterator a bit is a let resolvable = path
.last()
.map(|t| path.split_at(path.len() - t.encoded().len() - 1).unwrap().1)
.unwrap_or(Pointer::root()); It's a simple addition but would be a breaking change. |
I think we'd need some sort of constructed pointer type if the idea is to use the cursor to traverse a value. |
I guess the edit: nah, this would be a bad idea. |
Had a closer look at your example above, help me understand what you're trying to achieve there. Seems to me that a big chunk of complexity stems from the fact you wish to produce a partial pointer with every iteration, together with the value you're pointing at. That's a bit niche/unusual, and I suspect that's a big part of why it ends up being complex. Do you really need that partial pointer? If not you could get by with this: #[derive(Debug)]
pub struct WalkTo<'p, 'v> {
value: &'v Value,
tokens: Option<Tokens<'p>>,
}
impl<'p, 'v> WalkTo<'p, 'v> {
pub fn new(value: &'v Value, to: &'p Pointer) -> Self {
Self {
value,
tokens: Some(to.tokens()),
}
}
}
// placeholder
#[derive(Debug)]
pub struct ResolveError;
impl<'p, 'v> Iterator for WalkTo<'p, 'v> {
type Item = Result<&'v Value, ResolveError>;
fn next(&mut self) -> Option<Self::Item> {
let tokens = self.tokens.as_mut()?;
let curr_value = self.value;
if let Some(next) = tokens.next() {
match self.value {
Value::Null | Value::Bool(_) | Value::Number(_) | Value::String(_) => {
// cannot descend further
self.tokens.take();
return Some(Err(ResolveError));
}
Value::Array(vec) => {
let idx = match next.to_index() {
Ok(idx) => match idx.for_len(vec.len()) {
Ok(idx) => idx,
Err(_) => return Some(Err(ResolveError)),
},
Err(_) => return Some(Err(ResolveError)),
};
self.value = &vec[idx];
}
Value::Object(map) => {
self.value = match map.get(next.decoded().as_ref()) {
Some(val) => val,
None => return Some(Err(ResolveError)),
};
}
}
} else {
// we've reached the target node
self.tokens.take();
}
Some(Ok(curr_value))
}
} If you do need the partial pointers, I think we could just add a slicing API so you could do I'm inclined to make |
The niche comment likely answers a question I was going to present, which is whether providing a walk impl like the above would make any sense. I actually have it partially implemented, with WalkTo being done for json. I was going to submit a pull request when I had I was able to clean it up, or at least that line, by using My situation may be niche though, indeed. I need the full path and value of eqch node along a path. |
@asmello I was on mobile earlier and hadn't read all of your reply - I just wanted to push up the source I had mentioned before responding to the rest. Believe it or not, I was contemplating ranges as well. My initial inclination was to change the signature of this: impl Pointer {
pub fn get(&self, index: usize) -> Option<Token> {}
} to impl Pointer {
// this is incorrect - there should be generics but i have no idea what they'd be
pub fn get<I: std::slice::SliceIndex>(&self, index: I) -> Option<I::Output> {}
} and leave the |
I've been thinking about this more, and I don't really get the use case for In general I'm well aligned on the usefulness of traversing a Separately, I think we can just go ahead and add a slicing API, I think that's actually useful on its own. Will do some experimentation with it. |
Ah, sorry! Yea, it's a mess over in that repo at the moment. I'm in the middle of rewriting it to use the iterator but I have to swap things over there. This is going to be probably more explanation than you're hoping for but I'll try and keep it short. With json schema 2019+(refs existed in 07 but didn't do anything), you can reference a foreign schema by uri. If that URI contains a json pointer, you have to resolve that location and compile only it. However, ancestors of the schema can set the dialect ( What you see there is me trying to compile up from the target. Basically I was going to try and check one level up, see if it contained this schema as a sub schema, if it did, index it (not compile) and then search up for the schema which had it as a sub schema. Ultimately, I decided that just wasn't worth it. For one, each of those checks requires re-resolving the schema being scanned. Second, the logic was going to be messy and error prone. So I'm going to go back to my initial design (sort of, cleaned up, hopefully), and just scan from the root all the way up to the target. I suspect this may change in the future but it's the easiest and safest way for me to finish. Basically I'll scan the root, check if it contains a sub schema that has a partial pointer that matches my target, and then skip to that sub schema and repeat. If the schema I'm scanning doesn't contain a subschema with a partial path of my schema, I move to the next and check it until I find one that does or until I hit my target. In either case, I need the full path as I take it and create a URI with the base uri I keep all of the root documents as This is at least my second time writing the lib - the previous, most complete, version was a hot mess of oop mentality, slammed full of editI should mention that the scenario described above explains In the case where an external schema is referenced by an unknown anchor(eg For unknown anchors, I need to start from the root document and scan it and all descendant schemas. As part of the scan, each schema has one or more links created for it, based on the path and any anchors. After scanning the whole doc, I check The reason it isn't simply walk is because in json schema, any schema which has an * confusingly enough, I also have edits above the edit header. Terribly sorry if you read it as an email only to discover I have edited it without clear distinction. I need to get better at proof reading my replies, especially if I'm on mobile. |
The more I think about this, the more convinced I am that the walk iterators don't belong in this crate, at least not in their current form. In fact, I may not end up using I'm certain that a range API is the way to go. It'll allow consumers to craft their own bespoke traversal strategies a great deal more efficiently (at least in terms of dev expenditures). However, if we are going to double down on it being a collection type, which I think is the right path, we should probably revisit |
It's a good point. I don't think the choice we've made was unreasonable. We can introduce a |
Also, perhaps one point in favour of That said, I suppose we could make it constant time if we kept a token count separately (as the original implementation did), at a small cost in bookkeeping. Will give this some thought, doesn't seem obviously better, but it's worth considering. |
Yea, this a tough one. I'm not sure what the right call is. Inversely, Lets stew on |
Here's a radical idea: let's get rid of
We can introduce Then if one wishes to count tokens, they can use We can deprecate Let's sit on this for a while, though. 👍 |
I'm game for removing it. I think that's the answer. |
Even if we end up adding it back, I think it can be added after it goes through the deprecation cycle. By that point, it'll have been at least 2 minors. Any problems that arise will be due to jumping several minor releases. |
Added a note in #84 about the complexity of token-based indexing. |
We need a way to slice / split a token by range, using the index.
The text was updated successfully, but these errors were encountered: