-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds position
to AssignError
, ResolveError
variants, renames both errors to be less redundant
#93
base: main
Are you sure you want to change the base?
Conversation
* Adds type alias `crate::resolve::ResolveError` for `crate::resolve::Error` * Renames `crate::assign::AssignError` to `crate::assign::Error` * Adds type alias `crate::assign::AssignError` for crate::assign::Error` * Adds `position` (token index) to variants of `assign::Error` & `resolve::Error`
Codecov ReportAttention: Patch coverage is
Additional details and impacted files
|
src/assign.rs
Outdated
"assignment failed due to an invalid index at offset {offset}" | ||
) | ||
Self::FailedToParseIndex { .. } => { | ||
write!(f, "assignment failed due to an invalid index") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if I'm reading this correctly we'll be deprecating the offset
field entirely later on? I'm not sure it makes sense to do this in stages, because even adding position
is already breaking, unfortunately. Might as well rip the band-aid off in one go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't think we should deprecate it. It actually is incredibly useful to have and a positive consequence of my blunder to opt for offset
to begin with.
Folks can have pretty printed errors that do something like
/some/example/invalid/more
^^^^^^^^
without much additional effort to determine where that starts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see! In that case, why did you remove it from the message? 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have mentioned that I went through the exact same thought process as you.
I started with "ugh, I guess I add position
, deprecate offset
, and plan on removing it." But that path has so many breaking changes across numerous releases.
Then it dawned on me that it's actually really useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope - I'm back on this was a mistake. I'd like to remove offset
but I think we may be stuck with it. Or we potentially break a lot.
I just discovered mini-crater
, which may give some insight into what deleting offset
would likely mean, at least to known dependents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, have been busy, finally getting through my backlog.
No need to apologize about that at all.
because if you want to keep ownership you can always pass a reference. This will create a clone internally, but consider the alternative of taking a reference — then we'll always clone, even if the caller doesn't need the string anymore
So for Pointer::parse
, I don't think we should return the string. For one, they already have the string - we literally cannot take ownership of it. Two, a reference would introduce lifetimes which would make the errors a pain. The only upside is reporting which not everyone needs but introduces additional work for everyone else to rid themselves of lifetimes with a call to into_owned
.
PointerBuf::parse
is different. In the case of &str
, we have already allocated with .into()
. In the case of an owned String
, we have an existing allocation which we are currently dropping on the error path.
If we require the caller to keep a copy for their error path, we are causing an additional allocation on the happy path.
Right now, in #93 I have
pub struct ParseBufError {
pub value: String,
pub source: ParseError,
}
and ParseError
implements From<ParseBufError>
.
I've got to run for now - I'll reply to the other when I'm in front of my laptop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for Pointer::parse, I don't think we should return the string.
Agreed, although including an owned copy there would allow us to report with the #[source_code]
attribute without the annoying lifetime. Given this is the unhappy path I think it'd be reasonable to allocate for that.
If we require the caller to keep a copy for their error path, we are causing an additional allocation on the happy path.
Yep, hence the suggestion to make it use a Cow
. See my full suggestion in #96 (I think you meant to link that one?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see! In that case, why did you remove it from the message? 😄
sorry, I missed this question and I can't recall :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh!! You meant the error message! Sorry, I should have held off on replying until I could devote enough energy to properly think through the question.
I removed the offset from the error message initially because I wasn't sure how much value it was adding vs making the error message more verbose and confusing. In retrospect, I'm not sure either are the case. Some may appreciate the offset in the message and I think the error message is now longer than what it was before.
I'll add them back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, just to clarify it's not that I think we need the offset in the message, but just that the fact it's not used makes it seem like it's not useful (and if it isn't, perhaps it should be deprecated). But with miette in the picture I think it definitely is useful because then it can be used for the label.
lol, I redid most of this, forgetting I had started here. Woops. |
@asmello I'm sorry - I really meant to get to this point a couple of days ago. It isn't complete and I'm not sure what's going on with the reporting. If I leave off the
edit: nevermind, i just needed to run |
Now that I understand
I also have a few things I'm going to refactor a bit and make this a little less cumbersome. edit: nm, kept the previous naming (mostly), except I went with |
Alright, I think I have enough implemented for you to take a look when you have time @asmello. If we decide to go this route, I'll finish it out. Not really sure about the naming of miette uses "source" but i think that confuses the usage of |
It is fancy displaying now - I missed the part where you have to print So going this route, rather than using the derive macro, enables a few things:
Downsides that I can think of:
|
I'll try and have a look tomorrow.
Yeah, it's a bit confusing, because miette was originally designed for the kdl parser, so the terminology is oriented around source code, which isn't always appropriate. How about using
We probably don't need to be super exact with this name, maybe we can make do with |
CHANGELOG.md
Outdated
- Adds unsafe associated methods `Pointer::new_unchecked` and `PointerBuf::new_unchecked` for | ||
external zero-cost construction. | ||
- Adds `Pointer::starts_with` and `Pointer::ends_with` for prefix and suffix matching. | ||
- Adds new `ParseIndexError` variant to express the presence non-digit characters in the token. | ||
- Adds `Token::is_next` for checking if a token represents the `-` character. | ||
- Adds `ParseBufError`, returned as the `Err` side of `PointerBuf::parse`, which includes the input `String`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Adds `ParseBufError`, returned as the `Err` side of `PointerBuf::parse`, which includes the input `String`. | |
- Adds `ParseBufError`, returned as the `Err` variant of `PointerBuf::parse`, which includes the input `String`. |
Also, I'm not sure about the String
part since the input can be another type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I wasn't sure how to express "the result of the call to Into::into<String>(&input)
" without it being so verbose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing a proper review tomorrow, just had a peek
src/pointer.rs
Outdated
*/ | ||
#[derive(Debug, PartialEq)] | ||
pub struct ParseBufError { | ||
pub value: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we're not going with Cow
here? I'm fine with it, but all the more reason to make it private so we have the option to change in the future without another hard break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do Cow
here, I guess we can consolidate them back down to ParseError
.
The thinking was that this is only created from PointerBuf::parse
, which has a String
, or from a method call (needs renaming) on ParseError
where you provide the source.
You're far more knowledgeable on rust and conventions than I - if you think ditching ParseBufErrror
(or whatever it ends up being called) in favor of throwing a Cow
on ParseError
's variants, we can go that route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a big deal since this is the uncommon path, but my thinking is that with a Cow
we can avoid one allocation in PointerBuf::parse
even if we do fail validation. We need another (very slightly breaking) change to the API to take advantage of it:
pub fn parse(s: impl Into<Cow<'_, str>>) -> Result<Self, ParseBufError> {
let s = s.into();
validate(&s).map_err(|err| err.with_subject(s))?;
Ok(Self(s.into_owned()))
}
I am making some changes, |
Before you dive in to the code, what are your thoughts on the general approach? |
let mut v = serde_json::json!({"foo": {"bar": ["0"]}});
let ptr = PointerBuf::parse("/foo/bar/invalid/cannot/reach").unwrap();
let report = ptr.assign(&mut v, "qux").diagnose(ptr).unwrap_err();
println!("{:?}", miette::Report::from(report)); let mut v = serde_json::json!({"foo": {"bar": ["0"]}});
let ptr = PointerBuf::parse("/foo/bar/3/cannot/reach").unwrap();
let report = ptr.assign(&mut v, "qux").diagnose(ptr).unwrap_err();
println!("{:?}", miette::Report::from(report)); |
let invalid = "/foo/bar/invalid~3~encoding/cannot/reach";
let report = Pointer::parse(invalid).diagnose(invalid).unwrap_err();
println!("{:?}", miette::Report::from(report)); The PointerBuf::parse("/foo/bar/invalid~3~encoding/cannot/reach").diagnose(()); but Also, for parse related errors, we could go through and find all encoding errors and create labels for them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not forget to remove "fancy"
feature from miette
oops, that was supposed to be attached to the line in |
Yea, I think it might be because it has a `source`? But the other errors do
as well so I’m not sure what’s causing it yet.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks generally sensible and straightforward, but my biggest concern is that I still don't understand the role of the custom Diagnostic
trait. Maybe there's a good reason for it but I'm just not seeing it yet.
FailedToParseIndex { | ||
/// Position (index) of the token which failed to parse as an [`Index`](crate::index::Index) | ||
position: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is to aid with the error message, right? Thinking about it, we could technically derive it on-the-fly from the offset + token. I know this was part of the raison d'etre for this PR, but I just realised I don't really understand the motivation.
} | ||
} | ||
|
||
pub fn offset(&self) -> usize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing docs here and a few other places.
} | ||
|
||
fn url() -> &'static str { | ||
Self::url() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this works, but I think it'd be better to invoke the macro here and have it just assemble the string, rather than generate a separate impl block and define an eponymous method. I'm not sure there are any bad side-effects of doing it this way, but it's just unnecessarily convoluted IMO.
use crate::{Pointer, PointerBuf, Token}; | ||
|
||
/// Implemented by errors which can be converted into a [`Report`]. | ||
pub trait Diagnostic: Sized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand the role of this trait. Can't we implement miette::Diagnostic
directly (with a feature gate)?
fn url() -> &'static str; | ||
|
||
/// Returns the label for the given [`Subject`] if applicable. | ||
fn labels(&self, subject: &Subject) -> Option<Box<dyn Iterator<Item = Label>>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this Self::Subject
?
Also, I'm very confused about the usage here. I'd expect the error itself to normally contain the subject. And tooling can't find any instances where there's a call to this method, which makes me think it's not used?
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ | ||
╔══════════════════════════════════════════════════════════════════════════════╗ | ||
║ ║ | ||
║ TopicalParseError ║ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs updating
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ | ||
*/ | ||
#[derive(Debug, PartialEq)] | ||
pub struct RichParseError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as I mentioned in another comment I believe we should just add subject to ParseError
instead as a Cow
, then we don't need all the code duplication.
@@ -1930,7 +2085,7 @@ mod tests { | |||
|
|||
#[quickcheck] | |||
fn qc_pop_and_push(mut ptr: PointerBuf) -> bool { | |||
let original_ptr = ptr.clone(); | |||
let subjectal_ptr = ptr.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be just subject_ptr
(I'm not sure 'subjectal' is a recognised word, but if it is it's pretty archaic).
╚══════════════════════════════════════════════════════════════════════════════╝ | ||
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ | ||
*/ | ||
|
||
// TODO: should ResolveError be deprecated? | ||
/// Alias for [`Error`]. | ||
pub type ResolveError = Error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume so if you're deprecating AssignError
use serde_json::Value; | ||
|
||
impl Resolve for Value { | ||
type Value = Value; | ||
type Error = ResolveError; | ||
type Error = Error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar for Assign
, but I wonder if the Error
should just be Error
, rather than an associated type. I know this theoretically allows external implementers to bring their own error type, but thinking of the design of libraries like serde
and std::io
, seems common to be a bit prescriptive with the type and just give a escape hatch for custom errors.
Not 100% on this one, but seeing how all the internal implementations share the same type makes me suspect this is overly abstract.
&self.subject | ||
} | ||
|
||
pub fn source(&self) -> &ParseError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, not great this clashes with the Error
trait
I'm not sure either. I know that miette doesn't use the |
That documentation may be out of date though: https://docs.rs/miette/latest/src/miette/handlers/graphical.rs.html#311 |
Ah, I just misinterpreted the docs, they actually say they don't use the special miette-specific metadata when traversing the sources chain, but they do traverse it. I think we get duplicated messages here because:
As per my review comments, I think we should just collapse |
I could delegate the source. Basically replicate It's not ideal. But I don't know whether always allocating on the error path (for I wonder why the source for |
Nevermind, Before seriously considering flattening the parse errors down, I been considering a third variant, Then have parse methods which return various degrees of reporting. |
I don't think introducing lifetimes is a big deal for I think always allocating in the error path is probably acceptable too. As long as the errors are opaque we have the option to change this later without causing too hard of a break. But since parsing is such a common operation it feels worth avoiding allocation if the cost in complexity isn't high (and it doesn't seem like it is).
Hmmm I really don't like this direction. My personal take is this is a strictly worse option than either always allocating or having a borrowed error, in terms of API experience and maintainability. If we really wanted to have users choose their complexity of reporting, the right way to do that would be with feature flags, but I believe we have sufficient flexibility with the 'miette' flag already. Basic errors and miette-enhanced errors cover all use cases I can think of. Before commiting to anything I'd like to understand the reluctance with borrowed errors — maybe there is a very good reason to avoid them that I'm just looking past. Let's align on that first. But rather than introduce multiple options I'd really rather just go with always allocating. |
All |
Ack, let me try and play with it, I suspect this is just a matter of tweaking the internal API. |
pub const fn from_static(s: &'static str) -> &'static Self {
assert!(validate(s).is_ok(), "invalid json pointer");
unsafe { &*(core::ptr::from_ref::<str>(s) as *const Self) }
}
|
The lifetime issue with
as it is hanging here:
|
Yeah, ran into this one too. Quite interesting, I hadn't foreseen this problem, but I think I know a simple way around it. Hang tight, feel free to just let me play around for a while before spending further time on this. May well be my idea isn't workable (at least not without much added complexity), though I'm still optimistic. |
Random remark while I work on the changes I suggested, but |
Yikes, that's embarrassing. Thanks. |
I mean, took me forever to find it, too. 😅 Still fleshing out a few details, decided to try implementing all my suggestions to make sure they were viable. I think I finally understand where the borrowed errors breakdown: the That said it's also quite sad to lose the optimisation altogether simply because someone might be interested to call |
Ok, I think I'm mostly coming around to something close to your intended design with a wrapper type. We essentially have these options:
I've changed my mind too many times today, need to sleep on this one. |
Making some notes for later about the outer API errors we have:
|
I have an idea I'm working on; this is the structure so far (i'm sorry for lack of documentation): use crate::InvalidEncodingError;
/// The structure of a `ParseError`.
pub trait Structure {
type Cause: Causative;
type Subject: for<'a> From<&'a str> + From<String>;
}
/// [`Structure`] for a [`ParseError`] which contains the first encountered [`Cause`]
/// but does not contain the input string as `subject`.
pub struct WithoutInput;
impl Structure for WithoutInput {
type Cause = Cause;
type Subject = Empty;
}
/// [`Structure`] for a [`ParseError`] which contains the first encountered
/// [`Cause`] along with the input string as `subject`.
pub struct WithInput;
impl Structure for WithInput {
type Cause = Cause;
type Subject = String;
}
/// [`Structure`] for a [`ParseError`] which contains all encountered [`Cause`]s
/// along with the input string as `subject`.
pub struct Complete;
impl Structure for Complete {
type Cause = Vec<Cause>;
type Subject = String;
}
/// A cause of a `ParseError`.
pub trait Causative {
fn is_multi(&self) -> bool;
fn from_vec(list: Vec<Cause>) -> Self;
fn from_cause(cause: Cause) -> Self;
}
impl Causative for Vec<Cause> {
fn is_multi(&self) -> bool {
true
}
fn from_vec(list: Vec<Cause>) -> Self {
list
}
fn from_cause(cause: Cause) -> Self {
vec![cause]
}
}
/// Cause of a [`ParseError`].
#[derive(Debug, PartialEq, Eq)]
pub enum Cause {
/// `Pointer` did not start with a backslash (`'/'`).
NoLeadingBackslash,
/// `Pointer` contained invalid encoding (e.g. `~` not followed by `0` or
/// `1`).
InvalidEncoding {
/// Offset of the partial pointer starting with the token that contained
/// the invalid encoding
offset: usize,
/// The source `InvalidEncodingError`
source: InvalidEncodingError,
},
}
impl Causative for Cause {
fn is_multi(&self) -> bool {
false
}
fn from_vec(vec: Vec<Cause>) -> Self {
vec.into_iter().next().unwrap()
}
fn from_cause(cause: Cause) -> Self {
cause
}
}
/// An empty type used as a `subject` in a `ParseError` to indicate that the
/// error does not contain the subject.
pub struct Empty;
impl From<&str> for Empty {
fn from(_: &str) -> Self {
Empty
}
}
impl From<String> for Empty {
fn from(_: String) -> Self {
Empty
}
}
/// Indicates that a `Pointer` was malformed and unable to be parsed.
#[derive(Debug, PartialEq)]
pub struct ParseError<O: Structure = WithoutInput> {
cause: O::Cause,
subject: O::Subject,
}
impl<O> ParseError<O>
where
O: Structure<Cause = Cause>,
{
pub fn cause(&self) -> &O::Cause {
&self.cause
}
}
impl<O> ParseError<O>
where
O: Structure<Cause = Vec<Cause>>,
{
pub fn causes(&self) -> &[Cause] {
&self.cause
}
}
impl<O> ParseError<O>
where
O: Structure<Subject = String>,
{
pub fn subject(&self) -> &O::Subject {
&self.subject
}
} edit: updated code again |
I think we're converging into the same direction, see the changes I made in #98 |
Hah, that's awesome. I'm going to finish out the thought and then I'll look over yours. |
Ok, I fleshed out some details (particularly regarding converting borrowed to owned errors using the generic API). I'm pretty happy with how it turned out - the compiler handles type inference remarkably well. I also renamed types slightly, to be more consistent with In terms of usage, the gist of it is that once I also made the wrapped error implement These are the main changes I'm proposing to this PR. There's additional work I want to do with the errors but I think we should do it separately. |
BTW, great insight with the generic wrapper. Took me a long time yesterday to convince myself it was the right approach. Looking forward to an explanation on this new idea! |
I'm really glad you pushed back on it though. I was going back and forth over whether I glanced at the source earlier and I think this will actually meld with yours but I'm not 100% yet. |
Solves #90
assign::AssignError
toassign::Error
assign::AssignError
forassign::Error
resolve::ResolveError
toresolve::Error
resolve::ResolveError
forresolve::Error
position
(token index) to variants ofassign::Error
&resolve::Error
ParseBufError
, which contains the inputString
toPointerBuf::parse
to solvePointerBuf::parse
acceptsInto<String>
but does not include theString
in theErr
#96I'm not certain
position
is the right term to use here. Instinctively, I'd reach forindex
oridx
but that may lead to confusion overIndex
related errors, especially those specific to parsing anIndex
.This makes
AssignError
andResolveError
more idiomatic by reducing redundancy in renaming toError
. I'm not sure whether or not we should#[deprecate]
the type aliasesAssignError
andResolveError
.I regret making both errors' variants embedded structs rather than just rolling structs for each variant. I thought long and hard about splitting them out as structs but that's going to introduce far more breaking changes. While this is definitely breaking, I'm hoping that
..
or simply adding aposition
to{ }
on matches won't be that much of a hassle.