Adds position to AssignError, ResolveError variants, renames both errors to be less redundant #93

Open: wants to merge 15 commits into main

chanced (Owner) commented Oct 21, 2024

Solves #90

  • Renames assign::AssignError to assign::Error
  • Adds type alias assign::AssignError for assign::Error
  • Renames resolve::ResolveError to resolve::Error
  • Adds type alias resolve::ResolveError for resolve::Error
  • Adds position (token index) to variants of assign::Error & resolve::Error
  • Adds ParseBufError, which contains the input String passed to PointerBuf::parse; solves #96 (PointerBuf::parse accepts Into<String> but does not include the String in the Err)

I'm not certain position is the right term to use here. Instinctively, I'd reach for index or idx, but that may cause confusion with Index-related errors, especially those specific to parsing an Index.

Renaming them to Error makes AssignError and ResolveError more idiomatic by reducing redundancy. I'm not sure whether we should mark the type aliases AssignError and ResolveError as #[deprecated].

I regret making both errors' variants struct-like (with inline fields) rather than rolling a separate struct for each variant. I thought long and hard about splitting them out into structs, but that would introduce far more breaking changes. While this is definitely breaking, I'm hoping that adding `..`, or simply adding `position` to the `{ }` in match patterns, won't be much of a hassle.
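The sketch below shows what the migration costs in practice. It uses a simplified, hypothetical enum, not the crate's actual `assign::Error`. A pattern that ends in `..` keeps compiling when `position` is added. A pattern that names every field has to be updated.

```rust
/// Simplified, hypothetical stand-in for `assign::Error` after this PR:
/// struct-like variants that now carry `position` alongside `offset`.
#[derive(Debug)]
pub enum Error {
    FailedToParseIndex { position: usize, offset: usize },
    OutOfBounds { position: usize, offset: usize, len: usize },
}

/// Describes an error; the two arms show how existing matches are affected.
pub fn describe(err: &Error) -> String {
    match err {
        // A pattern ending in `..` ignores fields it doesn't name, so it
        // keeps compiling when a field such as `position` is added.
        Error::FailedToParseIndex { offset, .. } => {
            format!("invalid index at offset {offset}")
        }
        // A pattern that names every field must be updated to bind
        // (or `..` away) the new `position` field.
        Error::OutOfBounds { position, offset, len } => {
            format!("token {position} at offset {offset} is out of bounds (len {len})")
        }
    }
}
```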

* Adds type alias `crate::resolve::ResolveError` for `crate::resolve::Error`
* Renames `crate::assign::AssignError` to `crate::assign::Error`
* Adds type alias `crate::assign::AssignError` for `crate::assign::Error`
* Adds `position` (token index) to variants of `assign::Error` & `resolve::Error`
chanced requested a review from asmello on October 21, 2024, 18:17
codecov-commenter commented:

Codecov Report

Attention: Patch coverage is 82.29167% with 34 lines in your changes missing coverage. Please review.

Project coverage is 97.1%. Comparing base (c694765) to head (45c4fde).

Files with missing lines | Patch % | Lines
src/resolve.rs | 76.6% | 32 missing ⚠️
src/assign.rs | 96.3% | 2 missing ⚠️
Additional details and impacted files
Files with missing lines | Coverage Δ
src/token.rs | 100.0% <ø> (ø)
src/assign.rs | 97.4% <96.3%> (-2.2%) ⬇️
src/resolve.rs | 88.9% <76.6%> (-11.1%) ⬇️

... and 1 file with indirect coverage changes

src/assign.rs (outdated, resolved)
src/assign.rs (outdated, resolved)
src/assign.rs (outdated)
"assignment failed due to an invalid index at offset {offset}"
)
Self::FailedToParseIndex { .. } => {
write!(f, "assignment failed due to an invalid index")
Collaborator:

So, if I'm reading this correctly, we'll be deprecating the offset field entirely later on? I'm not sure it makes sense to do this in stages, because even adding position is already breaking, unfortunately. Might as well rip the band-aid off in one go.

Owner (author):

No, I don't think we should deprecate it. It's actually incredibly useful to have, a positive consequence of my blunder in opting for offset to begin with.

Folks can have pretty-printed errors that do something like

/some/example/invalid/more
             ^^^^^^^^

without much additional effort to determine where that starts.
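Here is a minimal sketch of that rendering. It assumes `offset` is the byte index of the `/` that begins the offending token, which is where the carets above line up. It also assumes the pointer is ASCII, so bytes map one-to-one onto columns. `token_len` and `underline` are hypothetical helpers, not jsonptr API.

```rust
/// Length in bytes of the token starting at `offset`, including its leading `/`.
fn token_len(ptr: &str, offset: usize) -> usize {
    // Look for the next `/` after the token's own leading slash.
    match ptr[offset + 1..].find('/') {
        Some(i) => i + 1,
        None => ptr.len() - offset, // last token runs to the end
    }
}

/// Renders `ptr` with a caret underline under `len` bytes starting at `offset`.
fn underline(ptr: &str, offset: usize, len: usize) -> String {
    format!("{ptr}\n{}{}", " ".repeat(offset), "^".repeat(len))
}
```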

Collaborator:

I see! In that case, why did you remove it from the message? 😄

chanced (Owner, author), Oct 22, 2024:

I should have mentioned that I went through the exact same thought process as you.

I started with "ugh, I guess I add position, deprecate offset, and plan on removing it." But that path has so many breaking changes across numerous releases.

Then it dawned on me that it's actually really useful.

chanced (Owner, author), Nov 7, 2024:

Nope, I'm back to thinking this was a mistake. I'd like to remove offset, but I think we may be stuck with it; otherwise we potentially break a lot.

I just discovered mini-crater, which may give some insight into what deleting offset would likely mean, at least to known dependents.

Owner (author):

> Sorry, have been busy, finally getting through my backlog.

No need to apologize about that at all.

> because if you want to keep ownership you can always pass a reference. This will create a clone internally, but consider the alternative of taking a reference: then we'll always clone, even if the caller doesn't need the string anymore

So for Pointer::parse, I don't think we should return the string. First, the caller already has the string; we literally cannot take ownership of it. Second, a reference would introduce lifetimes, which would make the errors a pain. The only upside is reporting, which not everyone needs, yet it would force everyone else to do extra work (a call to into_owned) to rid themselves of the lifetime.

PointerBuf::parse is different. In the case of &str, we have already allocated with .into(). In the case of an owned String, we have an existing allocation which we are currently dropping on the error path.

If we require the caller to keep a copy for their error path, we are causing an additional allocation on the happy path.

Right now, in #93 I have

pub struct ParseBufError {
    pub value: String,
    pub source: ParseError,
}

and ParseError implements From<ParseBufError>.
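Below is a self-contained sketch of that shape. It uses a simplified, hypothetical `ParseError` with a single variant and a toy validation rule in place of the real ones. On the error path, the `String` produced by `into` is moved into the error instead of being dropped. The `From` impl lets callers who only want the plain error keep using `?`.

```rust
/// Hypothetical, simplified stand-in for jsonptr's `ParseError`.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// The input was non-empty but did not start with `/`.
    NoLeadingSlash,
}

/// The parse error plus the caller's input.
#[derive(Debug, PartialEq)]
pub struct ParseBufError {
    pub value: String,
    pub source: ParseError,
}

/// Lets `?` turn a `ParseBufError` into a plain `ParseError`, dropping the input.
impl From<ParseBufError> for ParseError {
    fn from(err: ParseBufError) -> Self {
        err.source
    }
}

#[derive(Debug, PartialEq)]
pub struct PointerBuf(String);

impl PointerBuf {
    pub fn parse(s: impl Into<String>) -> Result<Self, ParseBufError> {
        // `into` is the only allocation (none if `s` is already a `String`).
        let value = s.into();
        if !value.is_empty() && !value.starts_with('/') {
            // Move the existing allocation into the error instead of dropping it.
            return Err(ParseBufError { value, source: ParseError::NoLeadingSlash });
        }
        Ok(PointerBuf(value))
    }
}

/// A caller that only wants the plain error can still use `?`.
fn parse_plain(s: &str) -> Result<PointerBuf, ParseError> {
    Ok(PointerBuf::parse(s)?)
}
```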

I've got to run for now - I'll reply to the other when I'm in front of my laptop.

Collaborator:

> So for Pointer::parse, I don't think we should return the string.

Agreed, although including an owned copy there would allow us to report with the #[source_code] attribute without the annoying lifetime. Given this is the unhappy path I think it'd be reasonable to allocate for that.

> If we require the caller to keep a copy for their error path, we are causing an additional allocation on the happy path.

Yep, hence the suggestion to make it use a Cow. See my full suggestion in #96 (I think you meant to link that one?).

Owner (author):

> I see! In that case, why did you remove it from the message? 😄

Sorry, I missed this question and I can't recall :(

chanced (Owner, author), Nov 12, 2024:

Oh!! You meant the error message! Sorry, I should have held off on replying until I could devote enough energy to properly think through the question.

I initially removed the offset from the error message because I wasn't sure how much value it was adding versus how much more verbose and confusing it made the message. In retrospect, I'm not sure either is the case. Some may appreciate the offset in the message, and I think the error message is now longer than it was before.

I'll add them back.

Collaborator:

Yeah, just to clarify: it's not that I think we need the offset in the message, just that the fact it's not used makes it seem like it's not useful (and if it isn't, perhaps it should be deprecated). But with miette in the picture, I think it definitely is useful, because it can then be used for the label.

chanced (Owner, author) commented Nov 7, 2024

lol, I redid most of this, forgetting I had started here. Whoops.

chanced (Owner, author) commented Nov 14, 2024

@asmello I'm sorry - I really meant to get to this point a couple of days ago.

It isn't complete and I'm not sure what's going on with the reporting. If I leave off the "fancy" feature, it spits out the error as expected. If I enable "fancy", I get this error:

error: failed to select a version for the requirement `supports-hyperlinks = "^3.0.0"`
candidate versions found which didn't match: 2.1.0, 2.0.0, 1.2.0, ...
location searched: crates.io index
required by package `miette v7.2.0`
    ... which satisfies dependency `miette = "^7.2.0"` (locked to 7.2.0) of package `jsonptr v0.6.3 (/Users/chance/dev/jsonptr)`

Edit: never mind, I just needed to run cargo update. It works!

chanced (Owner, author) commented Nov 14, 2024

Now that I understand miette better, I need to rename things. I was under the impression Report was akin to Anyhow and not their wrapper. I figure

  • Report -> Diagnostic
  • IntoReport -> Diagnose

I also have a few things I'm going to refactor a bit and make this a little less cumbersome.

Edit: never mind, I kept the previous naming (mostly), except that I went with IntoReport -> Diagnostic.

chanced (Owner, author) commented Nov 14, 2024

Alright, I think I have enough implemented for you to take a look when you have time @asmello. If we decide to go this route, I'll finish it out.

Not really sure about the naming of TopicalParseError... I couldn't think of a way to express "ParseError but with the String".

miette uses "source", but I think that gets confused with std::error::Error::source, so I went with Subject. SourcedParseError makes sense, albeit wordy, but with SubjectedParseError it's hard to immediately infer the intent.

chanced (Owner, author) commented Nov 14, 2024

It is fancy-displaying now; I missed the part where you have to print with {:?} (which is nice!). The alignment needs adjustment, and specific error variants could add more context than simply displaying the token (e.g., an out-of-bounds error could include the upper bound).

So going this route, rather than using the derive macro, enables a few things:

  • labels for a token's decoded value without needing to allocate ahead of time
  • lazy calculation of spans
  • aside from the addition of position to some enum variants and potentially some interaction with the Err side of PointerBuf::parse, it doesn't break existing code
  • keeps lifetimes out of errors
  • completely decouples reportable errors from miette

Downsides that I can think of:

  • potentially confusing api
  • larger surface area
  • more code that'll have to be documented and maintained

asmello (Collaborator) commented Nov 14, 2024

I'll try and have a look tomorrow.

> miette uses "source" but i think that confuses the usage of std::error::Error::source so I went with Subject. SourcedParseError makes sense, albeit wordy, but SubjectedParseError is a bit tough to immediately infer the intent.

Yeah, it's a bit confusing, because miette was originally designed for the kdl parser, so the terminology is oriented around source code, which isn't always appropriate. How about using origin instead?

> Not really sure about the naming of TopicalParseError... I couldn't think of a way to express "ParseError but with the String".

We probably don't need to be super exact with this name, maybe we can make do with RichParseError, ExtendedParseError, CompleteParseError or similar?

src/assign.rs (outdated, resolved)
CHANGELOG.md (outdated)
- Adds unsafe associated methods `Pointer::new_unchecked` and `PointerBuf::new_unchecked` for
external zero-cost construction.
- Adds `Pointer::starts_with` and `Pointer::ends_with` for prefix and suffix matching.
- Adds new `ParseIndexError` variant to express the presence of non-digit characters in the token.
- Adds `Token::is_next` for checking if a token represents the `-` character.
- Adds `ParseBufError`, returned as the `Err` side of `PointerBuf::parse`, which includes the input `String`.
Collaborator:

Suggested change
- Adds `ParseBufError`, returned as the `Err` side of `PointerBuf::parse`, which includes the input `String`.
- Adds `ParseBufError`, returned as the `Err` variant of `PointerBuf::parse`, which includes the input `String`.

Also, I'm not sure about the String part since the input can be another type.

chanced (Owner, author), Nov 15, 2024:

Yea, I wasn't sure how to express "the result of calling Into::<String>::into(input)" without being so verbose.

asmello (Collaborator) left a review:

Doing a proper review tomorrow; just had a peek.

src/pointer.rs (outdated, resolved)
src/pointer.rs (outdated)
*/
#[derive(Debug, PartialEq)]
pub struct ParseBufError {
pub value: String,
Collaborator:

So we're not going with Cow here? I'm fine with it, but all the more reason to make it private so we have the option to change in the future without another hard break.

Owner (author):

If we do Cow here, I guess we can consolidate them back down to ParseError.

The thinking was that this is only created from PointerBuf::parse, which has a String, or from a method call (needs renaming) on ParseError where you provide the source.

You're far more knowledgeable about Rust and its conventions than I am. If you think ditching ParseBufError (or whatever it ends up being called) in favor of putting a Cow on ParseError's variants is the better choice, we can go that route.

Collaborator:

It's not a big deal since this is the uncommon path, but my thinking is that with a Cow we can avoid one allocation in PointerBuf::parse even if we do fail validation. We need another (very slightly breaking) change to the API to take advantage of it:

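pub fn parse(s: impl Into<Cow<'_, str>>) -> Result<Self, ParseBufError> {
    let s = s.into();
    // A `map_err` closure would capture `s` by move, making the later
    // `into_owned` a use-after-move; a match keeps each path's move separate.
    match validate(&s) {
        Ok(_) => Ok(Self(s.into_owned())),
        Err(err) => Err(err.with_subject(s)),
    }
}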
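Here is a runnable sketch of the `Cow` idea, using hypothetical, simplified types. It makes one assumption that differs from the snippet above: `ParseBufError` carries a lifetime and stores the `Cow` itself. That way, a borrowed input that fails validation is never copied, and an owned `String` is moved rather than cloned.

```rust
use std::borrow::Cow;

/// Hypothetical minimal parse error: just the offset of the problem.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub offset: usize,
}

/// Hypothetical error that keeps the caller's input as a `Cow`.
#[derive(Debug, PartialEq)]
pub struct ParseBufError<'a> {
    pub subject: Cow<'a, str>,
    pub source: ParseError,
}

impl ParseError {
    fn with_subject(self, subject: Cow<'_, str>) -> ParseBufError<'_> {
        ParseBufError { subject, source: self }
    }
}

/// Simplified check: a non-empty pointer must start with `/`.
fn validate(s: &str) -> Result<(), ParseError> {
    if s.is_empty() || s.starts_with('/') {
        Ok(())
    } else {
        Err(ParseError { offset: 0 })
    }
}

#[derive(Debug)]
pub struct PointerBuf(String);

impl PointerBuf {
    pub fn parse<'a>(s: impl Into<Cow<'a, str>>) -> Result<Self, ParseBufError<'a>> {
        let s = s.into();
        match validate(&s) {
            // Borrowed input is copied only on success; owned input is moved.
            Ok(()) => Ok(Self(s.into_owned())),
            // On failure the `Cow` is handed over as-is: no copy either way.
            Err(err) => Err(err.with_subject(s)),
        }
    }
}
```

With a `&str` input the error still borrows it. Callers who need the error to outlive the input would need an owning conversion, which is the trade-off discussed further down the thread.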

chanced (Owner, author) commented Nov 15, 2024

I am making some changes: label -> labels, to better align with miette and to allow for richer reporting.

chanced (Owner, author) commented Nov 15, 2024

Before you dive into the code, what are your thoughts on the general approach?

chanced (Owner, author) commented Nov 15, 2024

let mut v = serde_json::json!({"foo": {"bar": ["0"]}});

let ptr = PointerBuf::parse("/foo/bar/invalid/cannot/reach").unwrap();
let report = ptr.assign(&mut v, "qux").diagnose(ptr).unwrap_err();
println!("{:?}", miette::Report::from(report));

produces:
[screenshot: 2024-11-15, 10:31:24 AM]

let mut v = serde_json::json!({"foo": {"bar": ["0"]}});

let ptr = PointerBuf::parse("/foo/bar/3/cannot/reach").unwrap();
let report = ptr.assign(&mut v, "qux").diagnose(ptr).unwrap_err();
println!("{:?}", miette::Report::from(report));

produces:
[screenshot: 2024-11-15, 11:11:45 AM]

chanced (Owner, author) commented Nov 15, 2024

let invalid = "/foo/bar/invalid~3~encoding/cannot/reach";
let report = Pointer::parse(invalid).diagnose(invalid).unwrap_err();
println!("{:?}", miette::Report::from(report));

produces:
[screenshot: 2024-11-15, 11:01:54 AM]

The PointerBuf equivalent is currently:

PointerBuf::parse("/foo/bar/invalid~3~encoding/cannot/reach").diagnose(());

but RichParseError can implement miette::Diagnostic directly.

Also, for parse-related errors, we could go through and find all encoding errors and create labels for them.
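As a sketch of that idea, a hypothetical helper could collect the byte offset of every `~` that does not begin a valid RFC 6901 escape (`~0` or `~1`). Each offset could then become one label spanning the `~` and the byte after it.

```rust
/// Hypothetical helper: byte offsets of every `~` that does not start a
/// valid RFC 6901 escape (`~0` or `~1`).
fn invalid_escapes(ptr: &str) -> Vec<usize> {
    let bytes = ptr.as_bytes();
    let mut offsets = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'~' {
            match bytes.get(i + 1) {
                // Valid escape: skip both bytes.
                Some(&b'0') | Some(&b'1') => {
                    i += 2;
                    continue;
                }
                // Anything else (including end of input) is invalid.
                _ => offsets.push(i),
            }
        }
        i += 1;
    }
    offsets
}
```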

chanced (Owner, author) left a review comment:

Do not forget to remove the "fancy" feature from miette.

chanced (Owner, author) commented Nov 15, 2024

Oops, that was supposed to be attached to the line in Cargo.toml.

chanced (Owner, author) commented Nov 15, 2024

let err = PointerBuf::parse("hello-world").unwrap_err();
println!("{:?}", miette::Report::from(err));

now produces:
[screenshot: 2024-11-15, 5:00:41 PM]

asmello (Collaborator) commented Nov 15, 2024

> let err = PointerBuf::parse("hello-world").unwrap_err();
> println!("{:?}", miette::Report::from(err));
>
> now produces: [screenshot: 2024-11-15, 5:00:41 PM]

Hmm, isn't that repeating the same message twice? Seems like a bug

chanced (Owner, author) commented Nov 15, 2024 via email

asmello (Collaborator) left a review:

Looks generally sensible and straightforward, but my biggest concern is that I still don't understand the role of the custom Diagnostic trait. Maybe there's a good reason for it but I'm just not seeing it yet.

FailedToParseIndex {
/// Position (index) of the token which failed to parse as an [`Index`](crate::index::Index)
position: usize,
Collaborator:

I assume this is to aid with the error message, right? Thinking about it, we could technically derive it on-the-fly from the offset + token. I know this was part of the raison d'etre for this PR, but I just realised I don't really understand the motivation.
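For illustration, here is a hypothetical helper that derives the position from the offset. It assumes the pointer text is still at hand. It also assumes `offset` is the byte index of the token's leading `/`, so the position is just the number of `/` before it.

```rust
/// Hypothetical helper: the 0-based position of the token whose leading `/`
/// sits at byte `offset` in `ptr`, i.e. the number of `/` before it.
fn position_from_offset(ptr: &str, offset: usize) -> usize {
    ptr[..offset].bytes().filter(|&b| b == b'/').count()
}
```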

}
}

pub fn offset(&self) -> usize {
Collaborator:

Missing docs here and a few other places.

}

fn url() -> &'static str {
Self::url()
Collaborator:

Well, this works, but I think it'd be better to invoke the macro here and have it just assemble the string, rather than generate a separate impl block and define an eponymous method. I'm not sure there are any bad side-effects of doing it this way, but it's just unnecessarily convoluted IMO.

use crate::{Pointer, PointerBuf, Token};

/// Implemented by errors which can be converted into a [`Report`].
pub trait Diagnostic: Sized {
Collaborator:

I still don't understand the role of this trait. Can't we implement miette::Diagnostic directly (with a feature gate)?
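Here is a sketch of the direct approach, with hypothetical names. The error owns its subject and implements `std::error::Error` unconditionally. It implements `miette::Diagnostic` only behind a feature gate. The sketch assumes miette is an optional dependency; an optional dependency named `miette` gives Cargo an implicit `miette` feature. `source_code` and `labels` are default methods of `miette::Diagnostic` being overridden here.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Hypothetical error that owns its subject (the pointer text), so it can
/// describe itself to miette without a separate crate-specific trait.
#[derive(Debug)]
pub struct InvalidEncoding {
    pub subject: String,
    pub offset: usize,
}

impl fmt::Display for InvalidEncoding {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid `~` escape in JSON Pointer at offset {}", self.offset)
    }
}

impl StdError for InvalidEncoding {}

// Compiled only when the (assumed) optional `miette` dependency is enabled,
// e.g. `miette = { version = "7", optional = true }` in Cargo.toml.
#[cfg(feature = "miette")]
impl miette::Diagnostic for InvalidEncoding {
    fn source_code(&self) -> Option<&dyn miette::SourceCode> {
        Some(&self.subject)
    }

    fn labels(&self) -> Option<Box<dyn Iterator<Item = miette::LabeledSpan> + '_>> {
        // Cover the `~` and the byte after it, clamped to the input's end.
        let len = (self.subject.len() - self.offset).min(2);
        let label = miette::LabeledSpan::new(Some("invalid escape".to_string()), self.offset, len);
        Some(Box::new(std::iter::once(label)))
    }
}
```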

fn url() -> &'static str;

/// Returns the label for the given [`Subject`] if applicable.
fn labels(&self, subject: &Subject) -> Option<Box<dyn Iterator<Item = Label>>>;
Collaborator:

Why isn't this Self::Subject?

Also, I'm very confused about the usage here. I'd expect the error itself to normally contain the subject. And tooling can't find any instances where there's a call to this method, which makes me think it's not used?

░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
╔══════════════════════════════════════════════════════════════════════════════╗
║ ║
║ TopicalParseError ║
Collaborator:

Needs updating

░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
*/
#[derive(Debug, PartialEq)]
pub struct RichParseError {
Collaborator:

So as I mentioned in another comment I believe we should just add subject to ParseError instead as a Cow, then we don't need all the code duplication.

@@ -1930,7 +2085,7 @@ mod tests {

#[quickcheck]
fn qc_pop_and_push(mut ptr: PointerBuf) -> bool {
let original_ptr = ptr.clone();
let subjectal_ptr = ptr.clone();
Collaborator:

I think this could be just subject_ptr (I'm not sure 'subjectal' is a recognised word, but if it is it's pretty archaic).

╚══════════════════════════════════════════════════════════════════════════════╝
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
*/

// TODO: should ResolveError be deprecated?
/// Alias for [`Error`].
pub type ResolveError = Error;
Collaborator:

I assume so, if you're deprecating AssignError.

use serde_json::Value;

impl Resolve for Value {
type Value = Value;
type Error = ResolveError;
type Error = Error;
Collaborator:

Similarly for Assign, but I wonder if the Error should just be a concrete Error type rather than an associated type. I know this theoretically allows external implementers to bring their own error type, but looking at the design of libraries like serde and std::io, it seems common to be a bit prescriptive with the type and just give an escape hatch for custom errors.

Not 100% on this one, but seeing how all the internal implementations share the same type makes me suspect this is overly abstract.
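Here is a sketch of the more prescriptive design, with hypothetical names. The trait returns one concrete `Error`. A boxed `Custom` variant plays the role that `std::io::Error::other` plays for `std::io`. This illustrates the trade-off; it is not a proposed final API.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Hypothetical concrete error shared by every `Resolve` implementation.
#[derive(Debug)]
pub enum Error {
    NotFound { position: usize, offset: usize },
    /// Escape hatch for implementers with their own failure modes.
    Custom(Box<dyn StdError + Send + Sync>),
}

impl Error {
    /// Wraps any error (or message) from a third-party implementation.
    pub fn custom(err: impl Into<Box<dyn StdError + Send + Sync>>) -> Self {
        Error::Custom(err.into())
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::NotFound { position, offset } => {
                write!(f, "value not found for token {position} at offset {offset}")
            }
            Error::Custom(err) => write!(f, "{err}"),
        }
    }
}

impl StdError for Error {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            Error::Custom(err) => Some(&**err),
            _ => None,
        }
    }
}

/// Hypothetical `Resolve` with a fixed error type instead of `type Error`.
pub trait Resolve {
    type Value;
    fn resolve(&self, token: &str) -> Result<&Self::Value, Error>;
}

/// Hypothetical third-party implementor using the escape hatch.
pub struct Env {
    pub home: String,
}

impl Resolve for Env {
    type Value = String;
    fn resolve(&self, token: &str) -> Result<&String, Error> {
        match token {
            "home" => Ok(&self.home),
            _ => Err(Error::custom(format!("no variable named `{token}`"))),
        }
    }
}
```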

&self.subject
}

pub fn source(&self) -> &ParseError {
Collaborator:

Also, it's not great that this clashes with the Error trait's source method.

asmello (Collaborator) commented Nov 15, 2024

> Yea, I think it might be because it has a source? But the other errors do as well so I’m not sure what’s causing it yet.

I'm not sure either. I know that miette doesn't use the source method of the Error trait for anything currently (this is mentioned here), so there must be another factor at play.

asmello (Collaborator) commented Nov 15, 2024

That documentation may be out of date though: https://docs.rs/miette/latest/src/miette/handlers/graphical.rs.html#311

asmello (Collaborator) commented Nov 15, 2024

Ah, I just misinterpreted the docs, they actually say they don't use the special miette-specific metadata when traversing the sources chain, but they do traverse it. I think we get duplicated messages here because:

  1. Display for RichParseError delegates to self.source, which prints the message
  2. Then miette recurses into Error::source and invokes Display for ParseError directly, thus the message gets printed again

As per my review comments, I think we should just collapse RichParseError and ParseError into one, so this goes away.
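The two steps above can be reproduced with a hypothetical, minimal wrapper. `chain_messages` is a simplified stand-in for chain-walking reporters: it prints an error and then each `source` in turn, so the inner message appears twice.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical inner error.
#[derive(Debug)]
struct ParseError;

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("json pointer must start with `/`")
    }
}

impl Error for ParseError {}

/// Hypothetical wrapper mirroring the `RichParseError` shape.
#[derive(Debug)]
struct RichParseError {
    subject: String,
    source: ParseError,
}

impl fmt::Display for RichParseError {
    // (1) Display delegates to the inner error's message...
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.source, f)
    }
}

impl Error for RichParseError {
    // (2) ...and the same inner error is also exposed as the source.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Collects one message per link of the source chain.
fn chain_messages(err: &dyn Error) -> Vec<String> {
    let mut out = vec![err.to_string()];
    let mut next = err.source();
    while let Some(e) = next {
        out.push(e.to_string());
        next = e.source();
    }
    out
}
```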

chanced (Owner, author) commented Nov 16, 2024

I could delegate the source, basically replicating ParseError entirely, which, honestly, is what it should have been. It's a bit confusing, but the source method would need to keep returning the ParseError. I also discovered you can't match through Deref.

It's not ideal. But I don't know whether always allocating on the error path (for Pointer) or introducing lifetimes is optimal either. Of the two, I'm kind of leaning toward always allocating. It'll be less churn and boilerplate for those who need 'static errors.

I wonder why the source for assign::Error isn't printing..

chanced (Owner, author) commented Nov 16, 2024

Never mind, RichParseError would need to be an enum of the same shape as ParseError, but with the source.

Before seriously considering flattening the parse errors down, I had been considering a third variant, Complete, which would contain a list of ParseError and the string.

Then have parse methods which return various degrees of reporting.

asmello (Collaborator) commented Nov 16, 2024

I don't think introducing lifetimes is a big deal for ParseError. We do want to have a way to make the error fully owned in case the caller wants it to outlive its origin (using 'origin' to refer to what miette calls 'source code'), but that's simple to do when we have an internal Cow. Intuitively it seems most likely the caller will just report the error directly as way of handling it, in which case the non-static lifetime isn't an issue. Unless I'm missing a different downside?

I think always allocating in the error path is probably acceptable too. As long as the errors are opaque we have the option to change this later without causing too hard of a break. But since parsing is such a common operation it feels worth avoiding allocation if the cost in complexity isn't high (and it doesn't seem like it is).

> Before seriously considering flattening the parse errors down, I been considering a third variant, Complete, which contained a list of ParseError and the string.
>
> Then have parse methods which return various degrees of reporting.

Hmmm I really don't like this direction. My personal take is this is a strictly worse option than either always allocating or having a borrowed error, in terms of API experience and maintainability. If we really wanted to have users choose their complexity of reporting, the right way to do that would be with feature flags, but I believe we have sufficient flexibility with the 'miette' flag already. Basic errors and miette-enhanced errors cover all use cases I can think of.

Before committing to anything, I'd like to understand the reluctance about borrowed errors; maybe there is a very good reason to avoid them that I'm just looking past. Let's align on that first. But rather than introduce multiple options, I'd really rather just go with always allocating.

chanced (Owner, author) commented Nov 17, 2024

I'm starting to move things over but I'm starting to hit turbulence with borrowck.

I'm concerned this may be common if we introduce the input's lifetime on the err:

    pub fn parse(s: impl Into<String>) -> Result<Self, ParseError<'static>> {
        let s = s.into();
        match validate(&s) {
            Ok(_) => Ok(Self(s)),
            Err(err) => Err(err.with_subject(s)),
        }
    }

error[E0505]: cannot move out of `s` because it is borrowed
   --> src/pointer.rs:918:46
    |
915 |         let s = s.into();
    |             - binding `s` declared here
916 |         match validate(&s) {
    |                        -- borrow of `s` occurs here
917 |             Ok(_) => Ok(Self(s)),
918 |             Err(err) => Err(err.with_subject(s)),
    |                                 ------------ ^ move out of `s` occurs here
    |                                 |
    |                                 borrow later used by call
    |
help: consider cloning the value if the performance cost is acceptable
    |
916 |         match validate(&s.clone()) {
    |                          ++++++++


[screenshot: 2024-11-17, 11:44:58 AM]

chanced (Owner, author) commented Nov 17, 2024

All FromStr and TryFrom impls will need to allocate since the errors do not have GAT lifetimes.

asmello (Collaborator) commented Nov 17, 2024

Ack, let me try and play with it, I suspect this is just a matter of tweaking the internal API.

chanced (Owner, author) commented Nov 17, 2024

    pub const fn from_static(s: &'static str) -> &'static Self {
        assert!(validate(s).is_ok(), "invalid json pointer");
        unsafe { &*(core::ptr::from_ref::<str>(s) as *const Self) }
    }
error[E0493]: destructor of `Result<&str, ParseError<'_>>` cannot be evaluated at compile-time
   --> src/pointer.rs:116:17
    |
116 |         assert!(validate(s).is_ok(), "invalid json pointer");
    |                 ^^^^^^^^^^^                                - value is dropped here
    |                 |
    |                 the destructor for this type cannot be evaluated in constant functions

chanced (Owner, author) commented Nov 17, 2024

The lifetime issue with parse is probably going to turn out to be a Polonius issue. For some reason, I'm unable to check

RUSTFLAGS=-Zpolonius cargo +nightly check

as it is hanging here:

~/dev/jsonptr  add_position_to_errors ✔                                                                                                                                                                         4m  ⍉
▶ RUSTFLAGS=-Zpolonius cargo +nightly check
    Checking unicode-linebreak v0.1.5
    Building [========================>  ] 55/59: unicode-linebreak

asmello (Collaborator) commented Nov 17, 2024

> error[E0493]: destructor of `Result<&str, ParseError<'_>>` cannot be evaluated at compile-time

Yeah, ran into this one too. Quite interesting, I hadn't foreseen this problem, but I think I know a simple way around it. Hang tight, feel free to just let me play around for a while before spending further time on this. May well be my idea isn't workable (at least not without much added complexity), though I'm still optimistic.
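One possible workaround is sketched below. It is offered as an assumption, not necessarily the approach asmello has in mind. The idea is to give `const` contexts a validator whose return type has no destructor, such as `Option<usize>` holding the offset of the first problem, and keep the richer `Result` for runtime use. The names are hypothetical, and the rules are simplified to a leading `/` plus RFC 6901's `~0`/`~1` escapes.

```rust
/// Hypothetical const validator: byte offset of the first problem in `s`,
/// or `None` if `s` is a valid JSON Pointer. `Option<usize>` has no
/// destructor, so it can be created and dropped in a `const fn`.
pub const fn first_invalid_offset(s: &str) -> Option<usize> {
    let bytes = s.as_bytes();
    if bytes.is_empty() {
        return None; // the empty string is the root pointer
    }
    if bytes[0] != b'/' {
        return Some(0);
    }
    let mut i = 1;
    while i < bytes.len() {
        if bytes[i] == b'~' {
            // `~` must be followed by `0` or `1`.
            if i + 1 >= bytes.len() || (bytes[i + 1] != b'0' && bytes[i + 1] != b'1') {
                return Some(i);
            }
            i += 2;
        } else {
            i += 1;
        }
    }
    None
}

#[repr(transparent)]
pub struct Pointer(str);

impl Pointer {
    pub const fn from_static(s: &'static str) -> &'static Self {
        assert!(first_invalid_offset(s).is_none(), "invalid json pointer");
        // SAFETY: `Pointer` is `repr(transparent)` over `str`.
        unsafe { &*(s as *const str as *const Self) }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

// Evaluated at compile time; an invalid literal here fails the build.
const FOO_BAR: &Pointer = Pointer::from_static("/foo/bar");
```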

asmello (Collaborator) commented Nov 17, 2024

Random remark while I work on the changes I suggested, but ParseError::NoLeadingBackslash is wrong, / is a slash, not a backslash (\). I'll fix this but we should maybe push a separate PR for it before we finish this one.

chanced (Owner, author) commented Nov 17, 2024

Yikes, that's embarrassing. Thanks.

asmello (Collaborator) commented Nov 17, 2024

> Yikes, that's embarrassing. Thanks.

I mean, took me forever to find it, too. 😅

Still fleshing out a few details; I decided to try implementing all my suggestions to make sure they were viable. I think I finally understand where borrowed errors break down: the Error trait. Error::source must return a trait object with a 'static lifetime, which we can't produce in situ, so the error type itself must be 'static. Technically we could implement the Error trait only for MyError<'static>, without losing the ability to represent other lifetimes, but that's likely mightily confusing for users.
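Here is a sketch of the "implement `Error` only for the `'static` form" option, with hypothetical names. `Error::source` returns `Option<&(dyn Error + 'static)>`, so an outer error can only expose an inner one whose type is `'static`. Borrowed instances still implement `Display` and can be reported immediately. `into_owned` detaches them when they need to outlive the input.

```rust
use std::borrow::Cow;
use std::error::Error;
use std::fmt;

/// Hypothetical parse error that may borrow the input it describes.
#[derive(Debug)]
pub struct ParseError<'a> {
    pub subject: Cow<'a, str>,
    pub offset: usize,
}

impl<'a> ParseError<'a> {
    /// Detaches the error from the input, copying it only if it is borrowed.
    pub fn into_owned(self) -> ParseError<'static> {
        ParseError {
            subject: Cow::Owned(self.subject.into_owned()),
            offset: self.offset,
        }
    }
}

impl fmt::Display for ParseError<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid JSON Pointer `{}` at offset {}", self.subject, self.offset)
    }
}

// Only the owned form is an `Error`, so only it can be returned from
// another error's `source`, which yields `&(dyn Error + 'static)`.
impl Error for ParseError<'static> {}

/// Hypothetical outer error that wraps a parse failure.
#[derive(Debug)]
pub struct LoadError {
    pub parse: ParseError<'static>,
}

impl fmt::Display for LoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("failed to load config")
    }
}

impl Error for LoadError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.parse)
    }
}
```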

That said, it's also quite sad to lose the optimisation altogether simply because someone might be interested in calling Error::source. The way we compose errors makes sources accessible directly in the variant values, so it's very tempting to just treat composite errors as a single error, but this also has implications for how they get rendered with miette... I need to think about it a little more.

@asmello
Copy link
Collaborator

asmello commented Nov 17, 2024

Ok, I think I'm mostly coming around to something close to your intended design with a wrapper type. We essentially have these options:

  1. Return an 'incomplete' error, which only contains an offset but not a reference to the input data. This is what we do in main currently. It's the most lightweight option, but the least informative.
  2. Return a 'complete' and owned error. This is the easiest to use, because your error contains all you need to display a nice message without any additional work needed. The downside is it always requires cloning the input.
  3. Return a 'complete' and borrowed error. As long as the error is used immediately, this is pretty much equivalent to case (2), but has the advantage of potentially avoiding cloning the input. If it must live longer, it can be converted into an owned error with a to_static method. The one big caveat is how it interacts with the Error trait: we are only able to implement that trait for the owned ('static) variant of the type, which can be pretty confusing. A lesser caveat is that the error doesn't compose well: it must be 'static if the containing error wishes to implement Error too.
  4. Return an 'incomplete' error, but provide an API to complete it. This adds a step at the caller if the user wishes to have nice messages, but avoids any additional cost if they don't. This means we're also free to provide different wrappers that render errors in different ways without worrying about API conflicts. And perhaps most importantly, it gives the caller a direct choice of whether they want the error to be borrowed or owned. The main downside here is the risk of someone passing in a different input than the one that produced the error, which can lead to lots of brokenness. I think this risk is pretty small given the wrapping will typically happen immediately after the call to the fallible method, but it's there, unlike in other approaches.
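To make option 3 concrete, here is a minimal sketch assuming a hypothetical `ParseError<'a>` that stores the input as a `Cow<'a, str>`, a hypothetical `parse` function, and a `to_static` method that copies the input so the error can outlive it. It also shows the caveat: `Error` is implemented only for `ParseError<'static>`.

```rust
use std::borrow::Cow;
use std::error::Error;
use std::fmt;

/// Hypothetical "complete" error: knows both the offset and the input.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParseError<'a> {
    subject: Cow<'a, str>,
    offset: usize,
}

impl ParseError<'_> {
    /// Returns a copy that no longer borrows the input (always copies the string).
    pub fn to_static(&self) -> ParseError<'static> {
        ParseError {
            subject: Cow::Owned(self.subject.to_string()),
            offset: self.offset,
        }
    }
}

impl fmt::Display for ParseError<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "json pointer {:?} must start with '/' (offset {})",
            &*self.subject, self.offset
        )
    }
}

// The caveat: only the owned form can be an `Error`.
impl Error for ParseError<'static> {}

/// Hypothetical parser returning an error that borrows its input.
pub fn parse(s: &str) -> Result<&str, ParseError<'_>> {
    if s.is_empty() || s.starts_with('/') {
        Ok(s)
    } else {
        Err(ParseError { subject: Cow::Borrowed(s), offset: 0 })
    }
}
```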

I've changed my mind too many times today; I need to sleep on this one.

@asmello
Collaborator

asmello commented Nov 17, 2024

Making some notes for later about the outer API errors we have:

  • ParseIndexError is an enum error with a dataless variant, a foreign error variant, and an internal error variant. Of these, two have a meaningful offset and would benefit from enriched display.
  • {assign,resolve}::Error is an enum error. As one of the variants is ParseIndexError, which may already reference the subject pointer, I think this one shouldn't hold a reference itself, but should instead always delegate to its variants to do so.
  • ParseError (I'm unifying ParseBufError here). This one is interesting, because in some cases we may already receive an owned value, which we need to preserve in the Err variant (or lose it!), so an incomplete error doesn't really make sense.
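The delegation idea for {assign,resolve}::Error could look roughly like the sketch below. All names are hypothetical: the outer error records only the token `position` (as this PR adds) and exposes a wrapped `ParseIndexError` through `source()`, without holding a reference to the pointer itself.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical index-parsing error wrapping a foreign `ParseIntError`.
#[derive(Debug)]
pub struct ParseIndexError(ParseIntError);

impl fmt::Display for ParseIndexError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("failed to parse token as an array index")
    }
}

impl Error for ParseIndexError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}

/// Hypothetical resolve error: stores only the token position and
/// delegates richer detail to its variants' sources.
#[derive(Debug)]
pub enum ResolveError {
    FailedToParseIndex { position: usize, source: ParseIndexError },
    NotFound { position: usize },
}

impl ResolveError {
    /// Index of the token at which resolution failed.
    pub fn position(&self) -> usize {
        match self {
            Self::FailedToParseIndex { position, .. } | Self::NotFound { position } => *position,
        }
    }
}

impl fmt::Display for ResolveError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::FailedToParseIndex { position, .. } => {
                write!(f, "failed to parse index of token at position {position}")
            }
            Self::NotFound { position } => write!(f, "token at position {position} not found"),
        }
    }
}

impl Error for ResolveError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::FailedToParseIndex { source, .. } => Some(source),
            Self::NotFound { .. } => None,
        }
    }
}
```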

@chanced
Owner Author

chanced commented Nov 18, 2024

I have an idea I'm working on; this is the structure so far (sorry for the lack of documentation):

use crate::InvalidEncodingError;

/// The structure of a `ParseError`.
pub trait Structure {
    type Cause: Causative;
    type Subject: for<'a> From<&'a str> + From<String>;
}

/// [`Structure`] for a [`ParseError`] which contains the first encountered [`Cause`]
/// but does not contain the input string as `subject`.
pub struct WithoutInput;
impl Structure for WithoutInput {
    type Cause = Cause;
    type Subject = Empty;
}

/// [`Structure`] for a [`ParseError`] which contains the first encountered
/// [`Cause`] along with the input string as `subject`.
pub struct WithInput;
impl Structure for WithInput {
    type Cause = Cause;
    type Subject = String;
}

/// [`Structure`] for a [`ParseError`] which contains all encountered [`Cause`]s
/// along with the input string as `subject`.
pub struct Complete;
impl Structure for Complete {
    type Cause = Vec<Cause>;
    type Subject = String;
}

/// A cause of a `ParseError`.
pub trait Causative {
    fn is_multi(&self) -> bool;
    fn from_vec(list: Vec<Cause>) -> Self;
    fn from_cause(cause: Cause) -> Self;
}

impl Causative for Vec<Cause> {
    fn is_multi(&self) -> bool {
        true
    }

    fn from_vec(list: Vec<Cause>) -> Self {
        list
    }

    fn from_cause(cause: Cause) -> Self {
        vec![cause]
    }
}

/// Cause of a [`ParseError`].
#[derive(Debug, PartialEq, Eq)]
pub enum Cause {
    /// `Pointer` did not start with a slash (`'/'`).
    NoLeadingBackslash,

    /// `Pointer` contained invalid encoding (e.g. `~` not followed by `0` or
    /// `1`).
    InvalidEncoding {
        /// Offset of the partial pointer starting with the token that contained
        /// the invalid encoding
        offset: usize,
        /// The source `InvalidEncodingError`
        source: InvalidEncodingError,
    },
}

impl Causative for Cause {
    fn is_multi(&self) -> bool {
        false
    }

    fn from_vec(vec: Vec<Cause>) -> Self {
        vec.into_iter().next().unwrap()
    }

    fn from_cause(cause: Cause) -> Self {
        cause
    }
}

/// An empty type used as a `subject` in a `ParseError` to indicate that the
/// error does not contain the subject.
pub struct Empty;
impl From<&str> for Empty {
    fn from(_: &str) -> Self {
        Empty
    }
}
impl From<String> for Empty {
    fn from(_: String) -> Self {
        Empty
    }
}
/// Indicates that a `Pointer` was malformed and unable to be parsed.
#[derive(Debug, PartialEq)]
pub struct ParseError<O: Structure = WithoutInput> {
    cause: O::Cause,
    subject: O::Subject,
}

impl<O> ParseError<O>
where
    O: Structure<Cause = Cause>,
{
    pub fn cause(&self) -> &O::Cause {
        &self.cause
    }
}
impl<O> ParseError<O>
where
    O: Structure<Cause = Vec<Cause>>,
{
    pub fn causes(&self) -> &[Cause] {
        &self.cause
    }
}
impl<O> ParseError<O>
where
    O: Structure<Subject = String>,
{
    pub fn subject(&self) -> &O::Subject {
        &self.subject
    }
}
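To see how the `Structure` parameter drives what gets stored, here is a condensed, self-contained restatement of the design above plus a hypothetical `parse_with` function. Assumptions: the crate's `InvalidEncodingError` is replaced by a bare offset, `WithInput` and the accessor methods are omitted, and `parse_with` is illustrative only, not part of the posted code.

```rust
/// Selects what a `ParseError` keeps (condensed from the design above).
pub trait Structure {
    type Cause: Causative;
    type Subject: for<'a> From<&'a str> + From<String>;
}

/// Builds the stored cause(s) from everything the parser found.
pub trait Causative {
    fn from_vec(list: Vec<Cause>) -> Self;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Cause {
    NoLeadingSlash,
    InvalidEncoding { offset: usize },
}

impl Causative for Cause {
    // Keeps only the first cause; `parse_with` never passes an empty list.
    fn from_vec(list: Vec<Cause>) -> Self {
        list.into_iter().next().expect("at least one cause")
    }
}

impl Causative for Vec<Cause> {
    fn from_vec(list: Vec<Cause>) -> Self {
        list
    }
}

/// Placeholder subject that discards the input.
#[derive(Debug, PartialEq, Eq)]
pub struct Empty;
impl From<&str> for Empty {
    fn from(_: &str) -> Self {
        Empty
    }
}
impl From<String> for Empty {
    fn from(_: String) -> Self {
        Empty
    }
}

pub struct WithoutInput;
impl Structure for WithoutInput {
    type Cause = Cause;
    type Subject = Empty;
}

pub struct Complete;
impl Structure for Complete {
    type Cause = Vec<Cause>;
    type Subject = String;
}

pub struct ParseError<O: Structure = WithoutInput> {
    pub cause: O::Cause,
    pub subject: O::Subject,
}

/// Hypothetical parser: collects every cause, then lets `O` decide how much
/// of it (and of the input) ends up in the error.
pub fn parse_with<O: Structure>(s: &str) -> Result<&str, ParseError<O>> {
    let mut causes = Vec::new();
    if !s.is_empty() && !s.starts_with('/') {
        causes.push(Cause::NoLeadingSlash);
    }
    let bytes = s.as_bytes();
    for (i, &b) in bytes.iter().enumerate() {
        if b == b'~' && !matches!(bytes.get(i + 1).copied(), Some(b'0' | b'1')) {
            causes.push(Cause::InvalidEncoding { offset: i });
        }
    }
    if causes.is_empty() {
        Ok(s)
    } else {
        Err(ParseError {
            cause: <O::Cause as Causative>::from_vec(causes),
            subject: <O::Subject as From<&str>>::from(s),
        })
    }
}
```

The same parser body serves every structure; only the type parameter decides whether the input is copied and whether all causes survive.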

Edit:
Complete was supposed to have Vec<Cause> as its Cause. Updated it.

Edit: updated the code again.

@asmello
Collaborator

asmello commented Nov 18, 2024

I think we're converging on the same direction; see the changes I made in #98.

@chanced
Owner Author

chanced commented Nov 18, 2024

Hah, that's awesome. I'm going to finish out the thought and then I'll look over yours.

@asmello
Collaborator

asmello commented Nov 18, 2024

OK, I fleshed out some details (particularly regarding converting borrowed errors to owned ones using the generic API). I'm pretty happy with how it turned out: the compiler handles type inference remarkably well.

I also renamed types slightly to be more consistent with miette, although I don't like the name reuse. The trait that errors should implement is Diagnostic, and the wrapped error type is Report. This aligns with what miette calls its trait and concrete type (and is also consistent with eyre and anyhow).

In terms of usage, the gist of it is that once Diagnostic is implemented for an error, users can call .into_report(subject) to turn the error into the enriched type, using either a reference type or an owned type. If they pass a reference, they get a borrowed type; if they pass an owned type, they get an owned type. Then, if they have a borrowed error, they can call .into_owned to get its owned version. Super simple. I moved away from .to_static because we can simply take ownership of the error, which also avoids a clash with the ToOwned trait.

I also made the wrapped error implement Deref<Target=SRC>.
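The API described above (Diagnostic, into_report, into_owned, and Deref<Target=SRC>) might be sketched as follows. This is an approximation, not the code from #98: it assumes the subject is always a `str` held in a `Cow`, and the `offset` method and `InvalidEncoding` error are hypothetical. `Diagnostic` and `Report` here are local types that only borrow miette's naming.

```rust
use std::borrow::Cow;
use std::fmt;
use std::ops::Deref;

/// Hypothetical trait for errors that can be enriched with their input.
pub trait Diagnostic: Sized {
    /// Byte offset into the subject where the problem was found.
    fn offset(&self) -> usize;

    /// Passing `&str` yields a borrowed report; passing `String` an owned one.
    fn into_report<'s>(self, subject: impl Into<Cow<'s, str>>) -> Report<'s, Self> {
        Report { source: self, subject: subject.into() }
    }
}

/// Hypothetical wrapper pairing an error with the input that produced it.
pub struct Report<'s, SRC> {
    source: SRC,
    subject: Cow<'s, str>,
}

impl<SRC> Report<'_, SRC> {
    /// Takes ownership of the subject (without copying if already owned).
    pub fn into_owned(self) -> Report<'static, SRC> {
        Report { source: self.source, subject: Cow::Owned(self.subject.into_owned()) }
    }

    pub fn subject(&self) -> &str {
        &self.subject
    }
}

/// Lets callers use the wrapped error's API directly.
impl<SRC> Deref for Report<'_, SRC> {
    type Target = SRC;
    fn deref(&self) -> &SRC {
        &self.source
    }
}

impl<SRC: Diagnostic + fmt::Display> fmt::Display for Report<'_, SRC> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} in {:?} at offset {}", self.source, self.subject(), self.source.offset())
    }
}

/// Hypothetical error type implementing the trait.
#[derive(Debug)]
pub struct InvalidEncoding {
    offset: usize,
}

impl fmt::Display for InvalidEncoding {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("invalid encoding")
    }
}

impl Diagnostic for InvalidEncoding {
    fn offset(&self) -> usize {
        self.offset
    }
}
```

Because the lifetime is inferred from the argument, passing a `String` gives a `Report<'static, _>` directly, matching the "pass owned, get owned" behaviour described above.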

These are the main changes I'm proposing to this PR. There's additional work I want to do with the errors but I think we should do it separately.

@asmello
Collaborator

asmello commented Nov 18, 2024

BTW, great insight with the generic wrapper. It took me a long time yesterday to convince myself it was the right approach. Looking forward to an explanation of this new idea!

@chanced
Owner Author

chanced commented Nov 18, 2024

I'm really glad you pushed back on it, though. I was going back and forth over whether Cow on ParseError was the right approach rather than relying on the wrapper. I completely agree with you that having 3 variations of ParseError is the worst-case scenario, though, and that was the direction I was heading in.

I glanced at the source earlier and I think this will actually meld with yours, but I'm not 100% sure yet.
