Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental parse error from #93 #99

Draft
wants to merge 23 commits into
base: main
Choose a base branch
from
Draft

Conversation

chanced
Copy link
Owner

@chanced chanced commented Nov 20, 2024

@asmello experimental parse error from #93 Sorry this took me so long to get back around to doing.

Its not as ergonomic as I had hoped, I forgot you can't provide defaults to generics of functions. Honestly, it may honestly be more hassle than its worth. I don't know if multi-error reporting is even remotely useful to folks for something as simple as a json pointer.

The miette output of the multi-error is a mess at the moment. I'm guessing its due to the related errors but I dont know how to get a list to show otherwise.

let err: ParseError<Complete> = PointerBuf::parse("hello-world/~3/##~~/~3/~") // or .parse_complete
    .unwrap_err() // ParseError<WithInput>
    .into(); // ParseError<Complete>

println!("{:?}", miette::Report::from(err));
Screenshot 2024-11-20 at 10 48 42 AM

chanced and others added 21 commits October 21, 2024 14:09
* Adds type alias `crate::resolve::ResolveError` for `crate::resolve::Error`
* Renames `crate::assign::AssignError` to `crate::assign::Error`
* Adds type alias `crate::assign::AssignError` for crate::assign::Error`
* Adds `position` (token index) to variants of `assign::Error` & `resolve::Error`
* improves ParseIndexError::InvalidCharacter
Co-authored-by: André Mello <[email protected]>
@asmello
Copy link
Collaborator

asmello commented Nov 20, 2024

I don't think related errors is a bad idea intrinsically — one of the things I wanted to explore in follow up PRs is this. This seems to be the right approach with miette for displaying multiple errors. Most of the noise in the output is due to that bug I flagged on #93, which we can just fix, and maybe we can tune the labels a bit for the multi-errors case, but it doesn't look terrible.

That said, I think we should leave it for a later step, after we've figured out the basic interface for rich errors.

Assuming we do this, my take is we shouldn't store the full offsets in the original error, and instead should compute them on-demand when the rich error is constructed (or even when .related_errors() is called!). It'd look like this from a usage perspective:

let err: Report<ParseError, Cow<'_, str>> = PointerBuf::parse("hello-world/~3/##~~/~3/~")
    .unwrap_err();
let err: FullReport<ParseError, Cow<'_, str>> = err.diagnose_fully();

// needed for conversion to `miette::Report`
let err: FullReport<ParseError, Cow<'static, str>> = err.into_owned();

println!("{:?}", miette::Report::from(err));

The need to convert to miette::Report to get the nice output is sad, I didn't think of that. And I guess that means we always end up needing an owned error, so the nice lifetime flexibility I came up with is moot... 😞 Some related discussion here: zkat/miette#305.

If we let go of the Cow, that doesn't change much, though:

let err: Report<ParseError, String> = PointerBuf::parse("hello-world/~3/##~~/~3/~")
    .unwrap_err();
let err: FullReport<ParseError, String> = err.diagnose_fully();

println!("{:?}", miette::Report::from(err));

The subject (SUB) generic I added to Report is still useful so that we can handle String and PointerBuf subjects without the complexity of the Subject enum you added on #93. I'll make adjustments to #98 later.

I'll give some more thought into the stuff in parse_error.rs, but my gut feeling is this is backwards. One generic wrapper that can extend many concrete types makes a lot of sense, it's applying composability correctly, but I don't immediately see the point of using generics with a marker trait here. That usually goes for making types stateful, so that the internal data representation can be the same but you get a different set of methods available for each concrete marker choice. So it can make sense for a builder, for example, and bon takes advantage of that to guarantee statically that all mandatory fields are set before you can build. Here it seems like we'd be inverting that paradigm, and defining the internal data based on the marker trait, at which point it seems like it's just implementing two distinct ParseError types with extra steps.

I don't have a complete alternative to offer yet, other than doing the straightforward thing and defining ParseError and ParserErrors as two separate types, but I suspect we can do something smarter with the Report. Simply adding a field of Vec<dyn miette::Diagnostic> doesn't work because miette::Diagnostic is not object-safe, and Vec<E> is probably not feasible either (although we might be able to assume E = SRC, that is, all errors are the same type as the originator), but I'll give it some thought.

@chanced
Copy link
Owner Author

chanced commented Nov 21, 2024

n@asmello I'm sorry, github is back to marking messages I haven't seen as read. I'm not sure what that's about.

Most of the noise in the output is due to that bug I flagged on #93, which we can just fix, and maybe we can tune the labels a bit for the multi-errors case, but it doesn't look terrible.

Yep, this was due to me delegating again. I created an enum InvalidEncoding and attached it to EncodingError (renamed from InvalidEncodingError).

Single error display:
Screenshot 2024-11-21 at 11 02 00 AM

multi-error display (slightly different from above as I omitted the leading slash:
Screenshot 2024-11-21 at 11 02 37 AM

There's still a bug that trailing ~ are not displayed. The problem is the span length, i believe.

That said, I think we should leave it for a later step, after we've figured out the basic interface for rich errors.

Heh, I should have kept reading before investigating and writing code.

Assuming we do this, my take is we shouldn't store the full offsets in the original error, and instead should compute them on-demand when the rich error is constructed (or even when .related_errors() is called!). It'd look like this from a usage perspective:

That random function attached to Pointer that didn't seem connected to the previous changeset was for this purpose. I started with computing the offsets from the positions, under the expectation that we would be removing the offset field eventually.

I ended up just using the offsets already on the errors and it made things a good bit cleaner. I'm not opposed to removing them from assign::Error and resolve::Error. The offsets are easy enough to figure out from the position (so long as you have the source, obviously).

When it comes to ParseError, I think it has gotten better now that the errors are being sourced properly. The verbiage of the errors themselves could use improvement and perhaps using complete_offset() instead of giving the offset of the token makes more sense.

That usually goes for making types stateful, so that the internal data representation can be the same but you get a different set of methods available for each concrete marker choice.

There are a different set of methods based upon the associated types.

impl<S> ParseError<S> where S: Structure<Cause = Causes> {
    pub fn causes(&self) -> & [Cause]
}
impl<S> ParseError<S> where S: Structure<Cause = Cause> {
    pub fn cause(&self) -> &Cause
}
impl<S> ParseError<S> where S: Structure<Subject = String> {
    pub fn subject(&self) -> &str
}

cause and subject were S::Cause and S::Subject but they evaluated to the same (well, sort &String but close enough).

@chanced
Copy link
Owner Author

chanced commented Nov 21, 2024

I had really hoped this would be possible:

let Result<_, ParseError<Complete>> = Pointer::parse("/ptr");

where

impl Pointer {
    pub fn parse<S: AsRef<str> + ?Sized, E: Structure>(s: &S) -> Result<&Self, ParseError<E>>
}

but not being able to set defaults on generics of functions put an end to that.

The remaining advantage to using this structure is that the api of the error remains consistent between Pointer::parse and PointerBuf::parse. I haven't put any effort into going from ParseError<WithoutInput> -> ParseError<WithInput> but ParseError<WithInput> -> ParseError<Complete> is implemented (poorly). ParseError<WithInput> -> ParseError<WithoutInput> should also be implemented.

Report or whatever would still need to exist for {assign, resolve}::Error but we can feature flag it. Then we can impl From<ParseError<Subject=String>> for Report and the same machinery to go from {assign, resolve}::Error to Report will be possible for ParseError<String>

@chanced
Copy link
Owner Author

chanced commented Nov 21, 2024

There are 3 shapes that ParseError can be in (i'm not crazy about the names):

  • WithoutInput, returned from Pointer::parse: { subject: Empty, cause: Cause }
  • WithInput, returned from PointerBuf::parse: { subject: String, cause: Cause }
  • Complete, bikeshed on where it is returned: { subject: String, cause: Causes }

where Cause is formerly ParseError and Causes is Box<[Cause]>.

pub enum Cause {
    NoLeadingBackslash,
    InvalidEncoding {
        offset: usize,
        source: EncodingError,
    },
}

Empty exists so that it can impl From<&str> and impl From<String>, which are just thrown away.

@chanced
Copy link
Owner Author

chanced commented Nov 21, 2024

If we had specialization, the API could be identical across variations of ParseError:

impl<S> ParseError<S> where S: Structure<Cause = Causes> {
    pub fn cause(&self) -> &Cause { &self.cause[0] }
}

@chanced
Copy link
Owner Author

chanced commented Nov 21, 2024

Nevermind, I was able to get fn cause(&self) -> &Cause to be universal without it.

pub trait Causative:
    PartialEq + Sized + std::error::Error + fmt::Display + fmt::Debug + miette::Diagnostic
+    fn first(&self) -> &Cause;
impl<S: Structure> ParseError<S> {
    pub fn cause(&self) -> &Cause {
        self.cause.first()
    }
}

@chanced
Copy link
Owner Author

chanced commented Nov 21, 2024

An alternative approach would be something like

struct SansInput { cause: Cause }
struct WithInput { cause: Cause, input: String }
struct Complete { causes: Vec<Cause>, input: String }

enum ParseError {
    SansInput(SansInput),
    WithInput(WithInput),
    Complete(Complete),
}

but methods which operate on the input would need to return Option. Maybe that's okay in exchange for much lower complexity.

@asmello
Copy link
Collaborator

asmello commented Nov 21, 2024

I'm still skeptical about this marker trait system, because in my experience this pattern has some serious ergonomics implications. One you kinda noticed already - lack of specialisation means we often have to implement the same function multiple times, once for each concrete marker value. This has a viral effect, because each function that is implemented concretely can't be called in generic contexts. That is, this doesn't work:

use std::marker::PhantomData;

trait State {}
struct S1;
impl State for S1 {}
struct S2;
impl State for S2 {}
struct Foo<S: State> {
    _phantom: PhantomData<S>
}

impl Foo<S1> {
    fn do_it() { }
}

impl Foo<S2> {
    fn do_it() { }
}

fn frobnicate<S: State>(foo: Foo<S>) {
    // error[E0599]: no method named `do_it` found for struct `Foo<S>`
    foo.do_it(); 
}

Playground

I'm not actually sure how your first method in Causative solved this issue? But even if it did, you still went from implementing cause 3x to implementing first 3x, which doesn't seem like much of an improvement. What if you needed last as well? Doesn't seem to scale well.

I'm also worried that all this work is just to support ParseError, and I'm not sure how much can be reused for other error types that would similarly benefit from an enrichment system.

@asmello
Copy link
Collaborator

asmello commented Nov 21, 2024

To be clear though, you are not crazy, there is some credit to this system. The state machine is a nice model for error enrichment. My concerns are just over implications for ergonomics, as well as complexity.

I'm also scared of running into edge cases with generics and associated types. Fought those hard last Sunday getting .into_owned to work when I had a very similar design to ParseError (where a field's type was an associated type). The solution ended up being to be more loose with the generic bounds on the type definition (introducing a totally decoupled second parameter) and then constraining it just in the places where the association was relevant. That led to many simplifications. I wonder if something similar might be done here.

@asmello
Copy link
Collaborator

asmello commented Nov 21, 2024

The remaining advantage to using this structure is that the api of the error remains consistent between Pointer::parse and PointerBuf::parse.

I wonder if the answer isn't to just have 3 totally separate types, but have them implement a trait where common functionality makes sense? This would give callers a common API to operate with where it matters, while still providing a way to have different methods on each stage of enrichment.

What's telling to me here is that the three flavours of ParseError end up having both distinct data types as well as a unique subset of methods. I really don't see what the generics are buying us here.

@chanced
Copy link
Owner Author

chanced commented Nov 21, 2024

I'm still skeptical about this marker trait system, because in my experience this pattern has some serious ergonomics implications.

Ah, I hadn't really considered too much the implications of infecting generics on people. Hmm.

One you kinda noticed already - lack of specialisation means we often have to implement the same function multiple times, once for each concrete marker value. This has a viral effect, because each function that is implemented concretely can't be called in generic contexts

yea, I had no idea you couldn't shadow fns like that. I discovered that through this - where initially my intention was to have cause return &[Cause] or &Cause.

I'm not actually sure how your first method in Causative solved this issue?

Oh, in this case I wanted cause to return either itself (in the case of Cause) or the first, in the case of Causes. The reasoning is that if it were not Complete, the first Cause would be returned. Adding it to the trait made it so that I didn't need to constrain on Structure::Cause.

What if you needed last as well? Doesn't seem to scale well.

I realize this is a contrived example but we wouldn't need last. Most of the distinctions between these types are whether or not there exists the input string. I think I have most of that covered but and what's remaining of finishing it shouldn't depend too much on the markers. I'm not dismissing the concern. Its valid - just giving context on the current state.

I'm also worried that all this work is just to support ParseError, and I'm not sure how much can be reused for other error types that would similarly benefit from an enrichment system.

This is true. It is confusing to have two separate mechanisms for enrichment when we have so few errors.

To be clear though, you are not crazy, there is some credit to this system. The state machine is a nice model for error enrichment. My concerns are just over implications for ergonomics, as well as complexity.

Understood and thanks. I wasn't sure if you outright hated it or not :).

I've been going around and around trying to find the right combination of complexity, ergonomics, and utility. The dual types, while awesome, definitely spike the complexity around errors.

gotta run, i'll reply to the rest when I get back.

@asmello
Copy link
Collaborator

asmello commented Nov 21, 2024

Understood and thanks. I wasn't sure if you outright hated it or not :).

Oh, I don't hate it. I'm just scared of it! 😄

Ah, I hadn't really considered too much the implications of infecting generics on people. Hmm.

I think this is probably ok for users, because they have to specify which concrete type they want out of .parse, and once you've got a concrete type it's all good. It's only if they decide to be generic themselves over S that they will start to run into trouble, but I think this would be rare. I'm mainly concerned about the cost of internal evolution.

@chanced
Copy link
Owner Author

chanced commented Nov 22, 2024

I wonder if the answer isn't to just have 3 totally separate types, but have them implement a trait where common functionality makes sense? This would give callers a common API to operate with where it matters, while still providing a way to have different methods on each stage of enrichment.

That may work. If we go that route, I'm thinking we eliminate the third option so it'll just be 2, with and without input.

What's telling to me here is that the three flavours of ParseError end up having both distinct data types as well as a unique subset of methods. I really don't see what the generics are buying us here.

Yea, we have 2 or 3 distinct structures depending on whether we keep the list. The generics keep it to one descriptive type. That's about it. Unfortunately the luster of this approach is severely diminished without being able to pick and choose the reporting style from a single set of methods (parse on both Pointer and PointerBuf) by specifying the generic.

Granted, we can still allow for picking of reporting style on parse. Ergonomics would take a serious hit in situations where the error is ambiguous though:

impl PointerBuf {
    pub fn parse<E: Structure>(s: impl Into<String>) -> Result<Self, ParseError<E>> {
        let s = s.into();
        match Validator::validate::<E::Cause>(&s) {
            Ok(()) => Ok(Self(s)),
            Err(cause) => Err(ParseError::<E>::new(cause, s)),
        }
    }
}

fn example () {
    let ptr = PointerBuf::parse("/").unwrap();
}
error[E0283]: type annotations needed
    --> src/pointer.rs:1301:15
     |
1301 |     let ptr = PointerBuf::parse("/").unwrap();
     |               ^^^^^^^^^^^^^^^^^^^^^^ cannot infer type for type parameter `E` declared on the associated function `parse`
     |
     = note: cannot satisfy `_: Structure`
     = help: the following types implement trait `Structure`:
               SansInput
               WithInput
               parse_error::Complete
note: required by a bound in `PointerBuf::parse`

Oh, I don't hate it. I'm just scared of it! 😄

Hah, I understand that. grill has become a swamp of associated types and generics. Its the only other project I have which is infested with associated types to any noticable degree. I've come to love them (associated types) but booting back up on that project takes time.

Having said all of that, I don't think this particular arrangement is horrible, especially if we seal up Causitive (with a better name) but its still mental overhead. Also, there are so few variants that it seems a bit overkill.

@chanced
Copy link
Owner Author

chanced commented Nov 22, 2024

I think I'd favor a distinct type for the ParseError with the input string. Report<ParseError> may be too generic for an explicit action.

edit: Hmm, maybe not. I'm going to go back to your PR and play with it. Perhaps just one enriched generic error is sufficient. It may help a good bit if we eliminate the possibility of multiple errors too.

@asmello
Copy link
Collaborator

asmello commented Nov 22, 2024

I think I'd favor a distinct type for the ParseError with the input string. Report<ParseError> may be too generic for an explicit action.

I'm not opposed to this, but as long as we have two versions for every type (one encoding just the cause and the other also encoding a position), I think the generic scales better. That's about it, really.

My initial jerk reaction to the generic wrapper was because in my experience generic errors tend to be painful to work with. Especially for users, as they often end up doing things we didn't foresee. But what I ended up realising is that the public API can expose only concrete types, so this isn't a problem (Report<ParseError> is a concrete type). That's why I changed my mind about the wrapper (also because the solution I came up with was relatively simple).

edit: Hmm, maybe not. I'm going to go back to your PR and play with it. Perhaps just one enriched generic error is sufficient. It may help a good bit if we eliminate the possibility of multiple errors too.

I'll try to set apart some time this weekend to try and make it work with multiple errors too. But like I said, I wouldn't oppose just sticking with concrete types either.

And I haven't fully ruled out the associated types approach, I'm just dubious of the net benefit. If you think of another motivation for it I may change my stance.

@chanced
Copy link
Owner Author

chanced commented Nov 22, 2024

I'll be awol this weekend but I think I'm leaning toward the wrapper as well. The alternative is going to be basically the equivalent of it anyway.

Basically the two options are:

enum ParseError {}
struct EnrichedParseError { input: String, error: ParseError }

or something like this to get the same API across types:

enum Cause {}
struct ParseError { cause: Cause }
struct EnrichedParseError { cause: Cause, input: String }

Both of which don't make a whole lot of sense if we have a wrapper type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants