Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Options for improving build_chain error specificity #79

Closed
Tracked by #57
cpu opened this issue Jun 12, 2023 · 3 comments · Fixed by #89
Closed
Tracked by #57

Options for improving build_chain error specificity #79

cpu opened this issue Jun 12, 2023 · 3 comments · Fixed by #89
Assignees

Comments

@cpu
Copy link
Member

cpu commented Jun 12, 2023

Pulling out a discussion issue from #66 for more targeted discussion:

Error messages for revocation related events are incorrect. All error states are reported as "UnknownIssuer". I will pull out a separate issue for discussion on this but the short explanation is that the existing path building code offers no opportunity to return a unique error, every path building error case is swallowed by the loop_while_non_fatal_error helper and turned into a Error::UnknownIssuer result. There are some comments that indicate "fatal" errors should be treated separately, but in practice all errors are considered non-fatal. For now I've left this as-is and the unit tests have TODOs to check for more granular errors.

Spelunking around for historic context, I've found this was an explicit design decision made by Brian: the goal was for webpki to be as minimal as possible and yeild only a binary result: "yes - valid chain to a trust root was constructed" or "no - invalid issuer" (e.g. see comments in briansmith/webpki#183). Later, it seems his perspective changed and he expressed interest in renaming InvalidIssuer and some strategies for surfacing more granular errors were put on the table for evaluation.

I agree with the new directions Brian suggested. I think we should separately consider:

  1. Renaming UnknownIssuer to something that more generically suggests path building failure (Rename and better document the UnknownIssuer error briansmith/webpki#221)
  2. Trying to surface more specific errors where possible (Provide a hint as to the likely cause of failure to find a valid issuer briansmith/webpki#206).

For point 1, I think we mostly need to bikeshed a new name, so this issue is more for discussing approaches to point 2. There have been several suggestions made for this:

  1. Following the approach Firefox/libpkix use: if all attempts to build a chain yield the same error, return that error.
  2. Updating verify_cert.rs's loop_while_non_fatal_error fn to return the last error observed (implemented in Improve error message when trying to build_chain briansmith/webpki#140)
  3. Updating verify_cert.rs's loop_while_non_fatal_error fn to heuristically score errors based on their relative importance, returning the most important error observed during path building (implemented in Better error in verify_is_valid_tls_server_cert briansmith/webpki#188)

I've found it hard to map the described Firefox approach to the implementation (both of webpki and in the Firefox codebase). It sounds promising, but the path to implementation is less clear. This approach feels trickier to nail down without allocation - would it require building a chain of error refs in Cert ala the EndEntityOrCa links?

The "return last error" approach from 140 is simple, but not sufficient for surfacing errors in all cases we care about. For instance if an intermediate is revoked there will be further path building operations that ultimately squash the error with UnknownIssuer even with this suggested error handling strategy in place.

The approach in 188 is better suited to our needs, but also requires that we heuristically score errors based on their relative importance. That's a little bit cumbersome, but with care it does yield what I consider to be the most important errors when run across our existing testcases (the majority of which are currently asserting UnknownIssuer).

I'm leaning towards adopting the "heuristic" approach, but I'm eager to here what others think. If it would be helpful, I can stage a branch with the diff updated for the fork and the affected unit tests updated. I think this is likely a blocker for landing proper CRL support, it will be a very poor user experience to yield "UnknownIssuer" when the problem is a revoked end-entity certificate (as one example).

@cpu cpu self-assigned this Jun 12, 2023
@djc
Copy link
Member

djc commented Jun 13, 2023

Heuristics sound good to me as a way of trying to find the most specific error -- though I don't think you've explicitly stated what the desired optimum for our heuristic is? I'd guess something like "most specific error".

Another challenge here is surfacing the error in a way that people not as well-versed in web PKI jargon can understand it. For example, ideally we'd avoid the term "path building" in the public documentation.

@cpu
Copy link
Member Author

cpu commented Jun 13, 2023

I'd guess something like "most specific error".

Yes, that's what I was thinking 👍

Another challenge here is surfacing the error in a way that people not as well-versed in web PKI jargon can understand it. For example, ideally we'd avoid the term "path building" in the public documentation.

That's a good point. Mentally I've been modelling webpki as somewhat of a low-level component that isn't expected to be user-facing in and of itself. In that context I think it's reasonable to be using somewhat specialized jargon that I would expect a higher level crate like Rustls to adapt to user-facing errors. Do you think some of the work of translating to normal-people-speak should be handled by Webpki? I'm not opposed to that, but I also think it might be separate from the concern of how to surface more specific errors.

@djc
Copy link
Member

djc commented Jun 14, 2023

That's a good point. Mentally I've been modelling webpki as somewhat of a low-level component that isn't expected to be user-facing in and of itself. In that context I think it's reasonable to be using somewhat specialized jargon that I would expect a higher level crate like Rustls to adapt to user-facing errors. Do you think some of the work of translating to normal-people-speak should be handled by Webpki? I'm not opposed to that, but I also think it might be separate from the concern of how to surface more specific errors.

I guess in my mind there is still a fair bit of leakage from webpki to rustls, but maybe my internal idea of this stuff hasn't fully caught up to the new reality of error handling. If we think rustls is in a good position to "translate" errors to something end user-friendly, then leaking some jargon from webpki is probably fine. (It's definitely less important than yielding the most specific error in the first place.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants