Options for improving build_chain error specificity #79

cpu · 2023-06-12T21:00:45Z

Pulling out a discussion issue from #66 for more targeted discussion:

Error messages for revocation related events are incorrect. All error states are reported as "UnknownIssuer". I will pull out a separate issue for discussion on this but the short explanation is that the existing path building code offers no opportunity to return a unique error, every path building error case is swallowed by the loop_while_non_fatal_error helper and turned into a Error::UnknownIssuer result. There are some comments that indicate "fatal" errors should be treated separately, but in practice all errors are considered non-fatal. For now I've left this as-is and the unit tests have TODOs to check for more granular errors.

Spelunking around for historic context, I've found this was an explicit design decision made by Brian: the goal was for webpki to be as minimal as possible and yeild only a binary result: "yes - valid chain to a trust root was constructed" or "no - invalid issuer" (e.g. see comments in briansmith/webpki#183). Later, it seems his perspective changed and he expressed interest in renaming InvalidIssuer and some strategies for surfacing more granular errors were put on the table for evaluation.

I agree with the new directions Brian suggested. I think we should separately consider:

Renaming UnknownIssuer to something that more generically suggests path building failure (Rename and better document the UnknownIssuer error briansmith/webpki#221)
Trying to surface more specific errors where possible (Provide a hint as to the likely cause of failure to find a valid issuer briansmith/webpki#206).

For point 1, I think we mostly need to bikeshed a new name, so this issue is more for discussing approaches to point 2. There have been several suggestions made for this:

Following the approach Firefox/libpkix use: if all attempts to build a chain yield the same error, return that error.
Updating verify_cert.rs's loop_while_non_fatal_error fn to return the last error observed (implemented in Improve error message when trying to build_chain briansmith/webpki#140)
Updating verify_cert.rs's loop_while_non_fatal_error fn to heuristically score errors based on their relative importance, returning the most important error observed during path building (implemented in Better error in verify_is_valid_tls_server_cert briansmith/webpki#188)

I've found it hard to map the described Firefox approach to the implementation (both of webpki and in the Firefox codebase). It sounds promising, but the path to implementation is less clear. This approach feels trickier to nail down without allocation - would it require building a chain of error refs in Cert ala the EndEntityOrCa links?

The "return last error" approach from 140 is simple, but not sufficient for surfacing errors in all cases we care about. For instance if an intermediate is revoked there will be further path building operations that ultimately squash the error with UnknownIssuer even with this suggested error handling strategy in place.

The approach in 188 is better suited to our needs, but also requires that we heuristically score errors based on their relative importance. That's a little bit cumbersome, but with care it does yield what I consider to be the most important errors when run across our existing testcases (the majority of which are currently asserting UnknownIssuer).

I'm leaning towards adopting the "heuristic" approach, but I'm eager to here what others think. If it would be helpful, I can stage a branch with the diff updated for the fork and the affected unit tests updated. I think this is likely a blocker for landing proper CRL support, it will be a very poor user experience to yield "UnknownIssuer" when the problem is a revoked end-entity certificate (as one example).

The text was updated successfully, but these errors were encountered:

djc · 2023-06-13T07:36:54Z

Heuristics sound good to me as a way of trying to find the most specific error -- though I don't think you've explicitly stated what the desired optimum for our heuristic is? I'd guess something like "most specific error".

Another challenge here is surfacing the error in a way that people not as well-versed in web PKI jargon can understand it. For example, ideally we'd avoid the term "path building" in the public documentation.

cpu · 2023-06-13T14:18:04Z

I'd guess something like "most specific error".

Yes, that's what I was thinking 👍

Another challenge here is surfacing the error in a way that people not as well-versed in web PKI jargon can understand it. For example, ideally we'd avoid the term "path building" in the public documentation.

That's a good point. Mentally I've been modelling webpki as somewhat of a low-level component that isn't expected to be user-facing in and of itself. In that context I think it's reasonable to be using somewhat specialized jargon that I would expect a higher level crate like Rustls to adapt to user-facing errors. Do you think some of the work of translating to normal-people-speak should be handled by Webpki? I'm not opposed to that, but I also think it might be separate from the concern of how to surface more specific errors.

djc · 2023-06-14T08:35:21Z

That's a good point. Mentally I've been modelling webpki as somewhat of a low-level component that isn't expected to be user-facing in and of itself. In that context I think it's reasonable to be using somewhat specialized jargon that I would expect a higher level crate like Rustls to adapt to user-facing errors. Do you think some of the work of translating to normal-people-speak should be handled by Webpki? I'm not opposed to that, but I also think it might be separate from the concern of how to surface more specific errors.

I guess in my mind there is still a fair bit of leakage from webpki to rustls, but maybe my internal idea of this stuff hasn't fully caught up to the new reality of error handling. If we think rustls is in a good position to "translate" errors to something end user-friendly, then leaking some jargon from webpki is probably fine. (It's definitely less important than yielding the most specific error in the first place.)

cpu self-assigned this Jun 12, 2023

cpu mentioned this issue Jun 12, 2023

Continued webpki CRL support. #66

Merged

cpu mentioned this issue Jun 13, 2023

Add CRL processing KeyUsage validation. #82

Merged

This was referenced Jun 15, 2023

CRL support tasks #57

Closed

return most specific error from path building failures #89

Merged

cpu closed this as completed in #89 Jun 19, 2023

ctz mentioned this issue Jun 30, 2023

rustls incorrectly reports UnknownIssuer instead of Unsupported Signature rustls/rustls#1335

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Options for improving build_chain error specificity #79

Options for improving build_chain error specificity #79

cpu commented Jun 12, 2023

djc commented Jun 13, 2023

cpu commented Jun 13, 2023

djc commented Jun 14, 2023

Options for improving build_chain error specificity #79

Options for improving build_chain error specificity #79

Comments

cpu commented Jun 12, 2023

djc commented Jun 13, 2023

cpu commented Jun 13, 2023

djc commented Jun 14, 2023