cddl is slow (and possibly memory inefficient) in some situations, due to CBOR parsing #167

Open
itamarst opened this issue Dec 21, 2022 · 3 comments
Labels
help wanted, performance

@itamarst
Contributor

itamarst commented Dec 21, 2022

Consider the following schema: object = bstr. In theory, validating any document that matches it should be super-fast: read the length prefix, check that the remaining bytes are the right length, the end.
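The fast path described above can be sketched in std-only Rust. This is a minimal illustration, not code from cddl or ciborium: the names `split_bstr`, `split_len`, and `is_single_bstr` are hypothetical, and the sketch assumes only definite-length byte strings (CBOR major type 2) need to be accepted.

```rust
use std::convert::TryFrom;

/// Splits one definite-length CBOR byte string off the front of `input`,
/// returning (payload, trailing bytes) as borrowed slices. Nothing is copied.
fn split_bstr(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let (&first, rest) = input.split_first()?;
    // The top 3 bits hold the major type; 2 means byte string.
    if first >> 5 != 2 {
        return None;
    }
    // The low 5 bits either hold the length itself (0..=23) or say how
    // many big-endian length bytes follow the initial byte.
    let n: usize = match first & 0x1f {
        info @ 0..=23 => return split_len(rest, u64::from(info)),
        24 => 1,
        25 => 2,
        26 => 4,
        27 => 8,
        _ => return None, // reserved values, or 31 = indefinite length
    };
    if rest.len() < n {
        return None;
    }
    let (len_bytes, rest) = rest.split_at(n);
    let len = len_bytes
        .iter()
        .fold(0u64, |acc, &b| (acc << 8) | u64::from(b));
    split_len(rest, len)
}

/// Splits `len` bytes off the front of `rest`, or fails if too few remain.
fn split_len(rest: &[u8], len: u64) -> Option<(&[u8], &[u8])> {
    let len = usize::try_from(len).ok()?;
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}

/// True if `input` is exactly one byte string with no trailing data.
fn is_single_bstr(input: &[u8]) -> bool {
    matches!(split_bstr(input), Some((_, rest)) if rest.is_empty())
}
```

The work here depends only on the size of the header, not on the size of the payload, which is the behaviour the schema object = bstr should allow in principle.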

And indeed, for a small CBOR document (and with Python overhead!) I measure 400 nanoseconds to validate. So that's great.

However, if you pass in a large document that still matches the schema, validation is much slower. A 1GB bstr takes 600 milliseconds(!) to parse, and 100MB takes 30ms... this is on the order of 1e6 times slower than the small case, and it scales with input data size. The reason: ciborium is parsing the CBOR bstr into a Vec<u8>. That means both memory allocation and data copying that scale linearly with the size of the bstr.
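To reproduce the scaling outside of cddl, a rough std-only comparison of the two strategies might look like the following. This is a sketch under stated assumptions, not ciborium's code: `owned_payload` and `borrowed_payload` are hypothetical stand-ins for an owning and a borrowing parser, and the header length is assumed to be known already.

```rust
use std::time::{Duration, Instant};

/// Stand-in for an owning parser: copies the payload into a fresh Vec<u8>.
fn owned_payload(doc: &[u8], header_len: usize) -> Vec<u8> {
    doc[header_len..].to_vec()
}

/// Stand-in for a borrowing parser: returns a view into the input.
fn borrowed_payload(doc: &[u8], header_len: usize) -> &[u8] {
    &doc[header_len..]
}

/// Runs `f` once and reports how long it took.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Builds a CBOR byte string of `len` zero bytes with a 4-byte length
/// field (initial byte 0x5a), then times both strategies on it.
/// Returns (copy time, borrow time).
fn compare(len: u32) -> (Duration, Duration) {
    let mut doc = vec![0x5a];
    doc.extend_from_slice(&len.to_be_bytes());
    doc.resize(5 + len as usize, 0);
    let (owned, copy_time) = timed(|| owned_payload(&doc, 5));
    let (borrowed, borrow_time) = timed(|| borrowed_payload(&doc, 5));
    assert_eq!(owned.len(), borrowed.len());
    (copy_time, borrow_time)
}
```

Calling `compare` with increasing lengths in a release build should show the copy time growing with the payload while the borrow time stays roughly flat; exact numbers will depend on the machine and allocator.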

Much of the time cddl only cares about the length of the data when validating a bstr. And even when the contents matter, a &[u8] should suffice. So this is inefficient. (Also note that there are probably similar optimization opportunities for Unicode strings, although I assume those would still require UTF-8 validation, so they are not quite as optimizable.)
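As an illustration of what a borrowed representation could look like, here is a small std-only sketch. `BorrowedValue`, `borrow_value`, and `bstr_len_in_range` are hypothetical names that exist in neither cddl nor ciborium, and the payload is assumed to have been located in the input already.

```rust
/// Hypothetical borrowed counterpart to an owned CBOR value: both variants
/// point into the input buffer instead of owning a copy.
#[derive(Debug, PartialEq)]
enum BorrowedValue<'a> {
    Bytes(&'a [u8]),
    Text(&'a str),
}

/// Wraps a payload whose major type (2 = bstr, 3 = tstr) is already known.
fn borrow_value(major: u8, payload: &[u8]) -> Option<BorrowedValue<'_>> {
    match major {
        2 => Some(BorrowedValue::Bytes(payload)),
        // from_utf8 scans every byte but does not allocate or copy.
        3 => std::str::from_utf8(payload).ok().map(BorrowedValue::Text),
        _ => None,
    }
}

/// A length-only check, such as a size bound, never reads bstr contents.
fn bstr_len_in_range(value: &BorrowedValue<'_>, min: usize, max: usize) -> bool {
    matches!(value, BorrowedValue::Bytes(b) if (min..=max).contains(&b.len()))
}
```

The text case still costs a linear scan for UTF-8 validation, so it avoids the allocation and copy but not the pass over the data.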

I am not sure how to approach this without further research: possibly ciborium can be convinced to spit out &[u8], possibly a different CBOR parser would help, etc. I will look into it at some point if you don't have the time; I am also happy to implement a PR given a design approach.

@itamarst
Contributor Author

After further thought: given that ciborium wants to read from an arbitrary reader, support for &[u8] in ciborium::value::Value will be a hard sell. But I have an idea for a potential improvement to ciborium that might help a little, so I'm going to look into that.
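The tension between a generic reader and a borrowed slice can be sketched with std alone. This is an illustration of the general constraint, not ciborium's API; both function names are hypothetical.

```rust
use std::io::{self, Read};

/// With only `Read`, the payload has to be copied into a buffer we own,
/// because the reader cannot lend out memory that outlives the call.
fn payload_from_reader<R: Read>(reader: &mut R, len: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    let got = reader.by_ref().take(len).read_to_end(&mut buf)?;
    if got as u64 != len {
        return Err(io::ErrorKind::UnexpectedEof.into());
    }
    Ok(buf)
}

/// With the whole document in memory, the payload is just a sub-slice.
fn payload_from_slice(doc: &[u8], len: usize) -> Option<&[u8]> {
    doc.get(..len)
}
```

A value type that stores a &[u8] only makes sense for the second shape of input, which is why it fits awkwardly into a parser designed around the first.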

@itamarst
Contributor Author

I think maybe I found a way to speed things up on the ciborium side; still not sure what to do about memory usage.

@anweiss added this to the Backlog milestone Aug 8, 2023
@anweiss
Owner

anweiss commented Aug 8, 2023

Thanks for reporting this @itamarst. Will leave this open since this seems to be attributable to ciborium.

@anweiss added the help wanted label Aug 8, 2023