cddl is slow (and possibly memory inefficient) in some situations, due to CBOR parsing #167
Consider the following schema: `object = bstr`. In theory, validating anything that matches it should be super fast: read the prefix, check that the remaining bytes have the right length, and you're done. And indeed, for a small CBOR document (and with Python overhead!) I measure 400 nanoseconds to validate. So that's great.
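To make the "read the prefix, check the length" point concrete, here is a minimal sketch (not code from `cddl` or `ciborium`; the function name and error strings are made up for illustration) of what an O(1) check for a single definite-length `bstr` against raw CBOR bytes could look like:

```rust
/// Check that `input` is exactly one definite-length CBOR byte string,
/// without ever copying the payload.
fn validate_definite_bstr(input: &[u8]) -> Result<(), &'static str> {
    let (&initial, rest) = input.split_first().ok_or("empty input")?;

    // Major type 2 (byte string) lives in the high three bits of the initial byte.
    if initial >> 5 != 2 {
        return Err("not a byte string");
    }

    // Additional-info values 0..=23 encode the length inline; 24..=27 say how
    // many following bytes hold the big-endian length. 31 would start an
    // indefinite-length string, which this sketch does not handle.
    let (len, header_extra) = match initial & 0x1f {
        n @ 0..=23 => (u64::from(n), 0usize),
        24 => (u64::from(*rest.first().ok_or("truncated header")?), 1),
        25 => {
            let b: [u8; 2] = rest.get(..2).ok_or("truncated header")?.try_into().unwrap();
            (u64::from(u16::from_be_bytes(b)), 2)
        }
        26 => {
            let b: [u8; 4] = rest.get(..4).ok_or("truncated header")?.try_into().unwrap();
            (u64::from(u32::from_be_bytes(b)), 4)
        }
        27 => {
            let b: [u8; 8] = rest.get(..8).ok_or("truncated header")?.try_into().unwrap();
            (u64::from_be_bytes(b), 8)
        }
        _ => return Err("indefinite-length or reserved header"),
    };

    // O(1): compare the declared length against what is actually left.
    if (rest.len() - header_extra) as u64 == len {
        Ok(())
    } else {
        Err("length mismatch")
    }
}
```

For example, `validate_definite_bstr(&[0x43, 1, 2, 3])` (a three-byte `bstr`) succeeds without touching the payload, and the cost stays the same no matter how large the payload is.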
However, if you pass in a large document that still matches the schema, validation is much slower: a 1 GB `bstr` takes 600 milliseconds(!) to parse, 100 MB takes 30 ms... that is roughly 1e6× slower, and it scales with input size. The reason is that `ciborium` parses the CBOR `bstr` into a `Vec`, which means both a memory allocation and a data copy that scale linearly with the size of the `bstr`.
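For contrast with the sketch above, here is a rough illustration (not the `cddl` crate's actual code path; the helper function is hypothetical) of what the copying route costs: decoding into `ciborium::value::Value` materialises the whole byte string as an owned `Vec<u8>`, so a 1 GB `bstr` means a 1 GB allocation plus a 1 GB copy before any length check can happen.

```rust
use ciborium::value::Value;

/// Hypothetical helper: get the length of a top-level bstr by fully decoding it.
fn bstr_len_via_value(cbor: &[u8]) -> Result<Option<usize>, Box<dyn std::error::Error>> {
    // from_reader parses the entire document into an owned Value tree.
    let value: Value = ciborium::de::from_reader(cbor)?;
    Ok(match value {
        // `bytes` is a Vec<u8> holding a full copy of the payload.
        Value::Bytes(bytes) => Some(bytes.len()),
        _ => None,
    })
}
```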
Much of the time `cddl` only cares about the length of the data when validating a `bstr`. And even when the contents matter, a `&[u8]` should suffice for a `bstr`. So this is inefficient. (Note that there are probably similar optimization opportunities for Unicode strings, although those would require UTF-8 validation, I assume, so they are not quite as optimizable.)

I am not sure how to approach this without further research: possibly `ciborium` can be convinced to spit out a `&[u8]`, possibly a different CBOR parser would help, etc. I will look into it at some point if you don't have the time; I am also happy to implement a PR given a design approach.