cddl is slow (and possibly memory inefficient) in some situations, due to CBOR parsing #167
Consider the following schema: `object = bstr`. In theory, validating anything that matches it should be super fast: read the prefix, check that the remaining bytes have the right length, and you're done. And indeed, for a small CBOR document (and with Python overhead!) I measure 400 nanoseconds to validate. So that's great.
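To make the "read the prefix, check the length" point concrete, here is a minimal sketch (not code from `cddl` or `ciborium`; the function name and error strings are made up for illustration) of what an O(1) check for a single definite-length `bstr` against raw CBOR bytes could look like:

```rust
/// Check that `input` is exactly one definite-length CBOR byte string,
/// without ever copying the payload.
fn validate_definite_bstr(input: &[u8]) -> Result<(), &'static str> {
    let (&initial, rest) = input.split_first().ok_or("empty input")?;

    // Major type 2 (byte string) lives in the high three bits of the initial byte.
    if initial >> 5 != 2 {
        return Err("not a byte string");
    }

    // Additional-info values 0..=23 encode the length inline; 24..=27 say how
    // many following bytes hold the big-endian length. 31 would start an
    // indefinite-length string, which this sketch does not handle.
    let (len, header_extra) = match initial & 0x1f {
        n @ 0..=23 => (u64::from(n), 0usize),
        24 => (u64::from(*rest.first().ok_or("truncated header")?), 1),
        25 => {
            let b: [u8; 2] = rest.get(..2).ok_or("truncated header")?.try_into().unwrap();
            (u64::from(u16::from_be_bytes(b)), 2)
        }
        26 => {
            let b: [u8; 4] = rest.get(..4).ok_or("truncated header")?.try_into().unwrap();
            (u64::from(u32::from_be_bytes(b)), 4)
        }
        27 => {
            let b: [u8; 8] = rest.get(..8).ok_or("truncated header")?.try_into().unwrap();
            (u64::from_be_bytes(b), 8)
        }
        _ => return Err("indefinite-length or reserved header"),
    };

    // O(1): compare the declared length against what is actually left.
    if (rest.len() - header_extra) as u64 == len {
        Ok(())
    } else {
        Err("length mismatch")
    }
}
```

For example, `validate_definite_bstr(&[0x43, 1, 2, 3])` (a three-byte `bstr`) succeeds without touching the payload, and the cost stays the same no matter how large the payload is.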
However, if you pass in a large document that still matches the schema, validation is much slower: a 1 GB `bstr` takes 600 milliseconds(!) to parse, 100 MB takes 30 ms... that is roughly 1e6× slower, and it scales with input size. The reason is that `ciborium` parses the CBOR `bstr` into a `Vec`, which means both a memory allocation and a data copy that scale linearly with the size of the `bstr`.
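For contrast with the sketch above, here is a rough illustration (not the `cddl` crate's actual code path; the helper function is hypothetical) of what the copying route costs: decoding into `ciborium::value::Value` materialises the whole byte string as an owned `Vec<u8>`, so a 1 GB `bstr` means a 1 GB allocation plus a 1 GB copy before any length check can happen.

```rust
use ciborium::value::Value;

/// Hypothetical helper: get the length of a top-level bstr by fully decoding it.
fn bstr_len_via_value(cbor: &[u8]) -> Result<Option<usize>, Box<dyn std::error::Error>> {
    // from_reader parses the entire document into an owned Value tree.
    let value: Value = ciborium::de::from_reader(cbor)?;
    Ok(match value {
        // `bytes` is a Vec<u8> holding a full copy of the payload.
        Value::Bytes(bytes) => Some(bytes.len()),
        _ => None,
    })
}
```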
Much of the time `cddl` only cares about the length of the data when validating a `bstr`. And even when the contents matter, a `&[u8]` should suffice for a `bstr`. So this is inefficient. (Note that there are probably similar optimization opportunities for Unicode strings, although those would require UTF-8 validation, I assume, so they are not quite as optimizable.)

I am not sure how to approach this without further research: possibly `ciborium` can be convinced to spit out a `&[u8]`, possibly a different CBOR parser would help, etc. I will look into it at some point if you don't have the time; I am also happy to implement a PR given a design approach.