Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Improve performance of
ncodeunits(::Char)
(JuliaLang#54001)
This improves performance of `ncodeunits(::Char)` by simply counting the number of non-zero bytes (except for `\0`, which is encoded as all zero bytes). For a performance comparison, see [this gist]( https://gist.github.com/Seelengrab/ebb02d4b8d754700c2869de8daf88cad); there's an up to 10x improvement here for collections of `Char`, with a minor improvement for single `Char` (with much smaller spread). The version in this PR is called `nbytesencoded` in the benchmarks. Correctness has been verified with Supposition.jl, using the existing implementation as an oracle: ```julia julia> using Supposition julia> const chars = Data.Characters() julia> @check max_examples=1_000_000 function bytesenc(i=Data.Integers{UInt32}()) c = reinterpret(Char, i) ncodeunits(c) == nbytesdiv(c) end; Test Summary: | Pass Total Time bytesenc | 1 1 1.0s julia> ncodeunits('\0') == nbytesencoded('\0') true ``` Let's see if CI agrees! Notably, neither the existing nor the new implementation check whether the given `Char` is valid or not, since the only thing that matters is how many bytes are written out. --------- Co-authored-by: Sukera <[email protected]>
- Loading branch information