refactor: fix and improve utf8 utilities #1046
Merged
This PR solves the following problems:
- `dpp::utf8substr` is just broken: `dpp::utf8substr("abcdefg", 2, 2)` returns `"cdef"` instead of `"cd"`. This probably went unnoticed due to the very limited variety of usage scenarios.
- `dpp::utf8substr` and `dpp::utf8len` claim to have a specific result (an empty string or zero, respectively) if the supplied string is not a valid UTF-8 sequence. In reality, though, not only do they not verify that the string is valid UTF-8, they actually invoke undefined behavior for some inputs (e.g. `dpp::utf8len("\xC0\xC0")`) by reading past the end of the string.
- …(`goto`, defining lots of variables in one line, etc.).
- …`std::string`s and not `std::string_view`s. In particular, `dpp::utf8substr` returns a new `std::string`, which causes allocations and copies that could be avoided in some cases.

What I did:
- Added `utf8subview`, which returns `std::string_view` instead of `std::string`, and made use of it in several places across the code base where applicable.
- Replaced `const std::string&` with `std::string_view` in parameter declarations of these functions.
- Replaced `std::string::size_type` with `size_t` in parameter declarations of these functions because, IMO, it only makes everything unnecessarily verbose, especially if we assume that `std::string::size_type` might be different from `std::string_view::size_type`. It's almost never used elsewhere anyway, so why bother in this one place in particular?
- …`std::string_view`s are not guaranteed to be null-terminated.
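To make the intended semantics concrete, here is a standalone sketch of a bounds-checked, validating length/substring pair. It is not the D++ implementation: `utf8_seq_len`, `utf8len_sketch` and `utf8subview_sketch` are hypothetical names, and it assumes `start` and `length` count code points (as with `dpp::utf8substr`), that invalid UTF-8 anywhere in the input yields an empty view or zero, and that "valid" means strict RFC 3629 UTF-8 (no overlong forms, surrogates or code points above U+10FFFF). Every multi-byte sequence is length-checked before its continuation bytes are read, so an input like `"\xC0\xC0"` is rejected rather than over-read.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper, not part of D++: byte length of the UTF-8 sequence
// starting at s[pos] (requires pos < s.size()), or 0 if that sequence is
// malformed. It never reads beyond s.size().
std::size_t utf8_seq_len(std::string_view s, std::size_t pos) {
    const auto lead = static_cast<unsigned char>(s[pos]);
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if (lead < 0x80) {
        return 1;                                   // ASCII
    } else if ((lead & 0xE0) == 0xC0) {
        len = 2; cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
        len = 3; cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
        len = 4; cp = lead & 0x07;
    } else {
        return 0;                                   // stray continuation byte or invalid lead byte
    }
    if (len > s.size() - pos) {
        return 0;                                   // truncated sequence: bail out instead of over-reading
    }
    for (std::size_t i = 1; i < len; ++i) {
        const auto cont = static_cast<unsigned char>(s[pos + i]);
        if ((cont & 0xC0) != 0x80) {
            return 0;                               // expected a continuation byte 10xxxxxx
        }
        cp = (cp << 6) | (cont & 0x3F);
    }
    // Reject overlong forms, UTF-16 surrogates and code points above U+10FFFF.
    static constexpr std::uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;
    }
    return len;
}

// Number of code points in s, or 0 if s is not valid UTF-8.
std::size_t utf8len_sketch(std::string_view s) {
    std::size_t count = 0;
    for (std::size_t pos = 0; pos < s.size(); ++count) {
        const std::size_t n = utf8_seq_len(s, pos);
        if (n == 0) {
            return 0;
        }
        pos += n;
    }
    return count;
}

// View of `length` code points starting at code point `start`. Returns an
// empty view if s is not valid UTF-8; out-of-range arguments are clamped.
// The result points into s, so no allocation or copy takes place.
std::string_view utf8subview_sketch(std::string_view s, std::size_t start, std::size_t length) {
    std::size_t begin = s.size();
    std::size_t end = s.size();
    std::size_t index = 0;                          // code point index of the sequence at pos
    // Walk the whole string so that invalid UTF-8 anywhere yields an empty view.
    for (std::size_t pos = 0; pos < s.size(); ++index) {
        if (index == start) {
            begin = pos;
        }
        if (index >= start && index - start == length) {
            end = pos;
        }
        const std::size_t n = utf8_seq_len(s, pos);
        if (n == 0) {
            return {};                              // invalid UTF-8
        }
        pos += n;
    }
    return s.substr(begin, end - begin);
}
```

With this sketch, `utf8subview_sketch("abcdefg", 2, 2)` yields `"cd"` and `utf8len_sketch("\xC0\xC0")` yields `0`. Because the result is a view into the caller's buffer, the caller must keep that buffer alive for as long as the view is used.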
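The null-termination caveat matters whenever a view is handed to a C-style API that expects `const char*`. A minimal, generic illustration in plain standard C++ (the helper names are hypothetical and this is not code from the PR):

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Pitfall, shown only for contrast: sv.data() is guaranteed to point at just
// sv.size() valid bytes. strlen keeps reading until the next '\0', which may
// lie beyond the view (or beyond the buffer, which is undefined behavior).
std::size_t naive_strlen_of_view(std::string_view sv) {
    return std::strlen(sv.data());
}

// Safe pattern: copy into an owning std::string, whose c_str() is guaranteed
// to be null-terminated right after the copied bytes. This costs an
// allocation, and a view containing an embedded '\0' would still appear
// shorter to strlen.
std::size_t c_strlen_of_view(std::string_view sv) {
    const std::string owned(sv);
    return std::strlen(owned.c_str());
}
```

For `sv = std::string_view("abcdef").substr(0, 3)`, `naive_strlen_of_view(sv)` returns 6 because `data()` still points into the full literal, whereas `c_strlen_of_view(sv)` returns 3.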