refactor: fix and improve utf8 utilities #1046

Merged: 4 commits, Dec 20, 2023

rept1d
Contributor

@rept1d rept1d commented Dec 19, 2023

This PR solves the following problems:

  • dpp::utf8substr is just broken: dpp::utf8substr("abcdefg", 2, 2) returns "cdef" instead of "cd". This probably went unnoticed because the function is only used in a very limited variety of scenarios.
  • dpp::utf8substr and dpp::utf8len claim to have a specific result (empty string or zero, respectively) if the supplied string is not a valid UTF-8 sequence. In reality though, not only do they not verify if the string is valid UTF-8, they actually invoke UB for some inputs (e.g. dpp::utf8len("\xC0\xC0")) due to reading past the end of the string.
  • The implementations of these functions are convoluted (use of goto, defining lots of variables in one line etc.).
  • They only work with std::strings and not std::string_views. In particular, dpp::utf8substr returns a new std::string, which causes allocations and copies that could be avoided in some cases.
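
To illustrate the out-of-bounds problem from the second point, here is a minimal sketch of a length function that cannot read past the end of its input. It is not the code from this PR: the name sketch_utf8len is hypothetical, and counting non-continuation bytes is just one simple technique that assumes nothing about null terminators.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper for illustration only, not the D++ implementation.
// Counts UTF-8 code points by counting every byte that is NOT a
// continuation byte (bit pattern 10xxxxxx). The loop only visits bytes
// inside the view, so malformed input such as "\xC0\xC0" produces some
// (unspecified but harmless) number instead of an out-of-bounds read.
inline std::size_t sketch_utf8len(std::string_view s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

Because the loop is bounded by the view's size rather than by a lookup of how many bytes a lead byte promises, a truncated or invalid sequence never causes a read beyond the last byte.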

What I did:

  • Dropped the "guarantees" on what exactly these functions return in case of non-UTF-8 strings. We shouldn't pretend that they are provided. Now we claim that the result is "unspecified" (so no UB).
  • Introduced a separate function called utf8subview which returns std::string_view instead of std::string, and made use of it in several places across the code base, where applicable.
  • Replaced const std::string& with std::string_view in parameter declarations of these functions.
  • Replaced std::string::size_type with size_t in parameter declarations of these functions because IMO the former only makes everything unnecessarily verbose, especially if we assume that std::string::size_type might be different from std::string_view::size_type. It's almost never used in other places anyway, so why bother in this one in particular?
  • Completely rewrote the implementations of both functions. One important difference in their behavior is that now they don't treat null terminators in the middle of strings as their end markers. These are now no different from any other ASCII character. It's also important that we don't rely on their presence at the end because std::string_views are not guaranteed to be null-terminated.
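
The behaviour described in the list above (std::string_view in and out, size_t indices, no reliance on null terminators) can be sketched roughly as follows. This is an illustration under stated assumptions rather than the merged implementation: sketch_utf8subview is a hypothetical name, and the (start, length) parameters are assumed to be measured in code points.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical sketch, not the code merged in this PR.
// Returns a view of up to `length` code points of `s`, starting at code
// point index `start`. No allocation happens: the result points into `s`.
inline std::string_view sketch_utf8subview(std::string_view s,
                                           std::size_t start,
                                           std::size_t length) {
    // A continuation byte has the bit pattern 10xxxxxx.
    auto is_continuation = [](char c) {
        return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
    };
    // Advance `pos` past `n` code points, never moving beyond s.size().
    auto skip = [&](std::size_t pos, std::size_t n) {
        while (n > 0 && pos < s.size()) {
            ++pos;  // consume the lead byte (or a stray byte)
            while (pos < s.size() && is_continuation(s[pos])) {
                ++pos;  // consume the continuation bytes that follow
            }
            --n;
        }
        return pos;
    };
    const std::size_t first = skip(0, start);
    const std::size_t last = skip(first, length);
    return s.substr(first, last - first);
}
```

With this sketch, sketch_utf8subview("abcdefg", 2, 2) yields "cd", and a '\0' in the middle of the view is treated like any other ASCII character. The returned view is only valid while the underlying string is alive, which is the usual trade-off for avoiding the copy.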

Code change checklist

  • I have ensured that all methods and functions are fully documented using doxygen style comments.
  • My code follows the coding style guide.
  • I tested that my change works before raising the PR.
  • I have ensured that I did not break any existing API calls.
  • I have not built my pull request using AI, a static analysis tool or similar without any human oversight.

@github-actions bot added the "documentation" (Improvements or additions to documentation) and "code" (Improvements or additions to code) labels on Dec 19, 2023

netlify bot commented Dec 19, 2023

Deploy Preview for dpp-dev ready!

Name Link
🔨 Latest commit d99070a
🔍 Latest deploy log https://app.netlify.com/sites/dpp-dev/deploys/6581e5d0bfd12000070d34cd
😎 Deploy Preview https://deploy-preview-1046--dpp-dev.netlify.app

Member

@Mishura4 Mishura4 left a comment

Looks good. We should add unit tests for all of this as well, though this can be a separate commit.

@Jaskowicz1
Contributor

Super detailed PR description, I love it!

@braindigitalis braindigitalis merged commit 8144480 into brainboxdotcc:dev Dec 20, 2023
43 checks passed
Labels
code Improvements or additions to code. documentation Improvements or additions to documentation

4 participants