refactor: fix and improve utf8 utilities #1045

Closed
wants to merge 3 commits into from

rept1d (Contributor) commented Dec 19, 2023

This PR solves the following problems:

  • dpp::utf8substr is broken: dpp::utf8substr("abcdefg", 2, 2) returns "cdef" instead of "cd". This probably went unnoticed because the function is only used in a very limited variety of scenarios.
  • dpp::utf8substr and dpp::utf8len claim to return a specific result (an empty string or zero, respectively) if the supplied string is not a valid UTF-8 sequence. In reality, they never verify that the string is valid UTF-8, and for some inputs (e.g. dpp::utf8len("\xC0\xC0")) they invoke UB by reading past the end of the string.
  • The implementations of these functions are convoluted (use of goto, many variables defined on one line, etc.).
  • They only work with std::string and not std::string_view. In particular, dpp::utf8substr returns a new std::string, which causes allocations and copies that could be avoided in some cases.
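To illustrate the second point: a decoder that trusts the length announced by a lead byte can step beyond the buffer when the sequence is truncated or malformed. The following is a minimal sketch of a bounds-checked code point counter, not the D++ implementation; the names sequence_length and count_code_points are hypothetical. For invalid input the result is unspecified, but the function never reads outside the view.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of the D++ API): number of bytes a UTF-8
// sequence claims to occupy, judged by its lead byte alone.
inline std::size_t sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII, including '\0'
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;                             // stray continuation or invalid byte
}

// Hypothetical name. Counts code points without ever indexing past s.size().
inline std::size_t count_code_points(std::string_view s) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t step = sequence_length(static_cast<unsigned char>(s[i]));
        const std::size_t remaining = s.size() - i;
        // Clamp the advance: a truncated sequence must not move us past the end.
        i += (step < remaining) ? step : remaining;
        ++count;
    }
    assert(i == s.size());  // the clamp guarantees we stop exactly at the end
    return count;
}
```

With this sketch, count_code_points("\xC0\xC0") returns some value without touching memory beyond the two bytes, and an embedded '\0' is counted like any other ASCII character because the loop is driven by s.size() rather than by a terminator.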

What I did:

  • Dropped the "guarantees" about what exactly these functions return for non-UTF-8 strings. We shouldn't pretend that they are provided. The result is now documented as "unspecified" (so no UB).
  • Introduced a separate function called utf8subview, which returns std::string_view instead of std::string, and used it in several places across the code base where applicable.
  • Replaced const std::string& with std::string_view in the parameter declarations of these functions.
  • Replaced std::string::size_type with size_t in the parameter declarations of these functions. IMO std::string::size_type only makes everything unnecessarily verbose, especially if we assume that it might differ from std::string_view::size_type. It's almost never used elsewhere in the code base anyway, so why bother here in particular?
  • Completely rewrote the implementations of both functions. One important difference in behavior is that they no longer treat null terminators in the middle of a string as end markers; a null byte is now no different from any other ASCII character. It's also important that we don't rely on a terminator being present at the end, because std::string_view is not guaranteed to be null-terminated.
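The view-returning approach described in the list above could look roughly like the sketch below. This is an illustration under assumptions, not the code from this PR: the names subview_sketch and substr_sketch, and the (string, start, length) parameter order measured in code points, are hypothetical. The loop is bounded by size() only, so embedded null bytes are ordinary characters and no terminator is required.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// True for UTF-8 continuation bytes (10xxxxxx).
inline bool is_continuation(char c) {
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Hypothetical sketch: returns a view of `length` code points starting at
// code point index `start`. No allocation; the view aliases `s`.
inline std::string_view subview_sketch(std::string_view s, std::size_t start, std::size_t length) {
    std::size_t i = 0;
    // Advance `i` over up to `n` code points, never past s.size().
    auto skip = [&s, &i](std::size_t n) {
        while (n > 0 && i < s.size()) {
            ++i;  // consume the lead byte
            while (i < s.size() && is_continuation(s[i])) {
                ++i;  // consume the continuation bytes that follow it
            }
            --n;
        }
    };
    skip(start);
    const std::size_t begin = i;
    skip(length);
    assert(begin <= i && i <= s.size());
    return s.substr(begin, i - begin);
}

// Hypothetical owning wrapper: copies only when the caller needs a std::string.
inline std::string substr_sketch(std::string_view s, std::size_t start, std::size_t length) {
    return std::string(subview_sketch(s, start, length));
}
```

Under these assumptions, subview_sketch("abcdefg", 2, 2) yields "cd", and callers that only inspect the result can avoid the allocation that the owning wrapper performs. The returned view is only valid while the underlying string is alive, which is the usual trade-off of std::string_view.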

Code change checklist

  • I have ensured that all methods and functions are fully documented using doxygen style comments.
  • My code follows the coding style guide.
  • I tested that my change works before raising the PR.
  • I have ensured that I did not break any existing API calls.
  • I have not built my pull request using AI, a static analysis tool or similar without any human oversight.

netlify bot commented Dec 19, 2023

Deploy Preview for dpp-dev ready!

Name Link
🔨 Latest commit 64c2b72
🔍 Latest deploy log https://app.netlify.com/sites/dpp-dev/deploys/6581dd432dc51700082b40f5
😎 Deploy Preview https://deploy-preview-1045--dpp-dev.netlify.app

github-actions bot added the documentation (Improvements or additions to documentation) and code (Improvements or additions to code) labels Dec 19, 2023
rept1d closed this Dec 19, 2023