Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UTF8String::normalize() #4715

Merged
merged 6 commits into from
Sep 27, 2024
Merged

Add UTF8String::normalize() #4715

merged 6 commits into from
Sep 27, 2024

Conversation

obiltschnig
Copy link
Member

Implements #4710, supports internal utf8proc or external library if building with POCO_UNBUNDLED.

@obiltschnig obiltschnig added this to the Release 1.14.0 milestone Sep 27, 2024
@obiltschnig obiltschnig self-assigned this Sep 27, 2024
@obiltschnig obiltschnig changed the title 4710 utf8proc Add UTF8String::normalize(), implemented with utf8proc (#4710) Sep 27, 2024
# define UINT16_MAX 65535U
#endif

#include "utf8proc_data.c"

Check notice

Code scanning / CodeQL

Include header files only Note

The #include pre-processor directive should only be used to include header files.
Comment on lines +263 to +288
(lbc == UTF8PROC_BOUNDCLASS_START) ? true : // GB1
(lbc == UTF8PROC_BOUNDCLASS_CR && // GB3
tbc == UTF8PROC_BOUNDCLASS_LF) ? false : // ---
(lbc >= UTF8PROC_BOUNDCLASS_CR && lbc <= UTF8PROC_BOUNDCLASS_CONTROL) ? true : // GB4
(tbc >= UTF8PROC_BOUNDCLASS_CR && tbc <= UTF8PROC_BOUNDCLASS_CONTROL) ? true : // GB5
(lbc == UTF8PROC_BOUNDCLASS_L && // GB6
(tbc == UTF8PROC_BOUNDCLASS_L || // ---
tbc == UTF8PROC_BOUNDCLASS_V || // ---
tbc == UTF8PROC_BOUNDCLASS_LV || // ---
tbc == UTF8PROC_BOUNDCLASS_LVT)) ? false : // ---
((lbc == UTF8PROC_BOUNDCLASS_LV || // GB7
lbc == UTF8PROC_BOUNDCLASS_V) && // ---
(tbc == UTF8PROC_BOUNDCLASS_V || // ---
tbc == UTF8PROC_BOUNDCLASS_T)) ? false : // ---
((lbc == UTF8PROC_BOUNDCLASS_LVT || // GB8
lbc == UTF8PROC_BOUNDCLASS_T) && // ---
tbc == UTF8PROC_BOUNDCLASS_T) ? false : // ---
(tbc == UTF8PROC_BOUNDCLASS_EXTEND || // GB9
tbc == UTF8PROC_BOUNDCLASS_ZWJ || // ---
tbc == UTF8PROC_BOUNDCLASS_SPACINGMARK || // GB9a
lbc == UTF8PROC_BOUNDCLASS_PREPEND) ? false : // GB9b
(lbc == UTF8PROC_BOUNDCLASS_E_ZWG && // GB11 (requires additional handling below)
tbc == UTF8PROC_BOUNDCLASS_EXTENDED_PICTOGRAPHIC) ? false : // ----
(lbc == UTF8PROC_BOUNDCLASS_REGIONAL_INDICATOR && // GB12/13 (requires additional handling below)
tbc == UTF8PROC_BOUNDCLASS_REGIONAL_INDICATOR) ? false : // ----
true; // GB999

Check notice

Code scanning / CodeQL

Complex condition Note

Complex condition: too many logical operations in this expression.
utf8proc_int32_t uc;
for (rpos = 0; rpos < length; rpos++) {
uc = buffer[rpos];
if (uc == 0x000D && rpos < length-1 && buffer[rpos+1] == 0x000A) rpos++;

Check notice

Code scanning / CodeQL

For loop variable changed in body Note

Loop counters should not be modified in the body of the
loop
.
@obiltschnig obiltschnig changed the title Add UTF8String::normalize(), implemented with utf8proc (#4710) Add UTF8String::normalize() Sep 27, 2024
@obiltschnig obiltschnig merged commit d911539 into main Sep 27, 2024
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

1 participant