Commit

ICU 74.1 patches
gagolews committed Nov 5, 2023
1 parent cb766b5 commit f81beab
Showing 18 changed files with 385 additions and 358 deletions.
11 changes: 5 additions & 6 deletions INSTALL
@@ -7,7 +7,7 @@ install.packages("stringi")
```

However, due to the overwhelming complexity of the ICU4C library,
-upon which *stringi* is based, and the colourful diversity of environments
+upon which *stringi* is based, and the diversity of environments
it operates on, you might still experience a few issues.
Hopefully, they can be resolved with the help of this short manual.

@@ -18,7 +18,9 @@ Below we also describe some available build process tweaks.

The stringi package depends on the ICU4C >= 61 library.

-If we install the package from sources and one of the following is true:
+ICU will be built together with stringi based on the customised
+ICU4C 74.1 source bundle that is shipped with the package
+if we install the package from sources and one of the following is true:

* this requirement is not met (check out <https://icu.unicode.org/download>,
the `libicu-devel` package on Fedora/CentOS/OpenSUSE,
@@ -31,10 +33,7 @@ If we install the package from sources and one of the following is true:
argument, or the `STRINGI_DISABLE_PKG_CONFIG` environment variable
is set to non-zero or
`install.packages("stringi", configure.args="--disable-pkg-config")`
-is executed,
-
-then ICU will be built together with stringi based on the customised
-ICU4C 74.1 source bundle that is shipped with the package.
+is executed.

> Actually, to get the most out of stringi, you are strongly encouraged to rely
> on our ICU4C bundle. This ensures maximum portability across all platforms
11 changes: 6 additions & 5 deletions R/ICU_settings.R
@@ -69,7 +69,11 @@ stri_info <- function(short = FALSE)
stopifnot(is.logical(short), length(short) == 1)

info <- .Call(C_stri_info)
-if (info$Charset.native$Name.friendly != "UTF-8") {
+loclist <- stri_locale_list()
+locale <- info$Locale$Name
+charset <- info$Charset.native$Name.friendly
+
+if (charset != "UTF-8") {
if (!identical(info$Charset.native$ASCII.subset, TRUE))
warning(stri_paste("Your native character encoding is not a superset of US-ASCII. ",
"Consider switching to UTF-8."))
@@ -78,16 +82,13 @@ stri_info <- function(short = FALSE)
"Consider switching to UTF-8."))
}

-loclist <- stri_locale_list()
-if (!(info$Locale$Name %in% loclist))
+if (!(locale %in% loclist))
warning(stri_paste("Your current locale is not on the list of ",
"available locales; see stri_locale_list(). ",
"Some functions may not work properly. "))

if (!short)
return(info) else {
-locale <- info$Locale$Name
-charset <- info$Charset.native$Name.friendly
return(sprintf("stringi_%s (%s.%s; ICU4C %s [%s%s]; Unicode %s)", as.character(packageVersion("stringi")),
locale, charset, info$ICU.version, if (info$ICU.system) "system" else "bundle",
if (info$ICU.UTF8) "#U_CHARSET_IS_UTF8" else "", info$Unicode.version))
53 changes: 29 additions & 24 deletions src/icu74/common/locmap.cpp
@@ -20,7 +20,7 @@
*
* Date Name Description
* 3/11/97 aliu Fixed off-by-one bug in assignment operator. Added
-* setId() method and safety check against
+* setId() method and safety check against
* MAX_ID_LENGTH.
* 04/23/99 stephen Added C wrapper for convertToPosix.
* 09/18/00 george Removed the memory leaks.
@@ -118,7 +118,7 @@ static const ILcidPosixElement locmap_ ## id [] =
// Keep static locale variables inside the function so that
// it can be created properly during static init.
//
-// Note: This table should be updated periodically. Check the [MS-LCID] Windows Language Code Identifier
+// Note: This table should be updated periodically. Check the [MS-LCID] Windows Language Code Identifier
// (LCID) Reference defined at https://msdn.microsoft.com/en-us/library/cc233965.aspx
//
// Microsoft is moving away from LCID in favor of locale name as of Vista. This table needs to be
@@ -132,7 +132,7 @@ static const ILcidPosixElement locmap_ ## id [] =
////////////////////////////////////////////
*/

-// TODO: For Windows ideally this table would be a list of exceptions rather than a complete list as
+// TODO: For Windows ideally this table would be a list of exceptions rather than a complete list as
// LocaleNameToLCID and LCIDToLocaleName provide 90% of these.

ILCID_POSIX_ELEMENT_ARRAY(0x0436, af, af_ZA)
@@ -524,7 +524,7 @@ ILCID_POSIX_SUBTABLE(nl) {
/* The "no" locale split into nb and nn. By default in ICU, "no" is nb.*/
// TODO: Not all of these are needed on Windows, but I don't know how ICU treats preferred ones here.
ILCID_POSIX_SUBTABLE(no) {
-{0x14, "no"}, /* really nb_NO - actually Windows differentiates between neutral (no region) and specific (with region) */
+{0x14, "no"}, /* really nb_NO - actually Windows differentiates between neutral (no region) and specific (with region) */
{0x7c14, "nb"}, /* really nb */
{0x0414, "nb_NO"}, /* really nb_NO. Keep first in the 414 list. */
{0x0414, "no_NO"}, /* really nb_NO */
@@ -1029,24 +1029,7 @@ getPosixID(const ILcidPosixMap *this_0, uint32_t hostID)
//
/////////////////////////////////////
*/
-#if U_PLATFORM_HAS_WIN32_API && UCONFIG_USE_WINDOWS_LCID_MAPPING_API
-/*
-* Various language tags needs to be changed:
-* quz -> qu
-* prs -> fa
-*/
-#define FIX_LANGUAGE_ID_TAG(buffer, len) \
-if (len >= 3) { \
-if (buffer[0] == 'q' && buffer[1] == 'u' && buffer[2] == 'z') {\
-buffer[2] = 0; \
-uprv_strcat(buffer, buffer+3); \
-} else if (buffer[0] == 'p' && buffer[1] == 'r' && buffer[2] == 's') {\
-buffer[0] = 'f'; buffer[1] = 'a'; buffer[2] = 0; \
-uprv_strcat(buffer, buffer+3); \
-} \
-}

-#endif

U_CAPI int32_t
uprv_convertToPosix(uint32_t hostid, char *posixID, int32_t posixIDCapacity, UErrorCode* status)
@@ -1102,8 +1085,30 @@ uprv_convertToPosix(uint32_t hostid, char *posixID, int32_t posixIDCapacity, UEr
break;
}
}
-// TODO: Need to understand this better, why isn't it an alias?
-FIX_LANGUAGE_ID_TAG(locName, tmpLen);
+
+/*
+* Various language tags needs to be changed:
+* quz -> qu
+* prs -> fa
+*/
+if (tmpLen >= 3) {
+if (locName[0] == 'q' && locName[1] == 'u' && locName[2] == 'z') {
+// locName[2] = 0;
+// uprv_strcat(locName, locName+3);
+for (i = 2; i < LOCALE_NAME_MAX_LENGTH-1; i++)
+locName[i] = locName[i+1];
+locName[LOCALE_NAME_MAX_LENGTH-1] = 0;
+} else if (locName[0] == 'p' && locName[1] == 'r' && locName[2] == 's') {
+locName[0] = 'f'; locName[1] = 'a';
+// locName[2] = 0;
+// uprv_strcat(locName, locName+3);
+for (i = 2; i < LOCALE_NAME_MAX_LENGTH-1; i++)
+locName[i] = locName[i+1];
+locName[LOCALE_NAME_MAX_LENGTH-1] = 0;
+}
+}
+
+
pPosixID = locName;
}
}
@@ -1277,7 +1282,7 @@ uprv_convertToLCID(const char *langID, const char* posixID, UErrorCode* status)

mid = (high+low) >> 1; /*Finds median*/

-if (mid == oldmid)
+if (mid == oldmid)
break;

compVal = uprv_strcmp(langID, gPosixIDmap[mid].regionMaps->posixID);
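In the locmap.cpp hunks above, the Windows-only `FIX_LANGUAGE_ID_TAG` macro, which cut the third character out of `quz…`/`prs…` tags with `uprv_strcat(buffer, buffer+3)`, is replaced by an explicit element-by-element shift. Assuming `uprv_strcat` forwards to the standard `strcat`, the old call passed overlapping source and destination ranges, which is undefined behaviour for `strcat`; a forward copying loop or `memmove` is well defined. The sketch below performs the same quz -> qu and prs -> fa rewrite on a NUL-terminated buffer. `fix_language_id_tag` is a hypothetical stand-alone function, not ICU API, and it uses `std::memmove` where the patch uses a hand-written loop over `LOCALE_NAME_MAX_LENGTH`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-alone version (not ICU API) of the quz -> qu and
// prs -> fa rewrite from the locmap.cpp hunk. `name` must be a writable,
// NUL-terminated string. The old macro did
//     buffer[2] = 0; uprv_strcat(buffer, buffer + 3);
// i.e. a strcat whose source and destination overlap. std::memmove is
// specified to handle overlapping ranges, so it is used here instead.
void fix_language_id_tag(char *name) {
    assert(name != nullptr);
    const std::size_t len = std::strlen(name);
    if (len < 3) {
        return;
    }
    if (std::strncmp(name, "quz", 3) == 0) {
        // "quz_PE" -> "qu_PE"; the +1 also moves the terminating NUL
        std::memmove(name + 2, name + 3, len - 3 + 1);
    } else if (std::strncmp(name, "prs", 3) == 0) {
        // "prs_AF" -> "fa_AF": rewrite the first two letters, drop the third
        name[0] = 'f';
        name[1] = 'a';
        std::memmove(name + 2, name + 3, len - 3 + 1);
    }
}
```

The `+ 1` in the `memmove` length carries the terminating NUL along with the tail, so the result stays a valid C string. The patch takes a different route to the same end: it shifts every byte up to the end of the fixed-size `locName` buffer and then writes a terminator into its last slot.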
4 changes: 2 additions & 2 deletions src/icu74/common/putil.cpp
@@ -2105,9 +2105,9 @@ getCodepageFromPOSIXID(const char *localeName, char * buffer, int32_t buffCapaci

if (localeName != nullptr && (name = (uprv_strchr(localeName, '.'))) != nullptr) {
size_t localeCapacity = uprv_min(sizeof(localeBuf), (name-localeName)+1);
-uprv_strncpy(localeBuf, localeName, localeCapacity);
+uprv_strncpy(localeBuf, localeName, localeCapacity-1);
+name = uprv_strncpy(buffer, name+1, buffCapacity-1);
localeBuf[localeCapacity-1] = 0; /* ensure NUL termination */
-name = uprv_strncpy(buffer, name+1, buffCapacity);
buffer[buffCapacity-1] = 0; /* ensure NUL termination */
if ((variant = const_cast<char *>(uprv_strchr(name, '@'))) != nullptr) {
*variant = 0;
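The putil.cpp hunk changes both `uprv_strncpy` calls so that the bound is one less than the destination capacity, with the terminator written separately afterwards. `strncpy` does not NUL-terminate when the source is at least as long as the bound, so with a bound of capacity-1 the last byte is never touched by the copy and the following `= 0` assignment is the only write to it. Below is a minimal sketch of the same pattern with `std::strncpy`, assuming `uprv_strncpy` behaves like it; `bounded_copy` and `split_posix_id` are hypothetical helpers loosely modelled on the hunk, not ICU functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not ICU API): copy at most cap-1 bytes of src into
// dst, whose size is cap, and always NUL-terminate. With cap-1 as the
// bound, strncpy never writes dst[cap-1]; the explicit store fills it.
char *bounded_copy(char *dst, const char *src, std::size_t cap) {
    assert(cap > 0);
    std::strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';
    return dst;
}

// Hypothetical helper loosely modelled on the putil.cpp hunk: split
// "ll_CC.codepage@variant" at the first '.', store the part before it in
// localeBuf and the part after it (without "@variant") in codepage.
// Returns false when the ID contains no '.'.
bool split_posix_id(const char *posixID,
                    char *localeBuf, std::size_t localeBufSize,
                    char *codepage, std::size_t codepageSize) {
    const char *dot = std::strchr(posixID, '.');
    if (dot == nullptr) {
        return false;
    }
    // Characters before '.', plus one byte for the terminator,
    // capped at the buffer size (the hunk uses uprv_min for this).
    std::size_t localeCapacity = static_cast<std::size_t>(dot - posixID) + 1;
    if (localeCapacity > localeBufSize) {
        localeCapacity = localeBufSize;
    }
    bounded_copy(localeBuf, posixID, localeCapacity);
    bounded_copy(codepage, dot + 1, codepageSize);
    // Drop an "@variant" suffix, if present.
    if (char *variant = std::strchr(codepage, '@')) {
        *variant = '\0';
    }
    return true;
}
```

For example, `split_posix_id("en_US.UTF-8@euro", ...)` leaves `en_US` in the locale buffer and `UTF-8` in the codepage buffer, while a locale buffer of only four bytes receives the truncated but still terminated prefix `en_`.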
28 changes: 14 additions & 14 deletions src/icu74/common/ubiditransform.cpp
@@ -118,7 +118,7 @@ ubiditransform_close(UBiDiTransform *pBiDiTransform)

/**
* Performs Bidi resolution of text.
-*
+*
* @param pTransform Pointer to the <code>UBiDiTransform</code> structure.
* @param pErrorCode Pointer to the error code value.
*
@@ -135,7 +135,7 @@ action_resolve(UBiDiTransform *pTransform, UErrorCode *pErrorCode)

/**
* Performs basic reordering of text (Logical -> Visual LTR).
-*
+*
* @param pTransform Pointer to the <code>UBiDiTransform</code> structure.
* @param pErrorCode Pointer to the error code value.
*
@@ -155,7 +155,7 @@ action_reorder(UBiDiTransform *pTransform, UErrorCode *pErrorCode)

/**
* Sets "inverse" mode on the <code>UBiDi</code> object.
-*
+*
* @param pTransform Pointer to the <code>UBiDiTransform</code> structure.
* @param pErrorCode Pointer to the error code value.
*
@@ -174,7 +174,7 @@ action_setInverse(UBiDiTransform *pTransform, UErrorCode *pErrorCode)
/**
* Sets "runs only" reordering mode indicating a Logical LTR <-> Logical RTL
* transformation.
-*
+*
* @param pTransform Pointer to the <code>UBiDiTransform</code> structure.
* @param pErrorCode Pointer to the error code value.
*
@@ -191,7 +191,7 @@ action_setRunsOnly(UBiDiTransform *pTransform, UErrorCode *pErrorCode)

/**
* Performs string reverse.
-*
+*
* @param pTransform Pointer to the <code>UBiDiTransform</code> structure.
* @param pErrorCode Pointer to the error code value.
*
@@ -212,7 +212,7 @@ action_reverse(UBiDiTransform *pTransform, UErrorCode *pErrorCode)
* Applies a new value to the text that serves as input at the current
* processing step. This value is identical to the original one when we begin
* the processing, but usually changes as the transformation progresses.
-*
+*
* @param pTransform A pointer to the <code>UBiDiTransform</code> structure.
* @param newSrc A pointer whose value is to be used as input text.
* @param newLength A length of the new text in <code>char16_t</code>s.
@@ -248,7 +248,7 @@ updateSrc(UBiDiTransform *pTransform, const char16_t *newSrc, uint32_t newLength

/**
* Calls a lower level shaping function.
-*
+*
* @param pTransform Pointer to the <code>UBiDiTransform</code> structure.
* @param options Shaping options.
* @param pErrorCode Pointer to the error code value.
@@ -263,7 +263,7 @@ doShape(UBiDiTransform *pTransform, uint32_t options, UErrorCode *pErrorCode)

/**
* Performs digit and letter shaping.
-*
+*
* @param pTransform Pointer to the <code>UBiDiTransform</code> structure.
* @param pErrorCode Pointer to the error code value.
*
@@ -293,7 +293,7 @@ action_shapeArabic(UBiDiTransform *pTransform, UErrorCode *pErrorCode)

/**
* Performs character mirroring.
-*
+*
* @param pTransform Pointer to the <code>UBiDiTransform</code> structure.
* @param pErrorCode Pointer to the error code value.
*
@@ -314,10 +314,10 @@ action_mirror(UBiDiTransform *pTransform, UErrorCode *pErrorCode)
}
do {
UBool isOdd = ubidi_getLevelAt(pTransform->pBidi, i) & 1;
-U16_NEXT(pTransform->src, i, pTransform->srcLength, c);
+U16_NEXT(pTransform->src, i, pTransform->srcLength, c);
U16_APPEND_UNSAFE(pTransform->dest, j, isOdd ? u_charMirror(c) : c);
} while (i < pTransform->srcLength);

*pTransform->pDestLength = pTransform->srcLength;
pTransform->reorderingOptions = UBIDI_REORDER_DEFAULT;
return true;
@@ -416,7 +416,7 @@ resolveBaseDirection(const char16_t *text, uint32_t length,
/**
* Finds a valid <code>ReorderingScheme</code> matching the
* caller-defined scheme.
-*
+*
* @return A valid <code>ReorderingScheme</code> object or nullptr
*/
static const ReorderingScheme*
@@ -498,8 +498,8 @@ ubiditransform_transform(UBiDiTransform *pBiDiTransform,

/* Checking for U_SUCCESS() within the loop to bail out on first failure. */
for (action = pBiDiTransform->pActiveScheme->actions; *action && U_SUCCESS(*pErrorCode); action++) {
-if ((*action)(pBiDiTransform, pErrorCode)) {
-if (action + 1) {
+if (action && (*action)(pBiDiTransform, pErrorCode)) {
+if (*(action + 1)) {
updateSrc(pBiDiTransform, pBiDiTransform->dest, *pBiDiTransform->pDestLength,
*pBiDiTransform->pDestLength, pErrorCode);
}
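The last ubiditransform.cpp hunk fixes a condition that could never be false: `action` points into the scheme's action array, so `action + 1` is itself a non-null pointer and `if (action + 1)` always held. The loop stops at a null entry (`*action`), so the array is null-terminated, and the patched `*(action + 1)` instead asks whether another action follows before `updateSrc` feeds the current output back in as input. The sketch below models that control flow with plain `std::string` steps; `Action`, `reverse_step`, `upper_step` and `run_actions` are hypothetical stand-ins rather than the ICU types, and the `updates` counter exists only to make the refresh observable.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model (not the ICU types) of the loop in
// ubiditransform_transform: a null-terminated array of steps, each of
// which reads `src` and writes `dest`.
using Action = bool (*)(const std::string &src, std::string &dest);

bool reverse_step(const std::string &src, std::string &dest) {
    dest.assign(src.rbegin(), src.rend());
    return true;
}

bool upper_step(const std::string &src, std::string &dest) {
    dest = src;
    for (char &ch : dest) {
        if (ch >= 'a' && ch <= 'z') {
            ch = static_cast<char>(ch - 'a' + 'A');
        }
    }
    return true;
}

// Runs the steps in order. After a successful step, its output becomes
// the next step's input, but only if a next step exists. `updates`
// counts those refreshes (the analogue of the updateSrc() calls).
std::string run_actions(const Action *actions, std::string src, int &updates) {
    assert(actions != nullptr);
    std::string dest = src;
    updates = 0;
    for (const Action *action = actions; *action; ++action) {
        if ((*action)(src, dest)) {
            // Old form: `if (action + 1)` tests a non-null pointer, always true.
            // Fixed form: test the next array element instead.
            if (*(action + 1)) {
                src = dest;
                ++updates;
            }
        }
    }
    return dest;
}
```

With the chain `{reverse_step, upper_step, nullptr}` and input `"abc"`, the result is `"CBA"` and the input is refreshed exactly once, between the two steps; a single-step chain never refreshes it, which is the behaviour the old always-true test could not express.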