diff --git a/index.html b/index.html
index 5b06603..dd9e8aa 100644
--- a/index.html
+++ b/index.html
@@ -2969,7 +2969,7 @@

Text truncation in UTF-8

-

Specifications that limit the length of a string SHOULD require truncation on grapheme boundaries, as truncation in the midst of a combining or joining sequence can alter the meaning of the string.

+

Specifications that limit the length of a string SHOULD require truncation on grapheme boundaries, as truncation in the midst of a grapheme or combining character sequence can alter the meaning of the string.

@@ -2977,8 +2977,14 @@

Text truncation in UTF-8

-

When specifying a length limitation in code units (such as bytes), specifications SHOULD set the maximum length in a way that accommodates users whose language requires multibyte code unit sequences.

+

When specifying a length limitation in code units (such as bytes), specifications SHOULD set the limit in a way that accommodates users whose language requires multibyte code unit sequences.

+ +
+

If a specification specifies a length limit in code units (such as bytes), it MUST specify the character encoding used in measuring the limit; such a limit SHOULD NOT specify a legacy character encoding.

+
+ +

If a specification permits or requires truncation of a field, the character encoding is important in knowing what the limit means. If the limit is in bytes and legacy character encodings are permitted, note that conversion of Unicode data to a non-Unicode encoding can also result in data loss (since most legacy character encodings encode only a subset of Unicode).
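
For reviewers, the added requirements can be illustrated with a minimal sketch of truncation to a byte limit measured in a named encoding. It is not part of the patch. The function name `truncate_to_byte_limit` and its parameters are hypothetical, and the grouping of a base character with its trailing combining marks is only an approximation of grapheme boundaries, not full Unicode text segmentation (UAX #29).

```python
import unicodedata


def truncate_to_byte_limit(text: str, max_bytes: int, encoding: str = "utf-8") -> str:
    """Truncate text so its encoded form fits within max_bytes.

    Hypothetical helper for illustration. It never splits a code point and
    keeps combining marks attached to their base character. This is an
    approximation: sequences joined by ZWJ, Hangul jamo, and similar cases
    need a real grapheme segmenter.
    """
    # Group each base character with the combining marks that follow it.
    clusters = []
    current = ""
    for ch in text:
        if current and unicodedata.combining(ch) == 0:
            clusters.append(current)
            current = ch
        else:
            current += ch
    if current:
        clusters.append(current)

    # Keep whole clusters while the encoded size stays within the limit.
    # Use a BOM-less encoding name (e.g. "utf-16-le"), since each cluster
    # is encoded separately. Strict encoding raises UnicodeEncodeError if
    # a legacy encoding cannot represent a character.
    kept = []
    used = 0
    for cluster in clusters:
        size = len(cluster.encode(encoding))
        if used + size > max_bytes:
            break
        kept.append(cluster)
        used += size
    return "".join(kept)
```

With this sketch, `truncate_to_byte_limit("cafe\u0301", 5)` drops the final "e" together with its combining acute accent rather than leaving a bare "e", and the same limit of 4 keeps one character of "日本語" in UTF-8 but two in UTF-16, which is why the limit is meaningless unless the encoding is stated.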