From db2e629cf0acddf1f3affacf03041c716b2eb824 Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Mon, 15 Apr 2024 11:59:35 +0200
Subject: [PATCH] Define isomorphic string

And define it alongside ASCII and scalar value strings.
---
 infra.bs | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/infra.bs b/infra.bs
index 14ab8a5..bb72bfd 100644
--- a/infra.bs
+++ b/infra.bs
@@ -929,13 +929,25 @@ leaving them effectively as-is.
 would consist of the code points U+1F4A9 and U+D800.

A string's -length +length is the number of code units it contains.

A string's -code point length is the number +code point length is the number of code points it contains. +


+ +

To signify strings with additional restrictions on the code points they can contain +this specification defines ASCII strings, isomorphic strings, and +scalar value strings. Using these improves clarity in specifications. + +

An ASCII string is a string whose code points are all +ASCII code points. + +

An isomorphic string is a string whose code points are all in the +range U+0000 NULL to U+00FF (ÿ), inclusive. +

A scalar value string is a string whose code points are all scalar values. @@ -943,6 +955,8 @@ of code points it contains. where UTF-8 encode comes into play. +


+

To convert a string into a scalar value string, replace any surrogates with U+FFFD (�). @@ -1172,7 +1186,7 @@ from start to the end of a string string is the

To isomorphic encode a string input, run these steps:

-  1. Assert: input contains no code points greater than U+00FF.
+  1. Assert: input contains no code points greater than U+00FF (ÿ).

   2. Return a byte sequence whose length is equal to input's code point length and whose bytes have the same
@@ -1182,9 +1196,6 @@ from start to the end of a string string is the


    -

    An ASCII string is a string whose code points are all -ASCII code points. -

    To ASCII lowercase a string, replace all ASCII upper alphas in the string with their corresponding code point in ASCII lower alpha.
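
Reviewer note: the three restricted string types this patch groups together, plus the isomorphic encode steps, can be sketched as below. This is a minimal illustration, not part of the patch. The function names are hypothetical, and it assumes a Python `str` stands in for an Infra string. Python iterates over code points rather than code units, so a `str` holding two separate surrogates that would form a pair under Infra's code point interpretation is treated here as two surrogates.

```python
def is_ascii_string(s: str) -> bool:
    """True if every code point is an ASCII code point (U+0000 to U+007F)."""
    return all(ord(c) <= 0x7F for c in s)


def is_isomorphic_string(s: str) -> bool:
    """True if every code point is in the range U+0000 to U+00FF, inclusive."""
    return all(ord(c) <= 0xFF for c in s)


def is_scalar_value_string(s: str) -> bool:
    """True if no code point is a surrogate (U+D800 to U+DFFF)."""
    return not any(0xD800 <= ord(c) <= 0xDFFF for c in s)


def isomorphic_encode(s: str) -> bytes:
    """Map each code point to the byte with the same numeric value."""
    # Mirrors the assertion in the spec steps: no code point above U+00FF.
    assert is_isomorphic_string(s), "input contains a code point greater than U+00FF"
    return bytes(ord(c) for c in s)
```

Every ASCII string is also an isomorphic string, and every isomorphic string is also a scalar value string, because the range up to U+00FF contains no surrogates. The assertion in `isomorphic_encode` is exactly the isomorphic string check, which is what makes the new term useful as a precondition.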