clarify ratio
samthor committed Apr 28, 2020
1 parent 336416a commit 935b83e
Showing 1 changed file with 4 additions and 3 deletions.
7 changes: 4 additions & 3 deletions bench/README.md
@@ -10,11 +10,12 @@ Usage:
 If you don't provide a source file, or specify a length instead, this will generate actual random text in JavaScript.

 For a better test, use suggested UTF-8 encoded source text from [Project Gutenberg](https://www.gutenberg.org/files/23841/23841-0.txt).
-This has a ratio of "bytes-to-length" of 0.35.
+The linked file has a ratio of "bytes-to-length" of 0.35.

-This is an odd number, but we're comparing the on-disk UTF-8 bytes (which optimize for ASCII and other low Unicode values) to the length of JavaScript's UCS-2 / UTF-16 internal representation.
+This ratio is an odd number.
+It compares the on-disk UTF-8 bytes (which optimize for ASCII and other low Unicode values) to the length of JavaScript's UCS-2 / UTF-16 internal representation.
 All Unicode code points can be represented as either one or two "lengths" of a JavaScript string, but each code point can be between 1-4 bytes in UTF-8.
-The possible ratios therefore range from 0.25 (e.g., all emoji) through 1.0 (e.g., ASCII).
+The valid ratios therefore range from through 1.0 (e.g., ASCII).

 # Options

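As a rough check of the ratio discussed in the changed lines, the sketch below divides a string's JavaScript length (UTF-16 code units) by its UTF-8 byte count for a few sample strings. The quoted figure of 0.35 and the upper bound of 1.0 are consistent with reading the ratio this way (length over bytes), but that reading is an assumption. `lengthToBytesRatio` is a hypothetical helper and not part of the repository; it relies only on the standard `TextEncoder` global.

```javascript
// Hypothetical helper (not from the repository): ratio of JavaScript string
// length (UTF-16 code units) to the number of bytes in its UTF-8 encoding.
function lengthToBytesRatio(text) {
  const bytes = new TextEncoder().encode(text).length;
  return bytes === 0 ? NaN : text.length / bytes;
}

const samples = {
  ascii: 'hello',          // 1 byte and 1 code unit per character
  latin: '\u00e9\u00e9',   // 2 bytes and 1 code unit per character
  cjk: '\u4e2d\u6587',     // 3 bytes and 1 code unit per character
  emoji: '\u{1F600}',      // 4 bytes and 2 code units (surrogate pair)
};

for (const [name, text] of Object.entries(samples)) {
  console.log(name, lengthToBytesRatio(text).toFixed(2));
}
// ascii 1.00
// latin 0.50
// cjk 0.33
// emoji 0.50
```

Under this reading, text made up of three-byte code points gives the lowest value (about 0.33), while emoji outside the Basic Multilingual Plane come out at 0.5 because each one occupies two code units.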