Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Value.Hash doesn't work with Unicode strings #795

Closed
matlin opened this issue Mar 19, 2024 · 3 comments
Closed

Value.Hash doesn't work with Unicode strings #795

matlin opened this issue Mar 19, 2024 · 3 comments

Comments

@matlin
Copy link

matlin commented Mar 19, 2024

It appears that Hash assumes that each character of a String is a single ascii byte while unicode characters (e.g. emiojis) can be larger.

Value.Hash('💣') // This will break
Screenshot 2024-03-19 at 9 31 24 AM
function String(value) {
        FNV1A64(ByteMarker.String);
        for (let i = 0; i < value.length; i++) {
            FNV1A64(value.charCodeAt(i)); // This can be larger than a byte
        }
    }
@sinclairzx81
Copy link
Owner

@matlin Hi, Thanks for reporting.

This issue has been resolved in more recent versions of TypeBox. The current implementation as of 0.32.15 (current) is specified as:

Hash-NumberToBytes: https://github.com/sinclairzx81/typebox/blob/master/src/value/hash/hash.ts#L68
Hash-StringType: https://github.com/sinclairzx81/typebox/blob/master/src/value/hash/hash.ts#L115

function* NumberToBytes(value: number): IterableIterator<number> {
  const byteCount = value === 0 ? 1 : Math.ceil(Math.floor(Math.log2(value) + 1) / 8)
  for (let i = 0; i < byteCount; i++) {
    yield (value >> (8 * (byteCount - 1 - i))) & 0xff
  }
}
// ...
function StringType(value: string) {
  FNV1A64(ByteMarker.String)
  for (let i = 0; i < value.length; i++) {
    for (const byte of NumberToBytes(value.charCodeAt(i))) {
      FNV1A64(byte)
    }
  }
}

Looking back, this issue appears to have been resolved on 0.31.15 (#594) as the nearest version. If you upgrade to at least 0.31.15, things should be fixed.

Hope this helps!
S

@sinclairzx81
Copy link
Owner

Will close off as this issue as Unicode support for Hash has been implemented.

Cheers!

@matlin
Copy link
Author

matlin commented Mar 19, 2024

Thanks! Apologies - thought we were on the latest version!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants