Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issues with using WyHash with std::hash::Hash #9

Merged
merged 6 commits into from
Apr 16, 2024

Conversation

Lymia
Copy link
Contributor

@Lymia Lymia commented Apr 16, 2024

While WyHash works properly in the use case where a single call to write is followed by a call to finish, other use cases (such as those that would be caused by a HashMap<(u32, u32), u32, RandomWyHashState>) often cause some of the input data to be entirely ignored.

This modifies the Hasher implementation of WyHash such that it mixes in the hi and lo from previous calls to write and related functions using wymix before running it again. This keeps the property that WyHash generates the same output as the reference implementation when used with a single write call while making it both faster and more correct for use in a Rust HashMap.

Intended guarantees about the hash output:

/// # Stability
///
/// The result is only guaranteed to match the result `wyhash` would naturally produce if `write`
/// is called a single time, followed by a call to `finish`.
///
/// Any other sequence of events (including calls to `write_u32` or similar functions) are
/// guaranteed to have consistent results between platforms and versions of this crate, but may not
/// map well to the reference implementation.

@Bluefinger Bluefinger added bug Something isn't working enhancement New feature or request labels Apr 16, 2024
}

while index > 16 {
if length <= 0 {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super minor nit, but since len() returns a usize, there won't be anything smaller than a 0 here, so can just == 0. Not too fussed though about this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Tbh, mostly had that just because all the clauses were length <=.

@Bluefinger
Copy link
Owner

Many thanks for the PR, I'll look into this a bit more and probably merge this since correctness is always appreciated (with perf improvements as a plus). The change in output/stability guarantees will necessitate a new major version though, but that's fine by me.

@Lymia
Copy link
Contributor Author

Lymia commented Apr 16, 2024

From what I gather, it should only really be a few instructions faster for the use case of HashMap<u32, ...> after optimization. (I should probably write a benchmark to make sure of it.)

The other optimization opportunity I saw was storing the secret in RandomWyHashState and passing it directly to the WyHash rather than using WyHash::new as that avoids calling make_secret each time (when fully_randomised_wyhash is enabled). But that seems like it'd be better done as a separate PR.

@Bluefinger
Copy link
Owner

From what I gather, it should only really be a few instructions faster for the use case of HashMap<u32, ...> after optimization. (I should probably write a benchmark to make sure of it.)

The other optimization opportunity I saw was storing the secret in RandomWyHashState and passing it directly to the WyHash rather than using WyHash::new as that avoids calling make_secret each time (when fully_randomised_wyhash is enabled). But that seems like it'd be better done as a separate PR.

Yup, make_secret is quite an expensive operation, and thinking about it, while the seed can be randomised with each new WyHash, the secret probably should be generated once per runtime. Probably put the secret behind a OnceLock, though that might require a MSRV bump too.

Copy link
Owner

@Bluefinger Bluefinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me, documented and tested so no blockers on this one.

@Bluefinger Bluefinger merged commit e436ed7 into Bluefinger:main Apr 16, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants