Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uses of PreHashedMap are probably not sound #6373

Open
jimblandy opened this issue Oct 6, 2024 · 4 comments
Open

Uses of PreHashedMap are probably not sound #6373

jimblandy opened this issue Oct 6, 2024 · 4 comments

Comments

@jimblandy
Copy link
Member

The PreHashedMap type in wgpu_core is probably not used in a sound way, in particular, its use in Device::derive_pipeline_layout is almost certainly incorrect.

The idea is that a PreHashedMap key is just a copy of the true key's u64 hash value. But this doesn't provide any way to distinguish keys that have the same hash value: PartialEq on a PreHashedKey just compares the cached hash values. This will lead to incorrect lookups and insertions.

(I don't think PreHashedMap makes much sense: it's only sound if you can guarantee a distinct hash value for distinct keys, which means these "hash values" are almost certainly not produced by hashing: you're using some identifier or pointer that was assigned in a way that guarantees uniqueness. In that case, you should simply use that identifier as the key directly.)

@jimblandy
Copy link
Member Author

Aside: Managing pools of de-duplicated complex values is really tricky, even if you're not playing games with hash values: my friend recently found a year-old race condition in similar code he'd written for wasmtime that led to a CVE.

@jimblandy
Copy link
Member Author

From chat, @cwfitzgerald says:

We can probably mitigate this by storing the key and the hash, then dispatching eq to the real key, and hash to the precomputed hash

I infer from this that the motivation was to avoid the overhead of re-computing hash values. If we're doing a lot of queries with the same key, and we can keep the PreHashedKey around to reuse for all the queries, then this makes sense.

But it's my understanding that Rust hash tables themselves cache the hash values of the keys already stored in the table. A lookup computes the hash of the key you're querying with, but then it just checks that against the hashes stored in the table, and performs a real PartialEq comparison only when the hashes match. So storing PreHashedKey in the table doesn't make sense; the table already has the hash value cached.

@jimblandy
Copy link
Member Author

Hmm, I'm reluctant to dive into the hashbrown code, but from the description it seems like they only store one byte of the hash value for each key. But I believe it is the case that they are not re-hashing stored keys at lookup time, which means that storing PreHashedKey in the table isn't avoiding hashes.

@cwfitzgerald
Copy link
Member

Yeah I dove into it a bit too, and yeah I think we should just pull the whole abstraction - I was trying to avoid needing to do large structure comparisons inside a lock, but I don't think that can be reasonably avoided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

2 participants