Slightly improve dnode hash #16131

Merged: 1 commit merged on May 1, 2024

@amotin (Member) commented Apr 24, 2024

As I understand it, the dnode hash includes 8 bits of the objset pointer, starting at bit 6, just to make the hash less predictable. But objset_t is more than 1 KB in size, so its allocations are likely aligned to 2 KB. That means the 11 lowest bits provide no entropy, and only 3 of the 8 bits taken so far (bits 6 through 13) actually vary. Just take the 8 bits starting from bit 11 instead.
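The entropy argument can be made concrete with a small C sketch. This is not ZFS code. It feeds a run of synthetic 2 KB-aligned addresses through both bit-extraction rules and counts how many distinct 8-bit values each rule yields. The helper names `os_bits_old`, `os_bits_new` and `distinct_values` and the base address are hypothetical. The only assumption carried over from the description above is the 2 KB alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: old rule, 8 pointer bits starting at bit 6. */
static inline uint8_t
os_bits_old(uintptr_t osv)
{
	return ((uint8_t)((osv >> 6) & 0xFF));
}

/* Hypothetical: new rule, 8 pointer bits starting at bit 11. */
static inline uint8_t
os_bits_new(uintptr_t osv)
{
	return ((uint8_t)((osv >> 11) & 0xFF));
}

/*
 * Count the distinct 8-bit values that 'extract' produces over n
 * addresses spaced 'align' bytes apart, starting at 'base'.
 */
static int
distinct_values(uint8_t (*extract)(uintptr_t), uintptr_t base,
    size_t align, size_t n)
{
	int seen[256] = { 0 };
	int count = 0;

	assert(align != 0 && base % align == 0);	/* base must be aligned */
	for (size_t i = 0; i < n; i++) {
		uint8_t v = extract(base + i * align);
		if (!seen[v]) {
			seen[v] = 1;
			count++;
		}
	}
	return (count);
}
```

Take 1024 consecutive 2 KB-aligned addresses starting at `0x100000`. Under the old rule, bits 6 to 10 are always zero, so `distinct_values(os_bits_old, 0x100000, 2048, 1024)` returns only 8. The same call with `os_bits_new` returns all 256.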

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
@amotin (Member, Author) commented Apr 24, 2024

I wonder if cityhash4() could be better (faster or cleaner) here.

@amotin amotin added the Status: Code Review Needed Ready for review and testing label Apr 24, 2024
@Harry-Chen (Contributor) commented Apr 25, 2024

> I wonder if cityhash4() could be better (faster or cleaner) here.

I did a quick test: https://godbolt.org/z/o6j4dPEvK. On most platforms, at -O2, cityhash64 generates fewer instructions than CRC64 and eliminates all memory accesses.

P.S. cityhash64 is excerpted from https://github.com/google/cityhash/blob/master/src/city.cc
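A CRC-based hash of this kind folds one byte at a time through a 256-entry lookup table, so every step is a memory load. The sketch below is a hypothetical reconstruction of that pattern, not the code from this PR. The names `crc64_table_init`, `crc64_step` and `crc_style_hash` are illustrative. The choice of bytes hashed is also an assumption: 8 pointer bits starting at bit 11, then the low 4 bytes of the object number. The polynomial is the bit-reflected ECMA-182 CRC-64 polynomial.

```c
#include <stdint.h>

/* Bit-reflected ECMA-182 CRC-64 polynomial. */
#define	CRC64_POLY	0xC96C5795D7870F42ULL

static uint64_t crc64_table[256];

/* Build the byte-wise lookup table for the reflected CRC. */
static void
crc64_table_init(void)
{
	for (int i = 0; i < 256; i++) {
		uint64_t crc = (uint64_t)i;
		for (int j = 0; j < 8; j++)
			crc = (crc >> 1) ^ (-(crc & 1) & CRC64_POLY);
		crc64_table[i] = crc;
	}
}

/* Fold the low byte of 'byte' into the CRC: one table load per call. */
static inline uint64_t
crc64_step(uint64_t crc, uint64_t byte)
{
	return ((crc >> 8) ^ crc64_table[(crc ^ byte) & 0xFF]);
}

/* Hypothetical hash of (objset pointer, object number). */
static uint64_t
crc_style_hash(uintptr_t osv, uint64_t obj)
{
	uint64_t crc = -1ULL;

	crc = crc64_step(crc, osv >> 11);	/* pointer bits 11..18 */
	crc = crc64_step(crc, obj >> 0);
	crc = crc64_step(crc, obj >> 8);
	crc = crc64_step(crc, obj >> 16);
	crc = crc64_step(crc, obj >> 24);
	return (crc);
}
```

Every `crc64_step()` call indexes `crc64_table`. Those table loads are the memory accesses that a multiply-and-rotate hash such as cityhash avoids.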

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels May 1, 2024
@behlendorf behlendorf merged commit 8fd3a5d into openzfs:master May 1, 2024
23 of 26 checks passed
@amotin amotin deleted the dnode_hash branch May 1, 2024 17:59
amotin added a commit to amotin/zfs that referenced this pull request May 23, 2024
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#16131
ixhamza pushed a commit to truenas/zfs that referenced this pull request May 23, 2024
ixhamza pushed a commit to truenas/zfs that referenced this pull request May 23, 2024
behlendorf pushed a commit that referenced this pull request May 29, 2024
Harry-Chen added a commit to Harry-Chen/zfs that referenced this pull request Aug 27, 2024
As mentioned in PR openzfs#16131, replacing the CRC-based hash with cityhash4
could slightly improve performance by eliminating memory accesses.
Replacing the algorithm is safe since the hash result is not persisted.

See: openzfs#16131

Signed-off-by: Shengqi Chen <[email protected]>
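The replacement can be sketched as a CityHash-style mix of four 64-bit words. The block below is modeled on `HashLen16()` and the 17-to-32-byte path of `CityHash64()` in Google's `city.cc` (the file linked in the discussion), specialized to exactly four words. The names `city_mix4`, `city_hash16` and `rotr64` are hypothetical. The sketch is not claimed to be bit-identical to the `cityhash4()` in OpenZFS. Its point is structural: it uses only registers, multiplies, shifts and rotates, with no table lookups.

```c
#include <stdint.h>

/* Constants k1 and k2 from Google's city.cc. */
#define	CITY_K1	0xb492b66be2ac4f53ULL
#define	CITY_K2	0x9ae16a3b2f90404fULL

/* Rotate right; shift 0 is special-cased to avoid a 64-bit shift. */
static inline uint64_t
rotr64(uint64_t val, int shift)
{
	return (shift == 0 ? val : (val >> shift) | (val << (64 - shift)));
}

/* HashLen16()-style 128-to-64-bit multiply/xor-shift mix. */
static inline uint64_t
city_hash16(uint64_t u, uint64_t v, uint64_t mul)
{
	uint64_t a = (u ^ v) * mul;
	a ^= (a >> 47);
	uint64_t b = (v ^ a) * mul;
	b ^= (b >> 47);
	b *= mul;
	return (b);
}

/*
 * Hypothetical city_mix4(): the 17-to-32-byte CityHash64 path with
 * len == 32, taking four 64-bit words directly instead of a buffer.
 */
static inline uint64_t
city_mix4(uint64_t w1, uint64_t w2, uint64_t w3, uint64_t w4)
{
	const uint64_t mul = CITY_K2 + 64;	/* k2 + len * 2, len = 32 */
	uint64_t a = w1 * CITY_K1;
	uint64_t b = w2;
	uint64_t c = w4 * mul;
	uint64_t d = w3 * CITY_K2;

	return (city_hash16(rotr64(a + b, 43) + rotr64(c, 30) + d,
	    a + rotr64(b + CITY_K2, 18) + c, mul));
}
```

A caller could hash an objset pointer and an object number as `city_mix4((uintptr_t)os, obj, 0, 0)`, passing unused words as 0. Here `os` and `obj` are illustrative.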
Harry-Chen added a commit to Harry-Chen/zfs that referenced this pull request Aug 27, 2024
Harry-Chen added a commit to Harry-Chen/zfs that referenced this pull request Aug 27, 2024
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
Harry-Chen added a commit to Harry-Chen/zfs that referenced this pull request Sep 7, 2024
Harry-Chen added a commit to Harry-Chen/zfs that referenced this pull request Sep 10, 2024
Harry-Chen added a commit to Harry-Chen/zfs that referenced this pull request Sep 19, 2024
behlendorf pushed a commit that referenced this pull request Sep 19, 2024
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Signed-off-by: Shengqi Chen <[email protected]>
Closes #16131
Closes #16483
behlendorf pushed a commit that referenced this pull request Sep 19, 2024
Specializing cityhash4 on 32-bit architectures can reduce the size
of stack frames as well as instruction count. This is a tiny but
useful optimization, since some callers invoke it frequently.

When specialized into 1-, 2-, 3- and 4-argument versions, the stack
usage (in bytes) on some 32-bit architectures is as follows:

- x86: 32, 32, 32, 40
- arm-v7a: 20, 20, 28, 36
- riscv: 0, 0, 0, 16
- power: 16, 16, 16, 32
- mipsel: 8, 8, 8, 24

Each actual argument (even a literal 0) also adds roughly the same
number of multiplication instructions:

- x86: 9, 12, 15, 18
- arm-v7a: 6, 8, 10, 12
- riscv / power: 12, 18, 20, 24
- mipsel: 9, 12, 15, 19

On 64-bit architectures the trend is similar, but both stack sizes and
instruction counts are significantly smaller and thus negligible.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Signed-off-by: Shengqi Chen <[email protected]>
Closes #16131
Closes #16483
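The specialization described in this commit can be approached by keeping a single `static inline` core and exposing fixed-arity wrappers that pass literal zeros. Callers then pass only the words they use, which means smaller call frames. Inside each wrapper, the compiler can constant-fold products such as `0 * mul` away after inlining. On 32-bit targets each 64-bit multiply expands into several 32-bit multiplies, so each folded product saves real instructions. This is a sketch under that assumption, not the upstream implementation. `mix_core` and `mix1` through `mix4` are hypothetical names, and the core is the same CityHash-style mix used for illustration elsewhere on this page.

```c
#include <stdint.h>

#define	MIX_K1	0xb492b66be2ac4f53ULL
#define	MIX_K2	0x9ae16a3b2f90404fULL

static inline uint64_t
mix_rotr(uint64_t v, int s)
{
	return (s == 0 ? v : (v >> s) | (v << (64 - s)));
}

/* Shared core: CityHash-style mix of four 64-bit words. */
static inline uint64_t
mix_core(uint64_t w1, uint64_t w2, uint64_t w3, uint64_t w4)
{
	const uint64_t mul = MIX_K2 + 64;
	uint64_t a = w1 * MIX_K1, b = w2;
	uint64_t c = w4 * mul, d = w3 * MIX_K2;	/* fold to 0 if w3/w4 == 0 */
	uint64_t u = mix_rotr(a + b, 43) + mix_rotr(c, 30) + d;
	uint64_t v = a + mix_rotr(b + MIX_K2, 18) + c;
	uint64_t x = (u ^ v) * mul;

	x ^= (x >> 47);
	uint64_t y = (v ^ x) * mul;
	y ^= (y >> 47);
	return (y * mul);
}

/* Fixed-arity entry points: each inlines the core with literal zeros. */
uint64_t
mix1(uint64_t w1)
{
	return (mix_core(w1, 0, 0, 0));
}

uint64_t
mix2(uint64_t w1, uint64_t w2)
{
	return (mix_core(w1, w2, 0, 0));
}

uint64_t
mix3(uint64_t w1, uint64_t w2, uint64_t w3)
{
	return (mix_core(w1, w2, w3, 0));
}

uint64_t
mix4(uint64_t w1, uint64_t w2, uint64_t w3, uint64_t w4)
{
	return (mix_core(w1, w2, w3, w4));
}
```

`mix1(x)` returns exactly the same value as `mix4(x, 0, 0, 0)`. The difference lies only in the call frame and in the code the compiler can generate for each entry point.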
behlendorf pushed a commit that referenced this pull request Sep 19, 2024
So that we actually get the benefit of the previous commit.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Signed-off-by: Shengqi Chen <[email protected]>
Closes #16131
Closes #16483
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
3 participants