Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(don't merge) Re-add siphash #222

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

jappeace
Copy link

@jappeace jappeace commented Sep 29, 2021

This is a draft of getting siphash back into hashable.

This is the commit that removed it: d65f9de

I intend to benchmark this implementation, and also https://github.com/google/highwayhash#siphash
These things are left to be done:

  • Investigate why the tests are failing.
    • compression didn't do anything (some regressions where added for this).
    • chunking fails again after fixing compression.
  • cleanup api, we can remove siphash and rexpress in new chuncked primitives. then the siphash api is only 3 functions (instead of 8 or so before).
  • Add highway hash
  • At somepoint check if it builds on osx/windows.
  • How to make k0 and k1 hide-able by the client? (the entire point is that these should be kept a secret).
  • always assume sse on x86 and enable sse flag for sse4 by default.

See related issue: haskell-unordered-containers/unordered-containers#319

Data/Hashable/Class.hs Outdated Show resolved Hide resolved
cbits/siphash.h Outdated
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t u64;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use these typedefs, they barely save any typing.

src/Data/Hashable/Class.hs Outdated Show resolved Hide resolved
hashable.cabal Outdated
@@ -61,6 +61,15 @@ flag random-initial-seed
manual: True
default: False


flag sse2
description: Do we want to assume that a target supports SSE 2?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use SSE2 instruction set. Available on virtually any x86 CPU made since 2005. i.e. don't ask, tell.

For sse4 write something similar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the latest text version assumes that SSE2 is always available on x64, see e.g. https://hackage.haskell.org/package/text-1.2.5.0/src/cbits/cbits.c

#if defined(__x86_64__)
  /* All the intrinsics used here are from SSE2,
   * so every x86_64 CPU supports them.
   */

I actually don't care if i386 is slow. I don't have machines to test it anyway, so being simple on these is a virtue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I take from this that sse2 can be enabled without flag, and sse4 probably can be enabled by default at this point in time.

(I still need to do this btw).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sse2 on x64. On i386 i'd rather not have conditional branches at all, as they are hard to test nowadays.

src/Data/Hashable/Class.hs Outdated Show resolved Hide resolved
@phadej
Copy link
Contributor

phadej commented Sep 30, 2021

You need to include includes in extra-source-files for build to proceed.

@phadej
Copy link
Contributor

phadej commented Sep 30, 2021

I introduced LowLevel module in #225. I'd expect all haskell code changes to be contained in it.

@jappeace
Copy link
Author

jappeace commented Sep 30, 2021

sorry Oleg I didn't already expect a review, I'll address all your comments.

cbits/siphash.c Outdated
return buffered + (len & 7);
}

b |= ((uint64_t) totallen) << 56;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a variant of siphash which doesn't force to include total length? This makes it uncomposable. Note how FNV-1 is hashing a stream, we just feed it data. (And also md5,sha1.sha256 etc.)

To repeat my previous comment, rebase on current master and try to contain the changes inside LowLevel module. It's not good that you need to reimplement how lazy test and bytestring hashable instances, you shouldn't. Int, bytearray/ptr buffer hash primitives should be enough.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I eliminated this totallen, I think this is NIH code. (can't find it in the paper)

It's still not easily compose-able because siphash has an internal state of 4 64 bit words.
These need to be initialized with k0 and k1, but after this step it sortoff becomes like fnv if we keep track of that internal state separatly and you can fold over it (as long as you keep track of the state).
I still need to test this out against that rechunck property test.

I think it's possible to express everything in terms of bytearray#, although I'm not sure how to get around the initilization/finilization steps.

src/Data/Hashable/LowLevel.hs Outdated Show resolved Hide resolved
src/Data/Hashable/Class.hs Outdated Show resolved Hide resolved
This reverts commit d65f9de.

Re-add fnv

Make it sortoff work with siphash

except linking

Make it link

Remove typedef

Add siphash.h to includes folder

Fix golden test for siphash

If we can't make siphash as fast as fnv,
we probably want to support both in a final version.
But I'll do that later.

Fix tests for chuncks in bytestring

(I just forgot to port a bunch of code)

Fix lazy text test suite as well

Delete example file

Rebase on master, make test suite pass once more
this is just the old implementation of siphash.
with some tweaks to make it more streamable
(still not great but good enough for benchmarks).

-- squash log

Use capi if available for offset

Add more capi calls

Flip expectation and actual args

it showed up wrongly in the test suite

Add more capi, add TODO

Revert Hashable/Class to upstream master

trying to contain changes to just that module

Fix regression (once more)

Don't expose the old functions

Fix capi issues.

Apprantly half of these functions weren't defined in the header.

Tell people to use sse instead of asking

Apply clang format

Add siphash to benchmarks

Add initializeState function, this will capture initizlie

don't know how to do finalize yet

Make it compile again (apparntly k1 and salt were used interchanbly)

that's not a good idea

Clean out the c code a bit.

It was really strange.

Make it build.. again?

re-add header file

I guess this will just keep on bein inconsistent

remove cbits *.h from benchamark cabal file

OMG that worked, lazy text now bytearray

man, I doubted myself every step along the way, maybe I can do this?

Draft for doing bytestring

Make some headway on the regession tests

Make testsuite pass once more

Put in rounds as an argument

Add compression round as args

Add some common sense tests on previous state

fails of course, not sure what's happening here

Import <$>

I guess different chars can also produce same hashes.

Cleanup C-api for a bit

it's still utterly broken however.
I guess I need to try it out in C and see why the state isn't
changing.

Chunck -> Chunk

Get rid of more c code

idk maybe this do something with the sse stuff

(after reading ghci comment)

Figured out why no compression (lol)

Update 64bit text once more

Better explain 0

Split lines

Lazy text can be big as well

don't funroll by hand, add comments explaing what's going on

Use properties instead of hardcoded test cases for statefullness

Add prefix based tests as well

hmm not so ez for text

this also makes me doubt the bytestring implementaiton.

I should check if with larger arbitrary instances that doesn't
fall over

(text tends to have larger bytestring representations then bytestring)

Guess we really did solve the bytestring implementation

This should pass CI

Fix imports

Fix ci for ghc 8.2.2

Add icescream cone

Try fix ancient ghc

Attempt fix ancient ghc II

expose bytestring dependency ancient ghc

import pure as well

diddn't realize my style was so modern

fix parse error

maybe this will make conditional true?

Fix properties for ancient ghc

Fix parse error on imports

add bytestring builder everywhere

maybe solve cabal pain like this?

disable property for ancient ghc

try fix ancient ghc IX
@phadej
Copy link
Contributor

phadej commented Oct 6, 2021

I'm unsubscribing from this PR. Ping me when there's something to see again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants