Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Associative Map Example #343

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from
Draft

Conversation

oxinabox
Copy link
Contributor

@oxinabox oxinabox commented Dec 12, 2020

This is pretty neat I think
Closes #332
It has a List backed dictionary and a hash based dictionary

TODO

  • Put under common interface (i need to workout why it is giving so many errors)
  • Use new interface/instance stuff, not @instance
  • Implement a fast hashing algorithm
  • Split Hashable into seperate type classes for isequal and hash make Hashable use Eq, and don't make Float hashable

Maybe TODO

  • Deleting
  • Overwriting (right now it basically leaks memory if you overwrite anything)

Not TODO:

  • Rebucketting when too full (that seems like it would complicate things a bit too much for this PR)

@google-cla
Copy link

google-cla bot commented Dec 12, 2020

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@oxinabox
Copy link
Contributor Author

And i made the bot sad again because i committed based on top of the wrong branch. 😂

@oxinabox
Copy link
Contributor Author

oxinabox commented Dec 13, 2020

Benchmarks

  • Dex listdict: 6us

  • Dex hashdict: 9us

  • Julia hashdict: 0.07 us

  • Julia littledict: 0.33 us

  • Dex hash: 8us, that's the bottleneck

  • Julia hash: 0.06 us

Dex

full_data = for i:(Fin 256). ([10 * ordinal i, ordinal i], i)
big_listdict = fold (emptyListDict _ _)  \i state. storeListDict state (fst full_data.i) (snd full_data.i)
big_hashdict = fold (emptyHashDict 128 _ _) \i state. storeHashDict state (fst full_data.i) (snd full_data.i)
%time
tryGetHashDict big_hashdict [1280, 128]
(Just (128@Fin 256))

Compile time: 150.174 ms
Run time:     9.000 us 
%time
thehash [1280, 128]
1720020324

Compile time: 74.541 ms
Run time:     8.000 us 
%time
tryGetListDict big_listdict [1280, 128]
(Just (128@Fin 256))

Compile time: 130.772 ms
Run time:     6.000 us 

Julia

The OrderedCollections.LittleDict is not quite the list dict implemented here, but it is close.
Its what I wanted to implement but couldn't get to work, 1 list of keys, one list of values.
(and I don't know what the Julia Dict uses for collisions, but i am sure it isn't a LittleDict

julia> using OrderedCollections, BenchmarkTools

julia> const full_data = [([10ii, ii]=>ii) for ii in 1:256];

julia> const littledict = LittleDict(full_data);

julia> const hashdict = Dict(full_data);

julia> @btime littledict[$[1280, 128]];
  330.862 ns (0 allocations: 0 bytes)

julia> @btime hashdict[$[1280, 128]];
  73.187 ns (0 allocations: 0 bytes)

julia> @btime hash($([1280, 128]))
  64.405 ns (0 allocations: 0 bytes)
0x71191e98f759495d

@dougalm
Copy link
Collaborator

dougalm commented Dec 14, 2020

If the hash is the performance bottleneck, should we use a cheaper one instead of the PRNG-grade threefry?

Copy link
Collaborator

@apaszke apaszke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! BTW I wouldn't use %time for benchmarks, because it only executes the code once. When it only takes a few us or ns, you should use %bench, which will execute it a number of times to get a more accurate estimate.

Finally, there is one caveat which is that for some reason even the most trivial functions seem to take around 2us when ran in %bench, so it is unlikely that you will ever see values in the ranged of nanoseconds, even if that's the true cost. I suspect it's a problem with our benchmarking infrastructure, or it might be a cost of doing an FFI call from Haskell, but I didn't have time to investigate.

examples/hashmap.dx Outdated Show resolved Hide resolved
:p tryGetHashDict big_hashdict [210, 20]
:p tryGetListDict big_listdict [210, 20]

' ## Wrap under a common interface
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like help working out what I am doing wrong in this section

@oxinabox
Copy link
Contributor Author

If the hash is the performance bottleneck, should we use a cheaper one instead of the PRNG-grade threefry?

A queck google suggest people commonly use the FNV hash algorithm which seems like it would be easy enough to implement for Int32

Copy link
Contributor Author

@oxinabox oxinabox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@dan-zheng
Copy link
Collaborator

dev branch has been merged into main, now that plotting has been revamped. dev branch will be deleted now, we can continue development on main.

I'll change the base branch of this PR to main now!

@dan-zheng dan-zheng changed the base branch from dev to main December 18, 2020 22:59
@oxinabox
Copy link
Contributor Author

oxinabox commented Dec 19, 2020

FNV based hash doesn't seem much faster.
Had to say though since i can't use %bench (#359)
I suspect the time to do the FFI call is dominating perhaps

%time
threefryHash 0 1
-841280227

Compile time: 322.792 ms
Run time:     114.000 us 
%time
threefryHash 0 1
-841280227

Compile time: 864.773 ms
Run time:     8.000 us 
%time
threefryHash 0 1
-841280227

Compile time: 224.515 ms
Run time:     5.000 us 

vs

%time
fnvhash 0 1
-1717489731

Compile time: 36.299 ms
Run time:     9.000 us 
%time
fnvhash 0 1
-1717489731

Compile time: 22.220 ms
Run time:     6.000 us 
%time
fnvhash 0 1
-1717489731

Compile time: 22.274 ms
Run time:     5.000 us 

tryGet = tryGetHashDict


' all of the following error except the first
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is wrong, only the tryGet error.
but I don't know why they do

:p tryGet eg_listdict 'a'
Type error:Ambiguous type variables: [?2]

( ()
, ( ( tmp @> ( ((k:Type)
                ?-> (v:Type)
                ?-> (Associative (ListDict Word8 Int32) k v)
                ?=> (ListDict Word8 Int32) -> k -> Maybe v)
, (tryGet (ListDict Word8 Int32)) )
, tmp1 @> ( ((v:Type)
             ?-> (Associative (ListDict Word8 Int32) Word8 v)
             ?=> (ListDict Word8 Int32) -> Word8 -> Maybe v)
, (tmp Word8) )
, tmp2 @> ( ((Associative (ListDict Word8 Int32) Word8 ?2)
             ?=> (ListDict Word8 Int32) -> Word8 -> Maybe ?2)
, (tmp1 ?2) )
, tmp3 @> (((ListDict Word8 Int32) -> Word8 -> Maybe ?2), (tmp2 _))
, tmp4 @> ((Word8 -> Maybe ?2), (tmp3 eg_listdict))
, tmp5 @> ((Maybe ?2), (tmp4 'a'))
, tmp6 @> ((Maybe ?2), tmp5)
, _ans_ @> ((Maybe ?2), tmp6) )
, [ tmp:((k:Type)
         ?-> (v:Type)
         ?-> (Associative (ListDict Word8 Int32) k v)
         ?=> (ListDict Word8 Int32) -> k -> Maybe v) =
      tryGet (ListDict Word8 Int32)
  , tmp1:((v:Type)
          ?-> (Associative (ListDict Word8 Int32) Word8 v)
          ?=> (ListDict Word8 Int32) -> Word8 -> Maybe v) = tmp Word8
  , tmp2:((Associative (ListDict Word8 Int32) Word8 ?2)
          ?=> (ListDict Word8 Int32) -> Word8 -> Maybe ?2) = tmp1 ?2
  , tmp3:((ListDict Word8 Int32) -> Word8 -> Maybe ?2) = tmp2 _
  , tmp4:(Word8 -> Maybe ?2) = tmp3 eg_listdict
  , tmp5:(Maybe ?2) = tmp4 'a'
  , tmp6:(Maybe ?2) = tmp5
  , _ans_:(Maybe ?2) = tmp6 ] ) )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that this is because Dex isn't aware that the dict parameter of Associative actually implies the values of k and v. Nothing prevents you from defining weird instances such as Associative (HashDict k v) Int Float, so it conservatively assumes that this code is ambiguous.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the same problem in Haskell is solved using the FunctionalDependencies extension, but we don't have it in Dex, and I'm not sure if we should. Our type class design is still a bit up in the air.

examples/hashmap.dx Outdated Show resolved Hide resolved
@oxinabox oxinabox marked this pull request as draft December 29, 2020 19:51
@srush srush mentioned this pull request Jan 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Associative Map i.e. dictionary
5 participants