Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Trie data structure #1871

Merged
merged 39 commits into from
Jan 14, 2024
Merged

Add Trie data structure #1871

merged 39 commits into from
Jan 14, 2024

Conversation

SandroMaglione
Copy link
Contributor

@SandroMaglione SandroMaglione commented Jan 8, 2024

Type

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update

Description

Added Trie data structure.

A Trie can be used to to efficiently store and locate keys within a set.

  • Search hits take time proportional to the length of the search key
  • Search misses involve examining only a few characters

This PR adds a Trie module implemented as a Ternary Search Tree. A Ternary Search Tree is more space-efficient and allows to use an arbitrary string as key (instead of a limited alphabet).

Questions

  • While still being a Trie, the new module is technically a Ternary Search Trie (TST). Is the name Trie okay?
  • Are there some guidelines to provide benchmarks for a new data structure?

TODO

  • More documentation and examples
  • Add some more common methods
    • has
    • insertMany
    • removeMany
    • map
    • forEach
    • reduce
    • isEmpty
    • filterMap
    • filter
    • compact
    • modify

Copy link

changeset-bot bot commented Jan 8, 2024

🦋 Changeset detected

Latest commit: fe31668

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 17 packages
Name Type
effect Patch
@effect/cli Patch
@effect/experimental Patch
@effect/opentelemetry Patch
@effect/platform-browser Patch
@effect/platform-bun Patch
@effect/platform-node Patch
@effect/platform Patch
@effect/printer-ansi Patch
@effect/printer Patch
@effect/rpc-http-node Patch
@effect/rpc-http Patch
@effect/rpc-nextjs Patch
@effect/rpc-workers Patch
@effect/rpc Patch
@effect/schema Patch
@effect/typeclass Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@mikearnaldi
Copy link
Member

A few questions:

  • why is key fixed to be string?
  • how does this compare to HashMap?

@SandroMaglione
Copy link
Contributor Author

why is key fixed to be string?

Most of the main usecases of Tries have to do with strings, for example for full-text search, autocomplete (predictive text), spell checking. Some of these usecases take advantage of prefix search (keysWithPrefix).

Prefix search is also the main feature that makes a Trie more suited than an hash table for certain usecases.

That being said, I think that technically there should be no issue in making the key also a generic K. Let me know if we want to support this.


how does this compare to HashMap?

Trie are better suited for autocomplete, text search, spell checking, and generally when working with string and prefixes:

  • A Trie may use less space since it stores common prefixes in the same nodes (If there are many common prefixes, the space they require is shared). This is the case for dictionaries of words, which makes a Trie ideal for storing and retrieving words from a large list (i.e. searching a list of words inside a text)
  • The structure of a Trie allows for quick look up of prefixes of keys and enumerate all entries with a given prefix (longestPrefixOf and keysWithPrefix)
  • A Trie has O(n) lookup time where n is the size of the key, or even less than n time on search misses (predictable). An hash table requires the full string to be scanned to compute the hash (less predictable). In the best case is O(1), but this is data dependent.

@mikearnaldi
Copy link
Member

why is key fixed to be string?

Most of the main usecases of Tries have to do with strings, for example for full-text search, autocomplete (predictive text), spell checking. Some of these usecases take advantage of prefix search (keysWithPrefix).

Prefix search is also the main feature that makes a Trie more suited than an hash table for certain usecases.

That being said, I think that technically there should be no issue in making the key also a generic K. Let me know if we want to support this.

how does this compare to HashMap?

Trie are better suited for autocomplete, text search, spell checking, and generally when working with string and prefixes:

  • A Trie may use less space since it stores common prefixes in the same nodes (If there are many common prefixes, the space they require is shared). This is the case for dictionaries of words, which makes a Trie ideal for storing and retrieving words from a large list (i.e. searching a list of words inside a text)
  • The structure of a Trie allows for quick look up of prefixes of keys and enumerate all entries with a given prefix (longestPrefixOf and keysWithPrefix)
  • A Trie has O(n) lookup time where n is the size of the key, or even less than n time on search misses (predictable). An hash table requires the full string to be scanned to compute the hash (less predictable). In the best case is O(1), but this is data dependent.

Thanks, would such L be fully generic or a subtype of string, if the latter (which I assume given prefix definition, etc) then this is already OK

packages/effect/src/Trie.ts Outdated Show resolved Hide resolved
packages/effect/src/Trie.ts Show resolved Hide resolved
packages/effect/src/internal/trie.ts Outdated Show resolved Hide resolved
@SandroMaglione
Copy link
Contributor Author

@mikearnaldi I completed the API implementation. I am now going to update and expand the docs and then the PR is ready.

packages/effect/src/Trie.ts Outdated Show resolved Hide resolved
packages/effect/src/Trie.ts Outdated Show resolved Hide resolved
packages/effect/src/Trie.ts Show resolved Hide resolved
packages/effect/src/internal/trie/node.ts Outdated Show resolved Hide resolved
packages/effect/src/internal/trie.ts Outdated Show resolved Hide resolved
@SandroMaglione
Copy link
Contributor Author

SandroMaglione commented Jan 10, 2024

@fubhy @mikearnaldi I added all the documentation and completed the implementation. The PR is ready for a complete review and merging.

Copy link
Member

@fubhy fubhy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I'll have a think about what we can do to make the codegen step for the barrel files a bit ... less easily skippable.

EDIT:

@SandroMaglione I fixed the examples. Some were missing imports, others had subtle bugs. Please double check!

I'll also have a think about how we can make that more pleasant for contributors ;-)

@SandroMaglione
Copy link
Contributor Author

@fubhy LGTM as well 👍

@fubhy
Copy link
Member

fubhy commented Jan 11, 2024

@SandroMaglione Please also check out 338e1a7

I would propose to simplify this to plain object literlas. That way we also don't have to initialize this internal node representation with a bunch of undefined properties in some places.

I think we should also patch this in RedBlackTree (the Node value bag).

The absolute performance gain is negligible but in relative terms it's a lot and that makes a difference imho. And since there's no DX downgrade by doing this (imho) we should do it.

@SandroMaglione
Copy link
Contributor Author

I think we should also patch this in RedBlackTree (the Node value bag)

I think this should be part of another PR. I also opened another issue related to RedBlackTree, we could refactor RedBlackTree from there

@fubhy
Copy link
Member

fubhy commented Jan 11, 2024

I think we should also patch this in RedBlackTree (the Node value bag)

I think this should be part of another PR. I also opened another issue related to RedBlackTree, we could refactor RedBlackTree from there

Oh for sure. That's what I meant :-)

@mikearnaldi mikearnaldi merged commit 540b294 into Effect-TS:main Jan 14, 2024
9 checks passed
@github-actions github-actions bot mentioned this pull request Jan 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants