PyPruningRadixTrie

Python Port of Pruning Radix Trie by Wolf Garbe.

Changes compared to original version

Removed parameter to disable pruning behavior.
- See test/non_pruning_radix_trie.py for a non-pruning version that you can use to see the speed improvement.
Added outline for more generic InputProvider and providers that read CSV or JSON as examples

What and Why

A Trie is a tree data structure that is commonly used for searching terms that start with a given prefix.
It starts with an empty string at the base of the trie, the root node.
Adding a new entry to the trie creates a new branch. This branch shares already present characters with existing nodes and creates new nodes when it's prefix diverges from the present entries.

# trie containing flower & flowchart (1 char = 1 node)

'' - f - l - o - w - e - r
                 |
                 c - h - a - r - t

A RadixTrie is the space optimized version of a Trie.
It combines the nodes with only one sub-node into one, containing more than one character.

# radix trie containing flower & flowchart

'' - flow - er
      |
     chart

The prefix Pruning references the algorithm to query the RadixTrie.
In order for the pruning to work, the nodes in RadixTrie are stored in a sorted manner.
This structure allows cutting off unpromising branches during querying the trie which makes the algorithm way faster compared to a non-pruning trie.

Usage

Get from PyPI:

pip install pypruningradixtrie

Create the PRT:

# empty trie
trie = PruningRadixTrie()

# fill with data from CSV file on creation
trie = PruningRadixTrie('./test_data.csv', CSVInputProvider(',', lambda x: float(x[1])))

Add entries:
CSV:

# fill with data from CSV file, score is at position 1, term at position 0
fill_trie_from_file(trie, './test_data.csv', CSVInputProvider(',', lambda x: float(x[1]), 0))

JSON:

# define a functon to calculate the score out of a JSON entry
def score_fun(json_entry: Dict[str, Any]) -> float:
  return json_entry["pages"] * json_entry["year"] / 10.0

# "title" = key for term to insert into PRT
fill_trie_from_file(trie, './test_data.json', JSONInputProvider("title", score_fun))

Single Entry:

# insert single entry
insert_term(trie, term="flower", score=20)

Use the PRT:

# get the top 10 entries that start with 'flower'
trie.get_top_k_for_prefix('flower', 10)

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
.github/workflows		.github/workflows
pypruningradixtrie		pypruningradixtrie
test		test
.gitignore		.gitignore
LICENSE-APACHE.txt		LICENSE-APACHE.txt
LICENSE-MIT.txt		LICENSE-MIT.txt
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

PyPruningRadixTrie

What and Why

Usage

About

Licenses found

Releases

Packages

Contributors 2

Languages

License

Licenses found

otto-de/PyPruningRadixTrie

Folders and files

Latest commit

History

Repository files navigation

PyPruningRadixTrie

What and Why

Usage

About

Topics

Resources

License

Licenses found

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages