feature: typo tolerance #2144

Merged
merged 10 commits into from
Aug 28, 2024

Conversation

aaryanpunia
Contributor

@aaryanpunia aaryanpunia commented Aug 8, 2024

When spinning up services, run more word_workers than bktree_workers. For big datasets, scale word_workers to 15+ to accommodate the load; ClickHouse (CH) can take the abuse.

@skeptrunedev skeptrunedev changed the title Aaryanpunia/spellcheck impl wip: spellcheck impl Aug 8, 2024
@densumesh densumesh force-pushed the aaryanpunia/spellcheck-impl branch 2 times, most recently from b80abcf to 9df46e5 Compare August 9, 2024 05:14
@densumesh densumesh marked this pull request as ready for review August 9, 2024 05:14
@densumesh densumesh force-pushed the aaryanpunia/spellcheck-impl branch 3 times, most recently from 1f6e97d to 48d3ce7 Compare August 9, 2024 05:27
@densumesh densumesh changed the title wip: spellcheck impl feature: typo tolerance Aug 9, 2024
cdxker
cdxker previously requested changes Aug 9, 2024
Member

@cdxker cdxker left a comment

Can we change id-worker to a better name and include cronjob instead of worker?

@aaryanpunia
Contributor Author

Pending problems -

  1. In word-worker, when inserting into the relation table, we somehow insert the same (dataset_id, word_id) tuple multiple times in a single call to Postgres. This is not allowed when using ON CONFLICT DO UPDATE.
  2. Deadlocks with multiple word-workers; solvable by using ClickHouse instead of Postgres.
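
For the first problem, one possible workaround is to collapse duplicate keys before building the batch, since Postgres rejects a single statement that touches the same conflict target twice ("ON CONFLICT DO UPDATE command cannot affect row a second time"). The sketch below is not the code from this PR; the row shape `(dataset_id, word_id, count)` and the helper name `dedupe_word_rows` are assumptions made for illustration.

```python
from collections import defaultdict


def dedupe_word_rows(rows):
    """Collapse repeated (dataset_id, word_id) keys, summing their counts.

    `rows` is an iterable of (dataset_id, word_id, count) tuples; this shape
    is hypothetical and only stands in for the real relation table columns.
    """
    totals = defaultdict(int)
    for dataset_id, word_id, count in rows:
        totals[(dataset_id, word_id)] += count
    # Sorting gives every batch the same key order. Consistent ordering is a
    # common way to reduce lock-order deadlocks between concurrent writers,
    # though it is not the fix chosen in this PR.
    return [(d, w, c) for (d, w), c in sorted(totals.items())]


batch = [
    ("ds1", "w2", 1),
    ("ds1", "w1", 3),
    ("ds1", "w2", 4),  # same key as the first row
]
print(dedupe_word_rows(batch))
```

Each key now appears once per statement, so the upsert no longer tries to update the same row twice.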

@densumesh densumesh force-pushed the aaryanpunia/spellcheck-impl branch 2 times, most recently from bf40e24 to 1470220 Compare August 12, 2024 21:35
@densumesh densumesh requested a review from cdxker August 12, 2024 22:53
@cdxker cdxker force-pushed the aaryanpunia/spellcheck-impl branch from 2816711 to 1814ad8 Compare August 13, 2024 19:26
@densumesh densumesh force-pushed the aaryanpunia/spellcheck-impl branch 7 times, most recently from e99c37d to ec2a434 Compare August 16, 2024 01:39
@cdxker cdxker force-pushed the aaryanpunia/spellcheck-impl branch from a9c2721 to 3d0763b Compare August 20, 2024 18:04
Member

@cdxker cdxker left a comment

The recursive data structure is too deep: loading it into the server fails with a "bk tree depth limit exceeded" serde error. We will need to store the tree as a different structure. The first thing that comes to mind is an adjacency list.
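
A minimal sketch of the adjacency-list idea, assuming nodes are stored in flat arrays and children are referenced by index rather than nested. The class and method names (`FlatBkTree`, `insert`, `search`) are hypothetical and not taken from this PR. Because nothing is nested, the serialized depth stays constant no matter how tall the tree grows, and both insert and search are iterative.

```python
def levenshtein(a, b):
    """Edit distance between two strings (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class FlatBkTree:
    """BK-tree stored as parallel flat lists instead of nested nodes."""

    def __init__(self):
        self.words = []     # words[i] is the word stored at node i
        self.children = []  # children[i] maps edit distance -> child index

    def _add_node(self, word):
        self.words.append(word)
        self.children.append({})
        return len(self.words) - 1

    def insert(self, word):
        if not self.words:
            self._add_node(word)  # first word becomes the root (node 0)
            return
        node = 0
        while True:
            dist = levenshtein(word, self.words[node])
            if dist == 0:
                return  # word already present
            child = self.children[node].get(dist)
            if child is None:
                self.children[node][dist] = self._add_node(word)
                return
            node = child

    def search(self, query, max_dist):
        """Return sorted (distance, word) pairs within max_dist of query."""
        found = []
        stack = [0] if self.words else []
        while stack:
            node = stack.pop()
            dist = levenshtein(query, self.words[node])
            if dist <= max_dist:
                found.append((dist, self.words[node]))
            # Triangle inequality: only edges in [dist - k, dist + k] can match.
            for edge, child in self.children[node].items():
                if dist - max_dist <= edge <= dist + max_dist:
                    stack.append(child)
        return sorted(found)


tree = FlatBkTree()
for w in ["book", "books", "cake", "boo", "cape", "cart"]:
    tree.insert(w)
print(tree.search("bok", 1))
```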

@densumesh densumesh force-pushed the aaryanpunia/spellcheck-impl branch 4 times, most recently from b3d4926 to 52d97ed Compare August 26, 2024 05:24
@densumesh densumesh force-pushed the aaryanpunia/spellcheck-impl branch 2 times, most recently from 870bd4f to fcc95cc Compare August 26, 2024 05:32
Member

@cdxker cdxker left a comment

Can it return the corrected query on the response?

@densumesh densumesh merged commit ed22f21 into main Aug 28, 2024
7 checks passed
3 participants