Search Improvements #5

Open
brandonsturgeon opened this issue Feb 20, 2024 · 1 comment
Assignees
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

brandonsturgeon commented Feb 20, 2024

Currently, we break each page body down into keywords and then perform a keyword lookup at runtime.

This is space/memory efficient, but it's not very robust. For example, ACT will return the correct results, but ACT_ will return nothing.
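One possible direction, sketched here rather than taken from the current code: keep the keyword map, but match queries by prefix instead of exact key. The index shape, the page IDs, and the function names below are all hypothetical, and the sketch assumes underscores survive tokenization so that keys like `act_idle` exist.

```typescript
// Hypothetical index shape (keyword -> page IDs); not the project's actual format.
type KeywordIndex = Record<string, string[]>;

// Illustrative data only.
const index: KeywordIndex = {
  act: ["Enums/ACT"],
  act_idle: ["Enums/ACT"],
  act_walk: ["Enums/ACT"],
  actor: ["Some/Other_Page"],
};

// Exact lookup: only succeeds when the query is itself a key.
function exactLookup(idx: KeywordIndex, query: string): string[] {
  return idx[query.toLowerCase()] ?? [];
}

// Prefix lookup: binary-search the sorted keys for the first key >= prefix,
// then scan forward while keys still start with the prefix.
function prefixLookup(
  idx: KeywordIndex,
  sortedKeys: string[],
  query: string
): string[] {
  const prefix = query.toLowerCase();
  let lo = 0;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedKeys[mid] < prefix) lo = mid + 1;
    else hi = mid;
  }
  const pages = new Set<string>();
  for (let i = lo; i < sortedKeys.length && sortedKeys[i].startsWith(prefix); i++) {
    for (const page of idx[sortedKeys[i]]) pages.add(page);
  }
  return Array.from(pages);
}

// Sort once at load time; the default sort order matches the `<` comparison above.
const sortedKeys = Object.keys(index).sort();
```

With this data, `exactLookup(index, "ACT_")` finds nothing, while `prefixLookup(index, sortedKeys, "ACT_")` reaches the `act_idle` and `act_walk` entries. The sorted key array adds no new data to the blob, since it can be derived from the existing keys when the blob is loaded.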

I need some help with this!

I don't know how to generate a single structure of search terms that I can easily query later.
The product file can't be too big because we have to pull it into memory every time we search.

We should stick to non-Cloudflare solutions so we can maintain compatibility with self-hosting.

This is the entrypoint for the SearchManager, which generates the final JSON blob we use to perform searches. Each scraped page's inner HTML content (that is, the part that changes as you navigate between pages) is passed into this function:
https://github.com/CFC-Servers/gmodwiki/blob/main/build/modules/search.ts#L100-L124

My general goal was to strip out any characters that would cause search conflicts or significantly increase the size of the search blob, and then generate the reverse lookup of terms -> page IDs (and search context).
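For anyone picking this up, here is a minimal sketch of that strip-then-index idea. It is not the code in `search.ts`: `tokenize`, `addPage`, `toBlob`, the two-character minimum, and the choice to keep underscores are all assumptions made for illustration, and search context is left out.

```typescript
// Hypothetical reverse lookup: term -> set of page IDs.
type InvertedIndex = Map<string, Set<string>>;

// Lowercase, drop HTML tags, split on anything that is not a letter, digit,
// or underscore, and discard very short tokens to keep the blob small.
function tokenize(html: string): string[] {
  return html
    .toLowerCase()
    .replace(/<[^>]*>/g, " ")
    .split(/[^a-z0-9_]+/)
    .filter((term) => term.length >= 2);
}

// Record every term of one page under that page's ID.
function addPage(index: InvertedIndex, pageId: string, html: string): void {
  for (const term of tokenize(html)) {
    let pages = index.get(term);
    if (!pages) {
      pages = new Set<string>();
      index.set(term, pages);
    }
    pages.add(pageId);
  }
}

// Convert to a plain object so it can be written out with JSON.stringify.
function toBlob(index: InvertedIndex): Record<string, string[]> {
  const blob: Record<string, string[]> = {};
  index.forEach((pages, term) => {
    blob[term] = Array.from(pages);
  });
  return blob;
}
```

Usage would look like `addPage(index, "Enums/ACT", pageHtml)` for each scraped page, followed by `JSON.stringify(toBlob(index))` at build time. Keeping underscores means a term such as `act_idle` is stored whole, which trades a somewhat larger blob for the ability to match queries containing `_`.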

The main challenge for searching on Gmodwiki is that we need to pre-compute the search terms. We don't have a database of page entries that we can query at search time, and we can't reasonably shove the full wiki content into a JSON object.

@brandonsturgeon brandonsturgeon added the enhancement New feature or request label Feb 20, 2024
@brandonsturgeon brandonsturgeon self-assigned this Feb 20, 2024
@brandonsturgeon brandonsturgeon added the help wanted Extra attention is needed label Jun 2, 2024
@A1steaksa

To my mind, a pre-existing solution like https://sphinxsearch.com/ would be the sanest way of handling it.
