Use Opensearch as the tool for searching posts #296
// SE
implementation("org.apache.httpcomponents.core5:httpcore5:5.2.4")
implementation("org.apache.httpcomponents.core5:httpcore5-h2:5.2.4")
implementation("org.opensearch.client:opensearch-java:2.10.0")
The `httpcore` libraries provide HTTP request support for the Opensearch client, which makes requests over HTTP.
There is a range of Kotlin-specific Opensearch clients, but I've not yet found one that wasn't a complete pig to set up or completely baffling to wrap my head around. I'm certain good ones exist; I just didn't have any luck on a quick first pass.
Although the Java client is extremely verbose, I've used it for this first version because forming the initial connection to Opensearch is very concise, and constructing the query can be tidied up with helper methods if we want to go down that road.
configureInfraRouting()
configurePostRouting()
configureRequestHandling()
configureUserInfoRouting()
The `infra` routing is the only new thing here; I've just sorted the list alphabetically.
import org.litote.kmongo.eq

// TODO: Auth control
fun Application.configureInfraRouting() {
The `/infra` route is intended to provide simple setup tooling for an administrator, so that we don't need lots of helper scripts or ingrained knowledge to set things up.
This route will load all posts from the DB and set up the `posts` index in the SE (including special mappings etc). I'd like to keep as much of this setup in code as possible; we'll forget how to do it otherwise.
This routing block will likely use a secret configuration value (set as a runtime environment variable) as access control.
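A minimal sketch of what that secret check could look like, written in plain Kotlin so it doesn't depend on any routing code. The function name and the idea of reading the secret from something like `System.getenv("INFRA_SECRET")` are assumptions for illustration, not what the PR implements (the route is still marked `TODO: Auth control`).

```kotlin
import java.security.MessageDigest

// Hypothetical check: `expected` would come from a runtime environment variable
// (variable name assumed), `provided` from the incoming request.
fun isInfraRequestAllowed(provided: String?, expected: String?): Boolean {
    // Fail closed if the secret is unset or blank, or the caller sent nothing.
    val secret = expected?.takeIf { it.isNotBlank() } ?: return false
    val token = provided ?: return false
    // MessageDigest.isEqual does not stop at the first mismatching byte,
    // unlike a plain String comparison.
    return MessageDigest.isEqual(token.toByteArray(), secret.toByteArray())
}
```

Failing closed when the variable is missing means a misconfigured deployment locks the route rather than opening it.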
@@ -210,83 +189,3 @@ fun Application.configurePostRouting() {
}
}
}

fun getFilterFromParameters(params: Parameters): List<Bson> {
I cannot tell you how thrilled I am to have deleted all of this
Even more thrilled than I am to see it go 😁
Full details and diagrams will be in the PR (#296), but the core details are:
* the previous searching tool was extremely naive, entirely using `String.contains`
* there was no spell correction, nor easy way to add it in
* results came back in an arbitrary order and weren't easily scored
A search engine ('SE' for short, used in the codebase) is a much more appropriate tool. This change migrates the searching, scoring, and ranking logic to Opensearch to return a list of ordered SearchItem instances, which we then use to return a list of ordered PostItem instances to the user.
b8f841f to 137509b
Mainly reading through the code at this point. I've gone through about half of the files but have to postpone the rest.
I've already left a few comments, mainly on idiomatic Kotlin.
137509b to 9cceeb5
9cceeb5 to 0106d4b
0106d4b to a8dc357
a8dc357 to 36e2ee0
36e2ee0 to f931647
One more small suggested change
f931647 to f2a81b9
 * SearchAsYouType is a pre-built field type that specialises in fast real time searching
 * @see https://opensearch.org/docs/latest/field-types/supported-field-types/search-as-you-type/
 */
"description_shingle" to SearchAsYouTypeProperty.Builder().build()._toProperty()
What is a shingle? This name doesn't feel particularly descriptive or useful.
`shingle` is a technical term: it's basically the search engine term used for an n-gram. If a description is "This is my cool team", the shingles list will be something like:
This is, is my, my cool, cool team
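To make the term concrete, here is a minimal sketch of two-word shingling in plain Kotlin. It only illustrates the idea: Opensearch builds its shingles internally when indexing the field, and the `wordShingles` name is made up for this example.

```kotlin
// Hypothetical helper: splits text on whitespace and returns every run of
// `size` consecutive words, joined back together with a single space.
fun wordShingles(text: String, size: Int = 2): List<String> =
    text.trim()
        .split(Regex("\\s+"))
        .filter { it.isNotEmpty() }
        .windowed(size) { window -> window.joinToString(" ") }
```

Calling `wordShingles("This is my cool team")` yields the four pairs listed above; text with fewer than `size` words yields an empty list.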
Overview
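The core details are:
* the previous searching tool was extremely naive, entirely using `String.contains`
* there was no spell correction, nor easy way to add it in
* results came back in an arbitrary order and weren't easily scored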
A search engine ('SE' for short, used in the codebase) is a much more appropriate tool to use for searching and scoring documents. This change migrates the searching, scoring, and ranking logic to Opensearch to return a list of ordered SearchItem instances, which we then use to return a list of ordered PostItem instances to the user.
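As a rough illustration of the limitation, substring matching boils down to something like the sketch below. This is not the deleted code (that built `Bson` filters in `getFilterFromParameters`); `naiveSearch` and its parameters are hypothetical stand-ins.

```kotlin
// Hypothetical stand-in for substring-based searching: a description either
// contains the query or it doesn't, so there is no score to rank by and a
// single typo in the query matches nothing.
fun naiveSearch(descriptions: List<String>, query: String): List<String> =
    descriptions.filter { it.contains(query, ignoreCase = true) }
```

A query such as "artsit" returns no results here even when a description contains "artist", and matches come back in whatever order the input happened to be in.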
Why Opensearch?
Opensearch is an open source search engine forked from Elasticsearch, so it is more approachable to developers already familiar with ES than something like Solr. Although it's not the most straightforward app to run from Docker, it's not too painful to set up locally.
We could use any search engine here, as we're not doing anything complex enough to strain the boundaries of one particular tool.
How are we using Opensearch?
The existing `GET /posts?...` URL stays the same, but instead of querying the DB with a whole bunch of boolean filters we now query the SE with a combination of fields (details available on request). The SE returns the resulting documents ordered from most to least relevant, and we return PostItems to the user in that same order.
Why a SE and a DB at the same time?
The DB is a persistent data store for posts stored more-or-less exactly as the user uploaded their content, plus extra metadata for us (e.g. `createdAt`, `updatedAt`). The SE is a tool that manipulates data into the most easily searched form, often by doing compute work up front and storing the result on disk (trading storage space for search speed).
While we could use Opensearch as a persistent database, we'd need to maintain very clear boundaries between user data and indexed fields used for searching: one of the main benefits of using a search engine is its ability to manipulate data into fields for faster or more powerful searching. Storing these alongside the core user data has the potential to get a bit messy, and we'd be indexing data (such as `reportCount`) that would never be searched.
Code flow
For anything that doesn't involve searching, the code stays exactly the same:
For a search request the API hits the SE, the SE returns a list of PostItem IDs to the API, and the API loads those documents from the DB:
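The one subtle step in that flow is keeping the SE's ranking once the documents come back from the DB, since a bulk lookup by ID generally doesn't preserve the order of the IDs. A minimal sketch, assuming a simplified `RankedPost` in place of the real `PostItem` and a hypothetical `orderBySearchRank` helper:

```kotlin
// Simplified stand-in for PostItem; only the fields this sketch needs.
data class RankedPost(val id: String, val description: String)

// Hypothetical helper: `rankedIds` is the SE result (most to least relevant),
// `loaded` is whatever the DB returned for those IDs, in any order.
fun orderBySearchRank(rankedIds: List<String>, loaded: List<RankedPost>): List<RankedPost> {
    val byId = loaded.associateBy { it.id }
    // mapNotNull drops IDs the DB no longer has, e.g. a post deleted after indexing.
    return rankedIds.mapNotNull { byId[it] }
}
```

Dropping missing IDs silently is a design choice for the sketch; logging them could help spot a DB and SE that have drifted apart.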
Creating, updating and deleting a post basically duplicates the existing flow for the DB onto the SE:
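One way to picture the duplicated write path is a thin wrapper that applies each write to the DB and then mirrors it onto the SE. Everything below (`PostStore`, `InMemoryStore`, `MirroredPostWriter`, `StoredPost`) is a hypothetical sketch using in-memory maps, not the PR's actual classes.

```kotlin
// Simplified stand-in for a post document.
data class StoredPost(val id: String, val description: String)

// Hypothetical abstraction over "somewhere a post can be written".
interface PostStore {
    fun upsert(post: StoredPost)
    fun delete(id: String)
}

// In-memory stand-in, used here for both the DB and the SE.
class InMemoryStore : PostStore {
    val items = mutableMapOf<String, StoredPost>()
    override fun upsert(post: StoredPost) { items[post.id] = post }
    override fun delete(id: String) { items.remove(id) }
}

// Applies every write to the DB first, then repeats it against the SE.
class MirroredPostWriter(private val db: PostStore, private val se: PostStore) : PostStore {
    override fun upsert(post: StoredPost) {
        db.upsert(post)
        se.upsert(post)
    }

    override fun delete(id: String) {
        db.delete(id)
        se.delete(id)
    }
}
```

Writing to the DB first keeps it as the source of truth: if the SE write fails, the index can be rebuilt from the DB, which is what the `/infra` route is described as doing.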