Adding User Behavior Insights functionality. #13546
Signed-off-by: jzonthemtn <[email protected]>
❌ Gradle check result for bea92e0: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
I think it is a flaky test outside of UBI changes:
And described by #13220.
Signed-off-by: jzonthemtn <[email protected]>
❌ Gradle check result for 2fb835f: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Also appears to be a flaky test outside of the code changes here. #1006
@jzonthemtn I haven't looked into the implementation yet (sorry about that), but it looks to me like the plugin should not be part of the core but a separate repository (like most of the non-essential plugins out there).
## Indexing Queries

For UBI to index a query, add a `ubi` block to the `ext` in the search request containing a `query_id`:
It looks like if you do not have a `query_id`, one will be provided for you.
Is the presence of an empty `ubi` block sufficient to get the logging?
I had it generate a `query_id` if none is provided, but I should not have yet. In this first version, a `query_id` is required. This is because the search response is not yet being modified. In a later revision, `query_id` will be optional: it will be generated if not provided and returned in the search response's `ext`. I will remove the code in `getQueryId()` that makes a random UUID if it's `null`.
An empty block was sufficient, but I will change it to require that the `ubi` block contains a `query_id`. If there is no `ubi` block, or an empty `ubi` block, the rest of the code in the `UbiActionFilter` will be skipped.
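The rule described in this reply (skip everything unless the `ubi` block carries a `query_id`) could be sketched roughly as below. This is only an illustration: `UbiRequestCheck` and `extractQueryId` are hypothetical names, and the request's `ext` is modeled as a plain `Map` rather than the types the PR actually uses.

```java
import java.util.Map;
import java.util.Optional;

final class UbiRequestCheck {

    /**
     * Hypothetical helper (not from the PR): returns the query_id only when
     * ext contains a "ubi" object whose "query_id" is a non-blank string.
     */
    static Optional<String> extractQueryId(Map<String, ?> ext) {
        if (ext == null) {
            return Optional.empty(); // no ext at all
        }
        Object ubi = ext.get("ubi");
        if (!(ubi instanceof Map)) {
            return Optional.empty(); // no ubi block
        }
        Object queryId = ((Map<?, ?>) ubi).get("query_id");
        if (queryId instanceof String && !((String) queryId).trim().isEmpty()) {
            return Optional.of((String) queryId);
        }
        return Optional.empty(); // empty ubi block, or query_id missing or blank
    }
}
```

A filter could then return early when `extractQueryId(ext).isEmpty()`, so a missing or empty `ubi` block never reaches the indexing code.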
client.admin().indices().exists(indicesExistsRequest, new ActionListener<>() {

    @Override
    public void onResponse(IndicesExistsResponse indicesExistsResponse) {
I see a couple of risks with this:
- It doesn't seem to be checking `indicesExistsResponse`. If the indices do exist, will it issue the `CreateIndexRequest`s again?
- The `exists()` call asynchronously calls the exists API, and then the listener for that call creates the indices. Meanwhile, the thread that called `indexUbiQuery` falls through to line 180 and tries to write to the queries index (possibly before it's been created).
I could be mistaken, but I think the sequence diagram is something like:
sequenceDiagram
CallingThread ->> OpenSearch: Exists?
CallingThread ->> OpenSearch: Index query
OpenSearch ->> ExistsListener: IndicesExistsResponse
ExistsListener ->> OpenSearch: CreateIndex(UBI_EVENTS_INDEX)
ExistsListener ->> OpenSearch: CreateIndex(UBI_QUERIES_INDEX)
(Incidentally, this is my first time using Mermaid to generate a diagram in a GitHub comment -- that's pretty awesome.)
I think you may need to chain the callbacks together, so it's something like:
- Calling thread calls `indexExists`
- IndexExists listener:
  - If both indices exist, call `client.index()`
  - Else, create one or both indices, with a callback (can use a shared callback that counts down the number of indices left to create).
    - In the CreateIndex listener, if decrementing the counter gives you 0, then call `client.index()`.
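The chaining suggested above might look roughly like the following sketch. It is deliberately not written against the OpenSearch client: `AsyncClient`, `findMissing`, and `FakeClient` are hypothetical stand-ins invented for illustration, and failure handling is omitted. The part that mirrors the suggestion is the shared `AtomicInteger`: whichever create callback brings it to zero performs the write.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class ChainedIndexSetup {

    /** Hypothetical async client; a stand-in, not the OpenSearch Client API. */
    interface AsyncClient {
        void findMissing(List<String> indices, Consumer<List<String>> listener);
        void create(String index, Runnable listener);
        void index(String index, String doc);
    }

    /** Writes doc only after every required index is known to exist. */
    static void indexAfterSetup(AsyncClient client, List<String> required, String target, String doc) {
        client.findMissing(required, missing -> {
            if (missing.isEmpty()) {
                client.index(target, doc); // nothing to create
                return;
            }
            // Shared counter: the create callback that brings it to 0 performs the write.
            AtomicInteger remaining = new AtomicInteger(missing.size());
            for (String name : missing) {
                client.create(name, () -> {
                    if (remaining.decrementAndGet() == 0) {
                        client.index(target, doc);
                    }
                });
            }
        });
    }

    /** In-memory fake that completes each call on another thread; for demonstration only. */
    static final class FakeClient implements AsyncClient {
        final Set<String> indices = ConcurrentHashMap.newKeySet();
        final List<String> log = Collections.synchronizedList(new ArrayList<>());
        private final CountDownLatch indexed = new CountDownLatch(1);

        @Override
        public void findMissing(List<String> names, Consumer<List<String>> listener) {
            CompletableFuture.runAsync(() -> {
                List<String> missing = new ArrayList<>();
                for (String n : names) {
                    if (!indices.contains(n)) {
                        missing.add(n);
                    }
                }
                listener.accept(missing);
            });
        }

        @Override
        public void create(String index, Runnable listener) {
            CompletableFuture.runAsync(() -> {
                indices.add(index);
                log.add("create:" + index);
                listener.run();
            });
        }

        @Override
        public void index(String index, String doc) {
            log.add("index:" + index);
            indexed.countDown();
        }

        /** Blocks until a document has been written or the timeout elapses. */
        boolean awaitIndexed(long millis) {
            try {
                return indexed.await(millis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With the fake client, calling `indexAfterSetup(client, List.of("ubi_queries", "ubi_events"), "ubi_queries", "{}")` records the write only after the create entries for any missing index, which is the ordering the sequence diagram shows being violated.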
I think you are right. Will update! Thanks for the implementation suggestion.
@msfroh I thought through this some more and I want to minimize the number of calls the `client` makes. There's one call to determine if the indexes (`ubi_queries` and `ubi_events`) exist. If `false`, there are two calls to create those indexes. If one of the indexes already exists, the creation has no effect. Alternatively, if the indexes don't exist, the `client` needs to check which index(es) don't exist, and that adds two extra calls. What do you think?
@Override
public void writeTo(StreamOutput out) throws IOException {
    out.writeString(queryId);
Should this be `getQueryId()` so that we always materialize the query ID before serializing?
I think that is a good idea, especially for when the `query_id` is optional. Will update.
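A minimal sketch of the idea in this thread, assuming the later revision in which `query_id` is optional and generated on demand. `UbiParametersSketch` is a hypothetical class, and `DataOutputStream` stands in for OpenSearch's `StreamOutput` so the example stays self-contained; it is not the PR's code.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.UUID;

final class UbiParametersSketch {

    // May be null until the first call to getQueryId().
    private String queryId;

    UbiParametersSketch(String queryId) {
        this.queryId = queryId;
    }

    /** Lazily materializes the ID so callers never observe null. */
    String getQueryId() {
        if (queryId == null) {
            queryId = UUID.randomUUID().toString();
        }
        return queryId;
    }

    /** DataOutputStream is a stand-in for StreamOutput (assumption for this sketch). */
    void writeTo(DataOutputStream out) throws IOException {
        // Go through the getter rather than the raw field, so the value
        // written is the same one later calls to getQueryId() return.
        out.writeUTF(getQueryId());
    }
}
```

Serializing through the getter means the stream never receives a null field, and the generated ID is the one every later caller sees.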
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13546      +/-   ##
============================================
+ Coverage     71.42%   71.53%   +0.11%
- Complexity    59978    61077    +1099
============================================
  Files          4985     5058      +73
  Lines        282275   287313    +5038
  Branches      40946    41617     +671
============================================
+ Hits         201603   205518    +3915
- Misses        63999    64748     +749
- Partials      16673    17047     +374
☔ View full report in Codecov by Sentry.
be5b3a0 to 1d5d871
…lock to have a query_id. Signed-off-by: jzonthemtn <[email protected]>
❌ Gradle check result for be5b3a0: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 1d5d871: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❕ Gradle check result for 333e45e: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
This is super important and something we should figure out ASAP, as it will determine where this code will live. Have we discussed this yet in any other issue? My inclination is that this feature might benefit from being a separate repository, where it can iterate much more quickly and not be so tightly coupled to the OpenSearch repository.
Appreciate the nudge on this point. Some history, since not everyone is familiar with UBI: this project started as an external plugin and was then moved to a module (this PR).
Apologies for missing the search triage meeting, but I heard it was decided that UBI will be moved to an external plugin project repository, so I am going to close this pull request. Thanks to everyone for the comments!
Description
Adds the User Behavior Insights (UBI) functionality described in #12084.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.