[WLM] Automated labeling of search requests #16797

kaushalmahi12 · 2024-12-06T04:56:38Z

Is your feature request related to a problem? Please describe

Recently we launched a WLM subfeature, i.e. multitenant search resiliency. But the feature still requires an external hint to be sent along with each request via an HTTP header. This approach puts the burden on the user to apply these hints intelligently.

This can become a real pain if the access is programmatic; if not planned properly, programmatic multitenant access can become unmanageable. It would be better if the user could just define some rules that determine the right tenant for a certain class of requests (those that conform to a rule).

We have touched on this idea in earlier RFCs. In this issue I want to go over the high-level approach to achieve it.

Describe the solution you'd like

Given that this tagging component will lie directly on the search path, we will keep an efficient in-memory snapshot of the rules for faster processing. The label assignment will happen only once per request, at the coordinator node, irrespective of the number of shards the request hits.

Rules schema and Storage options

Rule Schema

{
    "attribute1": ["value*"],
    "attribute2": ["value*"],
    "label": "fjagjag9243421_425285",
    "updatedAt": "12-03-2024T18:00:23Z"
}

Cluster State - If we use search pipelines to encapsulate the rules for determining the label, the number of pipelines will soon explode, which can be detrimental to cluster state processing and could become a bottleneck for cluster stability. In addition, the cluster state is already quite bloated, so this option is probably not a great idea.
System Index - This will definitely help us decouple rule storage and processing from cluster-manager-related tasks. But since indices have no mechanism to propagate these changes to all nodes, we will have to either periodically refresh the rules on all nodes or define custom request handlers to carry out the refresh.

In-memory Structure for Rules

Since we want to hold all the rules in memory and do fast prefix-based string matching, a trie data structure becomes a natural choice for this problem.

We will keep a per-attribute trie in memory; each trie will give us a list of possibly matching labels.
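To make the per-attribute trie idea concrete, here is a minimal Python sketch (not the actual OpenSearch implementation). It assumes that a trailing `*` in a rule value marks a prefix pattern and that every key other than `label` and `updatedAt` is an attribute. The names `TrieNode`, `AttributeTrie`, `build_tries`, and the attribute names `username` and `index_pattern` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # char -> TrieNode
    labels: set = field(default_factory=set)      # labels of rules whose prefix ends here


class AttributeTrie:
    """Prefix trie holding the rule patterns of a single attribute."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, pattern: str, label: str) -> None:
        # Assumption: a trailing "*" only marks the value as a prefix pattern.
        prefix = pattern.rstrip("*")
        node = self.root
        for ch in prefix:
            node = node.children.setdefault(ch, TrieNode())
        node.labels.add(label)


def build_tries(rules):
    """Build one trie per attribute from rule documents shaped like the schema above."""
    reserved = {"label", "updatedAt"}  # assumption: everything else is an attribute
    tries = {}
    for rule in rules:
        for attribute, patterns in rule.items():
            if attribute in reserved:
                continue
            trie = tries.setdefault(attribute, AttributeTrie())
            for pattern in patterns:
                trie.insert(pattern, rule["label"])
    return tries


# Hypothetical rules; attribute names and labels are made up.
rules = [
    {"username": ["dev*"], "index_pattern": ["logs-*"], "label": "group_a"},
    {"username": ["data*"], "label": "group_b"},
]
tries = build_tries(rules)
```

Each attribute gets its own trie, so a lookup for one attribute never has to scan the patterns of another.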

Rules storage

The following diagram illustrates the rule storage process and how the structure evolves over time with incremental rule additions. [Note: in the diagrams I have used query groups, but this will be a generic label which other features can also use.]

Rules Matching

Given that the rules are stored in an in-memory trie data structure, a single attribute value match could yield multiple results. There are the following scenarios for the string search in the trie:

The node where the search ends already has a label value.
The node where the search ends doesn't have a label but has some child subtrees. In this case the possible matches are the queryGroupIds of the labelled nodes closest to this node, which keeps the list minimal.
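The two scenarios can be sketched roughly as below. This is one possible reading of the proposal, not a definitive algorithm. It assumes the search ends at the last trie node reachable while walking the request value, that a search which falls off the trie at an unlabelled node yields no match, and that "closest" means the labelled descendants at the smallest distance from the end node. The helper names `make_trie` and `match` and the sample patterns are hypothetical.

```python
LABEL = "$label"  # multi-character key, so it cannot collide with a single-character edge


def make_trie(patterns_to_labels):
    """Build a nested-dict trie; a trailing "*" is treated as a prefix marker."""
    root = {}
    for pattern, label in patterns_to_labels.items():
        node = root
        for ch in pattern.rstrip("*"):
            node = node.setdefault(ch, {})
        node[LABEL] = label
    return root


def match(root, value):
    """Return the candidate labels for one attribute value."""
    node = root
    consumed = True
    for ch in value:
        if ch not in node:
            consumed = False  # the trie has no edge for the next character
            break
        node = node[ch]

    # Scenario 1: the node where the search ends already has a label.
    if LABEL in node:
        return [node[LABEL]]

    # Assumption: falling off the trie at an unlabelled node means no match.
    if not consumed:
        return []

    # Scenario 2: no label here, so walk the child subtrees level by level
    # and return only the labels found at the nearest labelled level.
    level = [node]
    while level:
        children = [c for n in level for k, c in n.items() if k != LABEL]
        found = sorted(c[LABEL] for c in children if LABEL in c)
        if found:
            return found
        level = children
    return []


trie = make_trie({"dev*": "group_a", "data-eng*": "group_b", "data-sci*": "group_c"})
```

With this trie, `match(trie, "developer")` ends at the labelled `dev` node (scenario 1), while `match(trie, "data-")` ends at an unlabelled node and collects the nearest labels from both subtrees (scenario 2).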

Now we have N lists of matches, one per attribute. We select the item that appears in the largest number of lists; if there is a tie, we pick the one with the shortest depth in the tree. If the tie persists even with depth taken into account, we use the lexicographically first query group from the list.
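The selection step can be sketched as follows. This is a minimal illustration: it assumes each per-attribute result is a mapping from label to the depth at which that label was found, and that when a label appears at different depths in different tries the smallest depth is used for tie-breaking. The proposal does not pin down either detail, and the function name `pick_label` is hypothetical.

```python
from collections import Counter


def pick_label(matches_per_attribute):
    """Pick one label from N per-attribute match results.

    matches_per_attribute: list of dicts mapping label -> depth in that
    attribute's trie (assumed shape, for illustration only).
    """
    counts = Counter()
    depth = {}
    for matches in matches_per_attribute:
        for label, d in matches.items():
            counts[label] += 1                         # in how many lists the label appears
            depth[label] = min(d, depth.get(label, d))  # assumption: keep the shallowest depth
    if not counts:
        return None
    # Most lists first, then shortest depth, then lexicographic order.
    return min(counts, key=lambda label: (-counts[label], depth[label], label))


# Hypothetical match results for three attributes.
chosen = pick_label([
    {"group_a": 3, "group_b": 5},
    {"group_b": 4},
    {"group_a": 2, "group_c": 1},
])
```

Here `group_a` and `group_b` both appear in two lists, so depth breaks the tie in favour of `group_a`.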

Related component

Search

Describe alternatives you've considered

No response

Additional context

No response


kaushalmahi12 · 2024-12-09T18:49:07Z

@froh @reta @jainankitk @backslasht @andrross
Can you review this and provide your suggestions?

kaushalmahi12 · 2024-12-09T19:22:07Z

If we want to separate rules by feature, we could add a field with a limited set of allowed values that indicates which feature uses the rule.

The schema could look like the following:

{
    "attribute1": ["value*"],
    "attribute2": ["value*"],
    "label": "fjagjag9243421_425285",
    "updatedAt": "12-03-2024T18:00:23Z",
    "feature|SomeBetterName": "WLM"
}

reta · 2024-12-11T21:48:25Z

Thanks @kaushalmahi12 (sorry for the delay).

System Index - This will definitely help us decouple rule storage and processing from cluster-manager-related tasks. But since indices have no mechanism to propagate these changes to all nodes, we will have to either periodically refresh the rules on all nodes or define custom request handlers to carry out the refresh.

I think this is the right approach to manage rules. Also, I suspect the labeling (rule matching) should only be applied on coordinator node(s)? Regarding the data structures, I think it would be great to understand how exactly the attributes to match against are extracted from the search requests, do we have an RFC/Feature Request for it? (sorry if I missed it)

kaushalmahi12 · 2024-12-11T23:27:01Z

Thanks @reta for looking into it.
I will be writing a detailed design for Rule Matching, including the LLD, and it will be part of the second sub-issue in the list of issues mentioned in #16813. This is just a high-level proposal that briefly outlines the approach.

I suspect the labeling (rule matching) should only be applied on coordinator node(s)?

Not sure if I follow completely, but IMO the mapping should be 1:1 between a user-level request and how we treat it within the system. With that being said maybe msearch and mget APIs.

kaushalmahi12 added the enhancement and untriaged labels Dec 6, 2024

github-actions bot added the Search label Dec 6, 2024

kaushalmahi12 self-assigned this Dec 6, 2024

kaushalmahi12 mentioned this issue Dec 9, 2024

[META] Automatic labeling using Rules #16813

Open

sandeshkr419 removed the untriaged label Dec 18, 2024

peterzhuamazon added this to Search Project Board Dec 19, 2024

github-project-automation bot moved this to 🆕 New in Search Project Board Dec 19, 2024

kaushalmahi12 mentioned this issue Dec 20, 2024

[Proposal] Rule Matching #16888

Open

ruai0511 mentioned this issue Dec 20, 2024

[WLM] Synchronizing Rules Across Nodes #16889

Open
