[Feature Request][RFC] Multi-tenancy as a construct in OpenSearch #13341
Great proposal! From the search visibility point of view, the closest thing we can currently get to the "tenancy" of a search request relies on the thread context injected by the security plugin. The challenge here (and potentially the challenge in implementing the "default rules" mentioned in the RFC) is that the user has to use the Security plugin (or we need to implement methods to get "user" info for all identity plugins), and additionally, in VPC domains, the client IP obtained from the thread context is typically the IP of the load balancer (rather than the real client IP). We also briefly discussed some of those challenges in #12740 and #12529. We really need a way to define multi-tenancy as a first-class feature in OpenSearch, which would allow us to define "tenancy" across various layers. We can start with multi-tenancy for search queries to solve the issues mentioned in query sandboxing and query insights, and we should design the solution to be flexible enough that it can be reused in indexing and plugins as well - the labeling mechanism outlined in the RFC can be a great first step toward full multi-tenancy in OpenSearch :) Edit: Adding a very simple workflow diagram for the client-side and rule-based labeling solutions, based on my understanding of this RFC. Also, after chatting with @Jon-AtAWS about this topic, I wanted to add several things to keep in mind for tenant labeling:
Also @msfroh, how is the custom-endpoints proposal really related to defining the tenancy info (or associating traffic with a specific user/workload)? Please correct me if I'm wrong - I think it's more of a new way of managing a group of configurations to achieve "authorization, access-control, document-level security, query restrictions and more" for multi-tenant use cases?
It's a way of explicitly defining the tenancy via the endpoint. If every tenant gets their own endpoint, it's server-side (so we don't need to trust the clients), and the resolution is easy and well-defined (versus applying rules). We could just use the endpoints to apply tenant labels, but having the "all-in-one" configuration is a bonus, IMO.
Is there a variation of option 1 where the client is involved with establishing a session like typical web servers (requests without a session get a cookie from the server, which is then passed back/around)?
Thanks @msfroh, I like that the proposal says the tags/identity etc. aren't mutually exclusive. We can define ways to specify an identity, a user label, or for that matter any other attribute associated with a request, while keeping some of it auto-created - like the identity if the user is using a security plugin, or a tag if the client is passing certain attributes. From an indexing perspective, I would try to see if we could use something like #12683 or the likes of data streams/index templates to define the tenancy at an index level.
That would be doable. I'm not immediately seeing how it helps for the multi-tenancy case -- as far as I can tell, it would help track "client that sent request A also sent requests B, C, D, and E". I'm not sure how the downstream components (query insights, slow logs, etc.) would use that information.
@msfroh Thanks for the great write-up. A couple of comments on the third form of this construct:
Thanks for taking the time to write this up @msfroh!
I am guessing the new object in the SearchRequest will still be a closed-schema object (I mean there wouldn't be random fields coming in; e.g., it shouldn't be a
Regarding this, I don't think it has to have affinity towards the sandboxing feature, since rules will be an entity and can govern other actions as well, like deciding whether to assign a label for features such as
@msfroh Thanks for the discussion and proposal. As you called out, approaches 1 & 2 are not mutually exclusive, and I think we will need both. I think option 1 can work as an override for certain attribute (or tag) types, and we should not allow all attribute types to be updated with this option, especially the attributes around users/roles or other sensitive ones. For example: if an application developer is accessing the cluster as userA, they should not be able to provide a user tag with value userB. The user-related attributes should probably be set on the server side only, using the rule-based mechanism (ignoring, for now, the complexity which folks have called out in obtaining it). Based on this example, it seems to me there are at minimum 2 categories of attributes: 1) those which can be a random key=value pair, and 2) those derived from the request context and not allowed to be overridden. I think the client-side mechanism can be useful for category 1. We will also probably need to limit the attributes which clients can set via some cluster-defined settings, to avoid an explosion of different tags (this may not be needed right away, but it keeps user applications from sending irrelevant tags and encourages them to play nice).
As you have called out in the pros of option 2, I think it will provide more control to the cluster administrator to begin with and enforce certain tags which can later be used in a meaningful way (for insights, logging, or sandboxing). With option 1, the administrator will need to rely on application developers honoring the tagging mechanism. So I am more inclined towards option 2, which could see more adoption than option 1.
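To illustrate the two attribute categories described above, here is a minimal, hypothetical sketch of server-side label merging in which protected, server-derived attributes always win and client-supplied labels are capped by a cluster-defined limit. All names, keys, and limits here are assumptions for illustration, not actual OpenSearch settings:

```python
# Sketch of the two attribute categories discussed above: free-form
# key=value pairs the client may set (category 1), versus server-derived
# attributes such as user/roles (category 2) that must never be
# overridden. All names and limits are illustrative.
PROTECTED_KEYS = {"user", "roles", "tenant"}
MAX_CLIENT_LABELS = 10  # stand-in for a cluster-defined setting

def merge_labels(client_labels: dict, server_labels: dict) -> dict:
    """Merge labels; server values win, protected client keys are rejected."""
    if len(client_labels) > MAX_CLIENT_LABELS:
        raise ValueError("too many client-supplied labels")
    offending = PROTECTED_KEYS & client_labels.keys()
    if offending:
        raise ValueError(f"client may not set protected labels: {sorted(offending)}")
    merged = dict(client_labels)
    merged.update(server_labels)  # category 2 always overrides category 1
    return merged

print(merge_labels({"app": "storefront"}, {"user": "userA"}))
# -> {'app': 'storefront', 'user': 'userA'}
```

Under this sketch, an application running as userA that tries to send `{"user": "userB"}` would have its request rejected rather than silently relabeled.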
Here's a simple POC for the customized and rule-based labeling we have been discussing :) In the POC, the labels are stored as a map, but as @kaushalmahi12 mentioned, we should consider using a closed-schema object instead (possibly a JSON object, as mentioned in the RFC). We should also limit the number of labels we can have. Also, this POC only implements the workflow for search; ideally it should be generic enough to extend to other workflows.
My thinking is that as soon as you apply approach 2 or 3, it should override any identity passed in the request. If we're accepting identities passed in the search request, we obviously can't trust them. That's explicitly called out in the "cons" for the approach.
Agreed with this approach. Approach 1 will still be required for the use-cases where the traffic is coming from a search application and the application can provide the identifying information as mentioned in the RFC above. |
In an ideal world, tenancy labeling would be calculated and provided by a centralized identity system (regardless of which auth method you are using), but unfortunately we don't have one yet. For users who just want to know "who sent what requests", the first approach should be good enough. As long as we don't use the labels for authentication/authorization, the security impact of this approach should be minimal. We can also have rules override all the customized labels, or have a setting to disable customized labeling (or limit the usage of certain important labels). Ideally, after the related work on the security side to provide an authoritative way to infer user/tenancy information for any type of authentication system, we can then add "the rule" to override/attach labels based on that.
These labels may be harmless, up to some extent, for some of the use cases, such as query insights (as long as the cardinality of the labels is low or QI has safeguards to prevent memory hogs). But if these labels are doing something intrusive, like deciding access to resource distribution (e.g. QueryGroup-based resource allocation), then it becomes indispensable to have a safeguard mechanism to avoid these scenarios. But if we think about it in the long term, do we really want to have authN/authZ for these labels in all the consuming features? Probably not, since all of these features will be granting access to users based on authN/authZ credentials or, let's say, a rule-based technique. I think these labels should be used purely for routing purposes in the consuming features.
Does this require documentation for 2.15? If so, please raise a doc PR by 6/10/24. Thanks. |
@ansjcy Can we create a meta and list all the related enhancements together as milestones for this? |
Hi @getsaurabh02! We have this meta issue to track the multi-tenancy effort: #13516 Let me include all the ongoing work in this meta issue. |
We had some interesting discussions in this PR opensearch-project/security#4403 with the security plugin folks. @DarshitChanpura @cwperks maybe we can continue the discussions in this thread for better visibility :)
Is your feature request related to a problem? Please describe
I've been involved with multiple projects and issues recently that try to deal with the notion of "multi-tenancy" in OpenSearch. That is, they are concerned with identifying, categorizing, managing, and analyzing subsets of traffic hitting an OpenSearch cluster -- usually based on some information about the source of the traffic (a particular application, a user, a class of users, a particular workload).
Examples include:
Across these different areas, we've proposed various slightly different ways of identifying the source of traffic to feed the proposed features.
I would like to propose that we first solve the problem of associating traffic with a specific user/workload (collectively "source").
Describe the solution you'd like
We've discussed various approaches to labeling the source of search traffic.
Let the client do it
In user behavior logging, the traffic is coming from a search application, which can presumably identify a user that has logged in to the search application. The application can provide identifying information in the body of a search request. This would also work for any other workload where the administrator has control over the clients that call OpenSearch.
Pros:
This would require a new property on `SearchRequest` (or more likely `SearchSourceBuilder`, since it probably belongs in the body). In the simplest case, this property could just be a string. For more flexibility (e.g. to support the full suite of attributes in the UBI proposal), the property could be an object sent as JSON. Of course, as these labeling properties grow more complex, it also becomes harder for downstream consumers (like query insights) to know which object fields are relevant for categorization.
Cons:
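As a rough illustration of the client-side approach, here is a minimal sketch of what a search body carrying a closed-schema label object might look like, and how a downstream consumer could read it. The `labels` field, its keys, and the helper function are all assumptions for illustration, not an existing OpenSearch API:

```python
import json

# Hypothetical search request body carrying client-provided labels.
# The "labels" field and its keys are assumptions -- no such field
# exists in OpenSearch today.
search_body = {
    "query": {"match": {"title": "running shoes"}},
    "labels": {
        "application": "storefront",
        "user_id": "user-123",
        "workload": "autocomplete",
    },
}

def extract_labels(body: dict) -> dict:
    """What a downstream consumer (e.g. query insights) might do: read
    labels straight off the parsed request instead of inferring them."""
    return body.get("labels", {})

# Round-trip through JSON, as the body would arrive over the wire.
parsed = json.loads(json.dumps(search_body))
print(extract_labels(parsed)["workload"])  # -> autocomplete
```

A string-valued property would be the degenerate case of this object with a single field.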
Rule-based labeling
This is the approach that @kaushalmahi12 proposed in his query sandboxing RFC. A component running on the cluster will inspect the incoming search request and assign a label. (Okay, in that proposal, it would assign a sandbox, but it's the same idea targeted to that specific feature.)
Pros:
Cons:
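To make the rule-based idea concrete, here is a minimal, hypothetical sketch of a first-match-wins rule matcher running server-side. The `RequestContext` fields and the `Rule` shape are assumptions for illustration, not actual OpenSearch types:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical attributes a server-side component could inspect on an
# incoming request; the field names are assumptions for illustration.
@dataclass
class RequestContext:
    username: str
    index: str
    source_ip: str

@dataclass
class Rule:
    predicate: Callable[[RequestContext], bool]
    label: str

def assign_label(ctx: RequestContext, rules: list, default: str = "unlabeled") -> str:
    """First-match-wins: return the label of the first matching rule."""
    for rule in rules:
        if rule.predicate(ctx):
            return rule.label
    return default

rules = [
    Rule(lambda c: c.index.startswith("logs-"), "log-analytics"),
    Rule(lambda c: c.username == "dashboard_user", "dashboards"),
]

ctx = RequestContext(username="alice", index="logs-2024.05", source_ip="10.0.0.7")
print(assign_label(ctx, rules))  # -> log-analytics
```

The same matcher could assign a sandbox or query group instead of a plain label; only the value attached by the rule changes.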
Custom endpoints for different workloads
This is an evolution of what @peternied proposed in his RFC on views. Over on opensearch-project/security#4069, I linked a Google doc with my proposal for an entity that combines authorization (to the entity), access-control (to a set of indices or index patterns), document-level security (via a filter query), query restrictions, sandbox association, and more.
Pros:
Cons:
What do I recommend?
All of it! Or at least all of it in a phased approach, where we learn as we go. The above proposals are not mutually exclusive, and I can easily imagine scenarios where each is the best option. In particular, if we deliver the "Let the client do it" solution, we immediately unblock all the downstream projects, since all of the proposed options essentially boil down to reacting to labels attached to the `SearchRequest` (or more likely `SearchSourceBuilder`).
I think we should start with the first one (Let the client do it), since it's easy to implement. The rule-based approach can coexist, since it runs server-side and can override any client-provided information (or fail the request if the client is trying to be sneaky). I would recommend that as a fast-follow.
The last option is (IMO) nice to have, but limited to a somewhat niche set of installations. It's probably overkill for a small cluster with a few different sources of traffic, but it would be helpful for enterprise use-cases, where it's important to know exactly how a given tenant workload will behave.
Related component
Search
Describe alternatives you've considered
The above discussion covers three alternatives and suggests doing all three. If anyone else has suggestions for other alternatives, please comment!
What about indexing?
I only covered searches above, but there may be some value in applying the same logic to indexing, to identify workloads that are putting undue load on the cluster by sending too many and/or excessively large documents. My preferred approach to avoiding load from indexing is flipping the model from push-based to pull-based (so indexers manage their own load), but that's probably not going to happen any time soon. Also, a pull-based approach means that excessive traffic leads to indexing delays instead of indexers collapsing under load -- you still want to find out who is causing the delays.
@Bukhtawar, you're our resident indexing expert. Do you think we might be able to apply any of the approaches above to indexing? Ideally whatever we define for search would have a clear mirror implementation on the indexing side to provide a consistent user experience.