RFC: Application Based Configuration Templates #12683
Comments
Thanks @mgodwan for putting across the proposal. This is definitely in the right direction to help OpenSearch users optimize for their use case without needing to be an advanced or expert user who understands which settings to tune.
For the context templates repository, will it be pre-defined in OpenSearch core, or can users modify the templates on demand? Also, how much customization is allowed? I feel system-generated contexts shouldn't be modifiable; users can define their own contexts on top of system contexts if needed.
Existing component templates are part of cluster metadata. Are you proposing to store these context templates in a separate system index? Why do you think they can't fit into cluster state when they are just a subset of settings optimized for a use case? Do you foresee this growing huge? This could be a new custom metadata in cluster state. One aspect that I don't see getting discussed is:
Thanks @shwetathareja for sharing your thoughts on this.
I believe we should have a set of pre-defined templates exposed through core. Users/plugins should be allowed to create more on top of these, or new ones, but not modify the existing ones exposed via OpenSearch core.
Yes, I was thinking that if it grows huge, it would be better to decouple it from the cluster state and maintain a separate system index. I don't have a strong opinion on this implementation detail, given we have been doing some work to make the cluster state more scalable and extensible. As we look more into low-level details, we can perform some stress testing to see if cluster state fits the use case, and so avoid the overhead of maintaining a different system index.
The idea is to continue to have settings for each restriction/optimization that is applied, while the templates act as an interface to get those details into the index. Those settings can continue to govern the behavior during runtime.
I don't think we should allow this. A context, once tied to the index, adds certain restrictions, and it may not always be safe to remove the context.
This looks very useful! "Context Aware" makes it sound like they detect the context themselves, which if I understand the proposal correctly, they don't -- they're just applicable to a certain use case. I think you're calling them "templates" because they have some un-set parameters. But that doesn't tell us what they are templates for. They are templates for configurations, right? So maybe call them "standard configurations" or "parameterized configurations", e.g., the Logs Configuration, the Metrics Configuration, etc.?
This certainly sounds useful and certainly opens up possibilities to optimize. I would first want to explore the dimensions of the problem. Is it a template or a type? Is there a fundamental difference between each context or use case which warrants different templates, in which case is a template the right solution or should we look at different types of indices? Does it need to be extensible? Is it fundamentally a problem of how data is stored? For example, the index is an inverted index. Does this mean we need a different type of index? That would mean we may want to perform operations like write and search differently for such data. It would be good to get some of these answers.
Thanks @smacrakis for agreeing with the problem statement and for your suggestions.
Doesn't the term "templates" denote "parametrized configurations"? I am open to a new name, but I don't think "context aware" would mean that it can detect context; it just means that it is designed to be aware of the context (i.e. the use case). If the wider community still feels that the term "context aware" may be a forced fit here, we can work on updating the terminology.
Thanks @rohin for your thoughts and questions.
The index abstraction exposed today in OpenSearch over the Lucene index is configurable in terms of what we want the index to provide (through settings, mappings, etc.). The challenge lies in deciding how to configure those settings. Hence, the proposed templates act as a provider of the configuration best suited for each use case. Instead of making this an index type (which adds extra coupling), relying on the index metadata/settings for granular control allows advanced users to configure in depth directly on the index, while the templates lower the entry barrier for new users.
Could you highlight the dimensions of extensibility you were thinking of here? I can better answer based on the thought behind this.
I think the inverted index is the core for any kind of analytics we want to support, and we have other data structures to support operations like sort and aggregations as well. Even today, based on the mappings, we use different index implementations (e.g. FST for text, BKD tree for numeric fields, etc.) and may choose not to allow certain operations for a use case based on what kind of data structures are created. While essential data structures are a core point of the proposal (e.g. for frequently updated events with performance as the primary factor, always create a bloom filter), there are other optimizations beyond how data is stored (e.g. refresh interval, replication strategy, merge policy, etc.) which play a key role in the performance/cost optimizations users can get out of their OpenSearch cluster for their respective use cases.
I still think "context aware" is wrong. They are not aware of their context -- they are simply designed for a particular application or use case. The word "template" does imply parameterization, but it doesn't say what they are templates for.
I think you are primarily concerned about the word aware(ness). Would "context based templates" or "context specific templates" sound right?
Yes, I think "aware" is misleading, because it implies that the template adapts to the context it's in.
@smacrakis @backslasht @shwetathareja @rohin Thanks for your points around the terminology for the feature. Based on the feedback provided, how does "Application Based Configuration (ABC) Templates" sound for this?
Sounds good, thanks for the discussion!
This seems like it could coexist with one or two pages of documentation about suggested index settings for certain use cases, as long as people have a resource explaining why those settings are better. A second thought: if we're going to offer options like this, we should at least build a baseline Dashboards page that shows toggle switches for the most common index optimization options. It never hurts to include text/hovers on what those options specifically do. I've always wondered why a lot of new features don't at least have a bare-minimum GUI implementation. Without a GUI, we're enabling professionals while withholding education from newcomers.
A simple question here. Leading to the order of application: context template -> component template -> index template
Is your feature request related to a problem? Please describe
Today, OpenSearch provides users multiple knobs/settings to configure their indices (e.g. number of shards, replicas, replication types, refresh interval, merge settings, etc.). It also exposes different settings/policies to configure different plugin based actions (e.g. rollovers, rollups, transforms, k-NN tuning, etc.). A combination of these settings/policies needs to be precisely configured to get the best experience in terms of various performance and usability dimensions such as throughput, latency, storage usage, etc.
It is difficult for users onboarding new use cases to OpenSearch to get these configurations right, as doing so requires extensive experimentation and developer effort.
Since users use OpenSearch for many different use cases (e.g. Log Analytics, Metrics, Text Search, Security Analytics, ML, etc.), they need to go through the entire set of available knobs, try them out, and then decide what works best for their use case. This creates very visible friction while onboarding to OpenSearch, and when users are unable to get it right after a few attempts, they end up going with alternate solutions.
One way to mitigate this problem is to know the context of the indices and, based on that context, make default values for these settings available as templates (think of them as predefined system templates). We would like to propose the concept of context aware index templates in OpenSearch, which will allow users to easily configure their indices based on the use cases they are looking to build for. This context can be a first-class citizen for the indices via the templates, and any opt-in/opt-out features developed in OpenSearch can be applied to such indices out of the box based on the selected use case, to reduce friction and promote a seamless onboarding experience.
A few example use cases we can expose directly are:
Enable Deflate/ZSTD (Better Storage)
Higher Refresh Interval/SegRep [Better Indexing Throughput]
Merge Policy [LogByteSize by default]
Disable Source (Better Storage)
Enable Star Tree Index (Better Aggregations)
High Refresh Interval/SegRep [Better Indexing Throughput]
Merge Policy [LogByteSize by default]
Low Refresh Interval/DocRep [Faster Document Visibility]
Low merge delete threshold (Faster reclamation of storage on updates)
Though this provides a seamless out-of-the-box experience, there may be cases where users want to override some settings. This can be done by extending the system templates and overriding the corresponding setting values.
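As a rough illustration of the override mechanism described above, the sketch below models a system template and a user extension as plain Python dictionaries and merges them so that the user's values win. The `SYSTEM_LOGS_TEMPLATE` name, the `extend` helper, and the chosen values are hypothetical and exist only for this example; the setting keys are meant to mirror existing OpenSearch index settings, but nothing here reflects a committed API.

```python
from copy import deepcopy

# Hypothetical system-provided template for a log analytics use case.
# Keys mirror OpenSearch index settings; the values are illustrative only.
SYSTEM_LOGS_TEMPLATE = {
    "settings": {
        "index.codec": "best_compression",
        "index.refresh_interval": "60s",
        "index.replication.type": "SEGMENT",
    }
}


def extend(base: dict, overrides: dict) -> dict:
    """Return a new template: `base` with `overrides` applied recursively."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections (e.g. "settings") instead of replacing them.
            merged[key] = extend(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# A user keeps the system defaults but wants faster document visibility.
user_template = extend(
    SYSTEM_LOGS_TEMPLATE,
    {"settings": {"index.refresh_interval": "5s"}},
)
```

The system template itself is left untouched, which matches the idea that predefined templates are extended rather than modified.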
Once these use cases are exposed to the users and as we continue to build optimizations which may provide more benefit for specific use cases, they can be directly adopted by the users (who explicitly opted for it) through the updated context aware index template definitions (upon upgrades) without requiring users to go through the release notes and figure out if something would be useful for them.
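One way to picture adoption on upgrade is a versioned template registry: an index remembers which template version it was created with, and an upgrade can compute which settings the newest version adds or changes. This is only a sketch under assumptions; the `TEMPLATE_VERSIONS` structure, the function names, and the values are hypothetical and not part of the proposal.

```python
# Hypothetical registry: each use case maps to its template versions.
TEMPLATE_VERSIONS = {
    "logs": [
        {"version": 1, "settings": {"index.refresh_interval": "60s"}},
        {
            "version": 2,
            "settings": {
                "index.refresh_interval": "60s",
                "index.codec": "best_compression",  # optimization added later
            },
        },
    ]
}


def latest_template(context: str) -> dict:
    """Return the highest-versioned template for a use case."""
    return max(TEMPLATE_VERSIONS[context], key=lambda t: t["version"])


def settings_added_on_upgrade(context: str, applied_version: int) -> dict:
    """Settings in the latest template that differ from the applied version."""
    applied = next(
        t for t in TEMPLATE_VERSIONS[context] if t["version"] == applied_version
    )
    latest = latest_template(context)
    return {
        key: value
        for key, value in latest["settings"].items()
        if applied["settings"].get(key) != value
    }


# An index created with version 1 would pick up the codec change.
pending = settings_added_on_upgrade("logs", applied_version=1)
```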
Proposal
For the pain points discussed above, it becomes necessary to ensure that a simple interface is provided to users to manage their indices. To do so, we can use the existing components and terminology that users understand and build the new functionality on top of them.
The following is a high-level example of what a template may look like and how it is applied to an index:
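The original example is not available in this text. As a stand-in, here is a minimal sketch of the idea, written in Python rather than as an actual OpenSearch request. The `context` field on the create request, the `CONTEXT_TEMPLATES` registry, and the `resolve_settings` function are all hypothetical, and the assumption that explicit request settings take precedence over template defaults is mine. Setting names and values are illustrative and should be checked against the OpenSearch version in use.

```python
# Hypothetical registry of predefined templates, keyed by use-case name.
CONTEXT_TEMPLATES = {
    "logs": {
        "version": 1,
        "settings": {
            "index.codec": "best_compression",
            "index.refresh_interval": "60s",
            "index.merge.policy": "log_byte_size",
        },
    },
}


def resolve_settings(create_request: dict) -> dict:
    """Compute effective index settings for a create-index request.

    Template defaults are applied first; settings given explicitly in the
    request override them (an assumed precedence, not a confirmed design).
    """
    settings: dict = {}
    context_name = create_request.get("context", {}).get("name")
    if context_name is not None:
        if context_name not in CONTEXT_TEMPLATES:
            raise ValueError(f"unknown context: {context_name}")
        settings.update(CONTEXT_TEMPLATES[context_name]["settings"])
    settings.update(create_request.get("settings", {}))
    return settings


# A user only names the use case and sets what they care about.
request = {
    "context": {"name": "logs"},
    "settings": {"index.number_of_shards": 3},
}
effective = resolve_settings(request)
```

In this sketch the user never has to know about codecs or merge policies; naming the use case is enough to receive the defaults.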
Alternatives Explored
Using Data Streams
Data Streams are a generic abstraction for time series data and do not extend to applying optimizations for specific use cases within the time-series universe. The idea is to allow for a generic abstraction that can facilitate optimizations across the various use cases users would like to build, based on other dimensions. The proposed solution should be applicable to data streams as well, since they will also benefit from the context.
Using Existing Index Templates
Index Templates require customers to know the available settings and optimizations. Even if we create index templates on the user's behalf, we still will not be able to apply optimizations at the field level, etc., since that may depend on the user configuration within mappings. Also, the templates don't support use cases such as refreshing the indices created through them on changes (e.g. on upgrades, we may add new optimizations to be applied). Hence, we may need a new abstraction.
FAQs
Q: How can you help?
A: Any feedback on the overall idea and proposal is welcome. If you have specific requirements/use cases which are not addressed by the above proposal, please let us know.
Q: Why not propose to use cluster state metadata for new system context templates?
A: The templates can grow over time, and tying them to cluster state metadata may cause bottlenecks. Hence the storage component for this new resource can be a system index.
Q: As a user, will I be able to disable certain optimizations?
A: Yes, we would still like to support customization on top of the suggested out-of-the-box optimizations.