[Feature Request] Initial delay on enforcing policy on new indices potentially triggers unnecessary allocations #174

Open
spapadop opened this issue Oct 28, 2021 · 7 comments
Labels
enhancement New request

Comments

@spapadop

Describe the bug
In our setup we use both SSD and CEPH storage to achieve a hot/cold architecture, so a cluster contains data nodes backed by SSD and data nodes backed by CEPH. To distinguish them, we add a node.attr to the respective elasticsearch.yml:

  • node.attr.node_attr: default for SSD nodes
  • node.attr.node_attr: ceph for CEPH nodes

Using an Index Management policy, we naturally want to assign any new index to SSD nodes and then, after X days, move it to CEPH nodes. Here's an example policy:

{
    "policy_id": "hot_cold_delete_13m",
    "description": "Hot/cold (15 days)/delete after 13 months",
    "default_state": "hot",
    "states": [
        {
            "name": "hot",
            "actions": [
                {
                    "allocation": {
                        "require": {
                            "node_attr": "default"
                        },
                        "wait_for": false
                    }
                }
            ],
            "transitions": [
                {
                    "state_name": "cold",
                    "conditions": {
                        "min_index_age": "15d"
                    }
                }
            ]
        },
        {
            "name": "cold",
            "actions": [
                {
                    "replica_count": {
                        "number_of_replicas": 0
                    }
                },
                {
                    "allocation": {
                        "exclude": {
                            "node_attr": "default"
                        },
                        "wait_for": false
                    }
                }
            ],
            "transitions": [
                {
                    "state_name": "delete",
                    "conditions": {
                        "min_index_age": "400d"
                    }
                }
            ]
        },
        {
            "name": "delete",
            "actions": [
                {
                    "delete": {}
                }
            ],
            "transitions": []
        }
    ],
    "ism_template": {
        "index_patterns": [
            "*"
        ],
        "priority": 0
    }
}

The policy itself works fine. However, there is a delay of a few minutes between index creation and the policy being applied. As a result, a new index may initially end up on CEPH nodes and only be moved to the SSD nodes, where it belongs, once the policy is applied. That relocation can be a demanding task that would be good to avoid.

To Reproduce
Steps to reproduce the behavior:

  1. Setup a hot/cold architecture based on node.attr
  2. Apply a policy like the one above to all new indices
  3. Create a new index
  4. Observe that it may initially end up on the wrong node, having to wait for the policy application to force an allocation to the proper node.

Expected behavior
The policy is expected to be applied immediately at index creation, so that new indices are stored directly on SSD nodes instead of first ending up on CEPH and then being moved to SSD when the policy is applied.

Plugins
All official OpenDistro/Opensearch plugins

Desktop (please complete the following information):

  • OS: CentOS / CentOS Stream
  • Version 8

Additional context
There is a workaround: adding "index.routing.allocation.require.node_attr": "default" to the index template. However, it would be good if the index policy could be applied immediately at index creation, so that this setting does not also have to be added to the index template.
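A minimal sketch of that workaround, assuming a composable index template (sent as the body of `PUT _index_template/<name>`). The `logs-*` pattern and the priority are placeholders, not values from this setup. With a legacy template (`PUT _template/<name>`), the `settings` object would sit at the top level instead of under `template`.

```json
{
    "index_patterns": ["logs-*"],
    "priority": 100,
    "template": {
        "settings": {
            "index.routing.allocation.require.node_attr": "default"
        }
    }
}
```

Settings from a matching template are applied when the index is created, so they take effect before any ISM job runs.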

@spapadop spapadop added Beta bug Something isn't working untriaged labels Oct 28, 2021
@thalurur thalurur added feature and removed Beta bug Something isn't working untriaged labels Nov 1, 2021
@thalurur
Contributor

thalurur commented Nov 1, 2021

@spapadop Today the expected behavior is that the policy gets attached at the moment the index is created, provided the index matches the policy's index pattern.

However, the actions in policy are triggered at a later point in time, not at the time policy is attached.

This request would be to support a new feature, i.e. running the action immediately when the policy is attached to the index. I marked this with a feature tag so we can evaluate it further.

In the meantime, there is a setting that controls how frequently policy actions are executed (including the very first execution): plugins.index_state_management.job_interval. This setting applies to all policies, not just one, and it cannot be zero.
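For illustration, the interval could be changed through the cluster settings API (body of `PUT _cluster/settings`). This sketch assumes the value is interpreted in minutes, and the value 1 is only an example. On OpenDistro releases the key is assumed to carry the `opendistro.` prefix instead of `plugins.`.

```json
{
    "persistent": {
        "plugins.index_state_management.job_interval": 1
    }
}
```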

@thalurur thalurur changed the title [BUG] Initial delay on enforcing policy on new indices potentially triggers unnecessary allocations [Feature Request] Initial delay on enforcing policy on new indices potentially triggers unnecessary allocations Nov 1, 2021
@thalurur thalurur added the enhancement New request label Nov 2, 2021
@dbbaughe
Contributor

dbbaughe commented Nov 2, 2021

@thalurur Even if the policy is executed immediately, it won't help here, because a policy is applied to an index only after the index has been created and allocated. In the example, that means the index has potentially already been allocated to the CEPH nodes. I think the request is asking for ISM policies to intervene during index creation and allocate the indices to the correct nodes at creation, but that's not something we can do currently, and I'm not sure it's something we'll support. As noted in the additional context, the correct solution is to ensure the index template for these indices has the correct allocation settings, as those are what get added at creation time and used.

@spapadop
Author

spapadop commented Nov 4, 2021

Hello @thalurur @dbbaughe,
Thanks for your input. Indeed, I asked for ISM policies to intervene during index creation. Thinking it over again, maybe ISM shouldn't interfere there and should leave that job to the index template. Setting plugins.index_state_management.job_interval to a low value isn't a good solution, as it will trigger too many jobs without any real benefit.

@thalurur you mentioned:

However, the actions in policy are triggered at a later point in time, not at the time policy is attached.

As you said, maybe it's indeed worth pushing towards the policy being triggered immediately after it is attached and then following the configured plugins.index_state_management.job_interval. This sounds like the expected behaviour: when I attach a policy to an index, I expect it to run asap, immediately if possible.

That being said, feel free to decide about the fate of the issue, maybe renaming it would make sense.

@dbbaughe
Contributor

dbbaughe commented Nov 4, 2021

@spapadop So perhaps for that second paragraph you might want something like this:
#48

It's somewhat related: essentially, keep running the policy if there's work to do before going to sleep. And to your point, you do not want to wait an initial <job_interval> before it starts for the first time; you want it to start immediately. For that we can't do anything in ISM; we'd have to revisit the Job Scheduler plugin, as it is the one that schedules these jobs to execute (and we'd need a way to tell it to execute immediately the first time).

As for the first part.. yeah I'm not sure that's something we'll tackle in ISM as there is already a concept for that which is Index Templates which exist in core (and have existed well before ISM). If there are any settings, mappings, etc. that you want on the index at creation time, that is what index templates would be for.

If you're OK with it, we can close this issue and update the one linked above with the additional use case of executing the policy immediately after attachment.

@spapadop
Author

spapadop commented Nov 9, 2021

@dbbaughe alright sounds good to me. Still though, as Job Scheduler has to be revisited for my request (immediately execute policy upon attachment), I am not sure if attaching it to #48 would help. However, feel free to do so if there is a chance for this to be worked upon.

@dbbaughe
Contributor

@spapadop I see we actually had an issue for this previously on the OpenDistro repo
opendistro-for-elasticsearch/job-scheduler#76

Looks like we didn't migrate over all the issues for job scheduler to this new org.. will do that too.
Let me sync up with the developer that initially worked on job scheduler, perhaps there was a limitation that was blocking us on this.

@bowenlan-amzn
Member

Should revisit if we can execute action along with policy initialization during the first job run.
