Allow an option to build a custom workload from hidden/datastream indices #420

Open
dazoakley opened this issue Dec 5, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@dazoakley
Is your feature request related to a problem? Please describe.

In our OpenSearch cluster we use data streams to handle index rotation when indices hit certain thresholds. We'd like to create custom benchmarking workloads based on our indices, but we can't: the backing indices for data streams begin with .ds-, so they are "hidden" and not picked up by the create-workload command.

This seems to be the offending line in the code: https://github.com/opensearch-project/opensearch-benchmark/blob/main/osbenchmark/workload_generator/index.py#L63
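
For illustration, a filter of the kind described above could look like the following. This is a minimal sketch that assumes the check is a simple leading-dot test; `is_visible_index` and the sample index names are hypothetical and are not taken from the OSB source.

```python
def is_visible_index(index_name: str) -> bool:
    """Hypothetical filter: reject empty names and names with a leading dot."""
    return bool(index_name) and not index_name.startswith(".")


# Sample names are made up; data stream backing indices begin with ".ds-".
candidates = ["logs-app", ".ds-logs-app-000001", ".kibana_1"]
selected = [name for name in candidates if is_visible_index(name)]
print(selected)  # ['logs-app']
```

Under that assumption, every backing index of a data stream is dropped along with the system indices, which matches the behaviour reported here.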

Describe the solution you'd like

We would like an option on the create-workload command to be able to include hidden indices (or even just "datastream" indices) within a custom workload.

@dazoakley added the enhancement label Dec 5, 2023
@jordarlu
jordarlu commented Dec 5, 2023

Hello, @rishabh6788 @IanHoang @gkamat, would you please have a look and give your comments? thanks !!

@rishabh6788
Collaborator

I think the check is there to make sure users don't include system/security indices by mistake when generating workloads with OSB.
A quick hack I can think of: check out the opensearch-benchmark repo, remove the condition in the code you mentioned, and then do a local install by running pip3 install -e . from the repository root.
That should let you bypass the check.

@IanHoang
Collaborator

IanHoang commented Dec 5, 2023

What @rishabh6788 mentioned above is a good and quick workaround. We can look at adding this as a flag, such as --include-hidden-indices in the create-workload feature if that helps. @dazoakley If you'd like, you could make a quick fix for this in the code-base and submit a PR for this option?
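
As a rough sketch of how such a flag might be wired up, the snippet below uses a stand-alone `argparse` parser. Only the flag name `--include-hidden-indices` comes from this thread; the parser, the `--indices` argument as written here, and `select_indices` are assumptions made for illustration and do not reflect how OSB actually parses its arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Stand-alone parser for illustration; not the real OSB argument parser.
    parser = argparse.ArgumentParser(prog="create-workload-sketch")
    parser.add_argument("--indices", required=True,
                        help="Comma-separated list of index names")
    parser.add_argument("--include-hidden-indices", action="store_true",
                        help="Also accept index names that start with a dot")
    return parser


def select_indices(names, include_hidden):
    # Without the flag, names with a leading dot are skipped (assumed behaviour).
    return [n for n in names if n and (include_hidden or not n.startswith("."))]


args = build_parser().parse_args(
    ["--indices", "logs-app,.ds-logs-app-000001", "--include-hidden-indices"]
)
chosen = select_indices(args.indices.split(","), args.include_hidden_indices)
print(chosen)  # ['logs-app', '.ds-logs-app-000001']
```

Defaulting the flag to off keeps the existing protection in place for anyone who does not opt in.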

@dazoakley
Author

Hi folks, thanks for the prompt replies. 😄

Yep, I've been using the suggested workaround, but having it as an actual CLI option would be much more convenient. If you're OK with me submitting a PR with that change, I'd be happy to. I'll see if I can get it done later today.

In fact, @IanHoang would something like --include-datastream-indices be a better flag? Then you'd still stop people being able to pull in system/security indices, but allow the use of datastreams?

@IanHoang
Collaborator

@dazoakley Apologies for the late response! Either option, including --include-datastream-indices, would work. We would just need to add a check that none of the "datastream" indices collected are security/system indices, as you mentioned.
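
One way the narrower flag and the extra check could fit together is sketched below. It assumes data stream backing indices can be recognised by the `.ds-` prefix mentioned earlier in the thread; `select_with_datastreams` and the prefixes in `ASSUMED_SYSTEM_PREFIXES` are hypothetical placeholders, not an authoritative list of system or security indices.

```python
DATASTREAM_PREFIX = ".ds-"
# Assumed denylist for illustration only; a real check would need the
# cluster's actual system and security index names.
ASSUMED_SYSTEM_PREFIXES = (".opendistro_security", ".kibana")


def select_with_datastreams(names, include_datastream_indices=False):
    """Keep regular indices, plus '.ds-' indices when the option is enabled."""
    selected = []
    for name in names:
        if not name:
            continue  # skip empty names
        if not name.startswith("."):
            selected.append(name)  # regular, non-hidden index
        elif include_datastream_indices and name.startswith(DATASTREAM_PREFIX):
            selected.append(name)  # data stream backing index
    # Safety net: fail loudly if a denylisted name was collected anyway.
    leaked = [n for n in selected if n.startswith(ASSUMED_SYSTEM_PREFIXES)]
    if leaked:
        raise ValueError(f"Refusing to include system indices: {leaked}")
    return selected


names = ["logs-app", ".ds-logs-app-000001", ".opendistro_security", ".kibana_1"]
print(select_with_datastreams(names, include_datastream_indices=True))
# ['logs-app', '.ds-logs-app-000001']
```

With pure prefix matching the final check should never fire, since a name cannot start with both `.ds-` and a denylisted prefix; it is kept only as a guard in case the selection logic changes later.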

Once again, if you cut a quick fix for this, we can quickly address the PR. Thank you for your patience.

@gkamat changed the title from "Allow an option build a custom workload from hidden/datastream indices" to "Allow an option to build a custom workload from hidden/datastream indices" on Aug 12, 2024
Projects
Status: Next Quarter
Development

No branches or pull requests

5 participants