Reindex Wizard #46755
Comments
Pinging @elastic/es-ui |
Hey @cjcenizal this looks great! A few notes/comments from the team below. Index Management UX -> Reindex Tasks:
Step 2: Configure destination index:
Step 3: Document transformations:
Step 4: Performance:
Reindex in Progress:
Simulate endpoint for Reindex:
|
Thanks @zuketo! I've updated the description with deets around the reindex wizard tasks API we would have to build in Kibana and I moved the performance options into the "Start reindex" step (great idea!). I would like to keep the progress screen in-place and see if it feels like a small reward (as I expect/hope it to). If it's a dud we can get rid of it and redirect to the tasks tab as you suggest. More thoughts below.
I don't have an answer to this yet. I'll defer to Seb and Alison when they work through the implementation and see what's possible.
Cancel the reindex -- a better icon might be necessary or just a button that says "Cancel" or "Stop". :)
The simplest version of this will be blocks of editable JSON for the mappings, settings, and aliases. We might model this UI off our Index Template creation form (below), but I defer to Alison, Seb, and any designers who assist us on this point. Imagine the screenshot below prepopulated with JSON of course.
This does sound complicated for us to implement. That said I also don't think it's necessary for us to implement it in this iteration. I think we should punt for now. |
I believe this feature is dependent on some of the work we are doing on the Elasticsearch side for reindex API v2. It may help to cross-reference them here for tracking/visibility. |
Warn users about dangers of reindexing large data sets
Per conversation with @aleph-zero, reindexing could be unfeasible for real-world datasets which tend to be quite large. There may also not be a huge need for fixing mappings on large datasets. It would be useful for smaller datasets. The Reindex Wizard should warn users about potential problems (e.g. the operation could be very expensive) if they are attempting to reindex large sets of data. |
@111andre111 Would you mind sharing some info on your use case for changing the mappings of an existing index on the fly? Are you using |
Hi @cjcenizal Maybe I was not concrete enough. But what is possible, for instance, is to change the I am not sure, I did not test all combinations. Maybe it is even possible to change data types under certain circumstances, although I haven't found an example so far. Here is an example of changing type parameters:
If the type parameters cannot be changed, I got valid error messages as well. Does that make sense? |
@111andre111 Thanks for clarifying! That makes sense. There are some parameters that can be changed on an existing index which don't require reindexing. In this case, I think we'd need to update the Index Management UI with an "Edit Index" page that allows users to edit specific aspects of the index's mappings. |
@cjcenizal Are there any updates on providing reindexing from the UI? A guided reindex in Kibana would make this much easier. It should also allow an in-place reindex of a full data stream. |
Pinging @elastic/kibana-management (Team:Kibana Management) |
Replaces #8110
Summary
The Reindex API is powerful but complex. A UI will make this tool easier to use by guiding and educating the user on its features.
Users will navigate to Index Management, select one or multiple indices, and execute the “Reindex” action. This will take the user to the Reindex Wizard, which breaks the reindex process into multiple steps. Once the user has started a reindex, they'll be immediately shown the progress of the reindex task.
Implementation details
The Reindex API doesn’t create the destination index for you. That’s a separate step that is abstracted by the Reindex Wizard. The Upgrade Assistant server-side logic provides a similar abstraction. We can extract this logic into a separate `index_operations` plugin and reuse it in both the Upgrade Assistant and the Reindex Wizard.
Similarly, if there is an alias pointing to the source index, it's reasonable to assume that users may want to redirect it to the destination index once the reindex is complete to avoid any downtime. This is also its own API call which we can abstract away. Upgrade Assistant has already implemented a similar abstraction which we can extract into the `index_operations` plugin.
Plan of attack
Note that implementing the Reindex Wizard is a superset of implementing most of the logic needed for "Create index" and "Clone index" features. If we doubt we'll be able to complete the Reindex Wizard in a release cycle, we should consider shipping one or both of these features first, and then build the Reindex Wizard upon them in another release.
Index Management UX
Users should be able to select multiple indices and have the option to "Reindex into another index" or to select a single index and have the option to "Reindex other indices into this index". Selecting the first option will open the Reindex Wizard with these indices pre-selected as the source indices and selecting the second option will open the Reindex Wizard with this index pre-selected as the destination index.
Users will also be able to click a "Reindex" button which opens the Reindex Wizard with all fields blank.
If reindex tasks are active, the user should be able to view their status within Index Management beneath a "Reindex tasks" tab. We would need to create a special endpoint for retrieving these tasks, since a reindex task created by the Reindex Wizard is an abstraction over both index creation and reindexing. The Upgrade Assistant does something similar so it might be a useful reference for this kind of behavior.
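A minimal sketch of how such an endpoint might summarize one wizard-created task, assuming a hypothetical `WizardReindexTask` record in which Kibana tracks the index-creation step and the Elasticsearch reindex task together. None of these names come from Kibana or the Upgrade Assistant; they only illustrate the "abstraction over both steps" idea.

```typescript
// Hypothetical shapes, for illustration only.
type StepStatus = "pending" | "in_progress" | "done" | "failed";

interface WizardReindexTask {
  destIndex: string;
  createIndex: StepStatus; // step 1: create the destination index
  reindex: StepStatus; // step 2: run the Elasticsearch reindex task
  esTaskId?: string; // task ID reported by Elasticsearch once reindexing starts
}

// Collapse the per-step statuses into the single status a "Reindex tasks" tab would show.
function overallStatus(task: WizardReindexTask): StepStatus {
  const steps: StepStatus[] = [task.createIndex, task.reindex];
  if (steps.some((s) => s === "failed")) return "failed";
  if (steps.every((s) => s === "done")) return "done";
  if (steps.every((s) => s === "pending")) return "pending";
  return "in_progress";
}
```

With this shape, a task whose index has been created but whose reindex is still running reports `in_progress`, so the tab never needs to expose the two underlying steps separately.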
Nice-to-have: It makes sense to defer the "Reindex tasks" tab to the end of the release cycle, since it's not core to the value that the Reindex Wizard provides.
Reindex Wizard UX
Step 1: Select data to reindex
The user defines the source index or indices to reindex. The user can also specify a reindex from a remote cluster and make index alias adjustments to eliminate downtime.
✂Scope-reduction opportunity: We can reduce scope by deferring "reindex from remote" functionality to a separate scope of work.
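If "reindex from remote" stays in scope, the `source` portion of the reindex request might be assembled as in the sketch below. `buildSource` and `RemoteSource` are hypothetical names; the `source.index` and `source.remote` fields follow the Reindex API's documented request body. Note that in 7.x the remote host also has to be allowed through the `reindex.remote.whitelist` setting on the cluster, which the wizard cannot change on the user's behalf.

```typescript
// Hypothetical helper types; field names inside mirror the Reindex API body.
interface RemoteSource {
  host: string; // must include scheme and port, e.g. "https://otherhost:9200"
  username?: string;
  password?: string;
}

function buildSource(indices: string[], remote?: RemoteSource) {
  // `source.index` accepts a list of index names.
  const source: { index: string[]; remote?: RemoteSource } = { index: indices };
  if (remote) source.remote = remote;
  return source;
}
```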
Selecting indices
In both this step and the second step, the user has the ability to specify indices. We could implement this in many ways:
Alias redirection
Nice-to-have: @jethr0null can verify for us whether this feature is valuable. We can defer this feature to the end of the release cycle as a "nice-to-have".
If there are index aliases associated with the source indices they'll be listed here and the user can select to redirect any or all of them to the destination index. This can be useful for preventing downtime during a reindex. We should surface some help text to that effect.
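As a rough sketch of the redirect, the selected aliases could be turned into a single `POST /_aliases` body, since all actions in one aliases request are applied atomically. The helper name and its input shape are assumptions made for illustration.

```typescript
interface AliasAction {
  remove?: { index: string; alias: string };
  add?: { index: string; alias: string };
}

// `selected` is a hypothetical input: source index name -> aliases the user chose to redirect.
function buildAliasRedirect(
  selected: Record<string, string[]>,
  destIndex: string
): { actions: AliasAction[] } {
  const actions: AliasAction[] = [];
  const added: string[] = [];
  for (const sourceIndex of Object.keys(selected)) {
    for (const alias of selected[sourceIndex]) {
      // Detach the alias from every source index that carries it.
      actions.push({ remove: { index: sourceIndex, alias } });
      // Attach it to the destination only once, even if several sources share it.
      if (added.indexOf(alias) === -1) {
        added.push(alias);
        actions.push({ add: { index: destIndex, alias } });
      }
    }
  }
  return { actions };
}
```

The wizard would send this body only after the reindex has completed, so the alias never points at a partially filled destination.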
Suggest remote clusters
Maybe-nice-to-have: @sebelga suggested that if the user selects "Remote", then we suggest the remote clusters the user has registered for the "Host" field. This may or may not make sense depending on whether users of CCR want to both replicate from a remote cluster and reindex from it. @jethr0null will look into this.
Step 2: Configure destination index
The user defines the destination index, which can be existing or a new index.
✂Scope-reduction opportunity: We can reduce scope by deferring the "create index" option to a separate scope of work. We will only give the user the option to reindex into an existing index, so they'd have to create this index beforehand.
Overwrite destination by default
A document conflict is when a document in the destination index has the same ID as a document in a source index. By default, any conflicting documents in the destination will be overwritten by those in the source. If the destination index contains documents, we'll need to show a danger callout to warn the user of this potential consequence. We'll use this configuration to achieve this behavior:
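The configuration referred to here is not reproduced in this text. As a hedged sketch based on the documented Reindex API options (and not necessarily the exact configuration intended), overwriting corresponds to `dest.version_type: "internal"`, which is also what Elasticsearch does when the field is omitted. The alternative shown for contrast uses `dest.op_type: "create"` with `conflicts: "proceed"`. The function name and the `ConflictMode` type are hypothetical.

```typescript
type ConflictMode = "overwrite" | "keep_existing";

function buildConflictOptions(mode: ConflictMode) {
  if (mode === "overwrite") {
    // Documents with the same ID in the destination are overwritten by the source.
    return { dest: { version_type: "internal" } };
  }
  // Only create documents missing from the destination, and count version
  // conflicts instead of aborting the reindex.
  return { conflicts: "proceed", dest: { op_type: "create" } };
}
```

These fragments would be merged into the full reindex body alongside the `source` and `dest.index` fields.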
Leveraging index templates
Index templates are a useful way to store references to mappings, settings, etc. As a future improvement, we could allow the user to browse their index templates and copy their configurations to their destination index.
Step 3: Document transformations
The user can specify an ingest node pipeline to transform the source data before it is indexed, or define conditions that prevent documents from being indexed or delete them from the destination.
A future improvement would be to allow the user to click a "Test" button to try out the transformation using the Simulate Pipeline API.
Another future improvement would be to cross-link to the ingest node pipeline builder so you can create a new pipeline and return to your reindex job without losing your work.
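The transformation options in this step map onto a few fields of the reindex body: `dest.pipeline` for the ingest pipeline, `source.query` to limit which documents are read, and a `script` that can set `ctx.op` to `"noop"` (skip the document) or `"delete"` (remove it from the destination). The sketch below assembles such a body; `TransformOptions` and `buildTransformBody` are illustrative names, not existing Kibana code.

```typescript
interface TransformOptions {
  pipeline?: string; // name of an existing ingest pipeline
  query?: object; // only source documents matching this query are reindexed
  script?: string; // Painless source; may set ctx.op to "noop" or "delete"
}

function buildTransformBody(sourceIndex: string, destIndex: string, opts: TransformOptions) {
  const body: {
    source: { index: string; query?: object };
    dest: { index: string; pipeline?: string };
    script?: { lang: string; source: string };
  } = { source: { index: sourceIndex }, dest: { index: destIndex } };
  if (opts.query) body.source.query = opts.query;
  if (opts.pipeline) body.dest.pipeline = opts.pipeline;
  if (opts.script) body.script = { lang: "painless", source: opts.script };
  return body;
}

// Example: skip documents whose (hypothetical) `status` field is "archived".
const example = buildTransformBody("logs-v1", "logs-v2", {
  script: "if (ctx._source.status == 'archived') { ctx.op = 'noop' }",
});
```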
Step 4: Start reindex
At the last step, the user has the opportunity to review their reindex configuration, tweak the performance of the task itself, and preview the ES request(s) to create the index and reindex which will be executed under the hood. This request preview will be a useful "escape hatch" for users who need functionality not provided by the wizard -- they can still use the wizard to build these requests up, and then edit them in Console or an editor.
Once the user clicks the "Start reindexing" button they'll immediately be shown the progress of the reindex task within the context of the wizard. Setting up a reindex isn't easy and it can be gratifying to see the result of your hard work with some instant feedback. We should see if this feels rewarding and keep it if it makes the UX pleasant. If it turns out to be a poor UX we can remove it and redirect the user back to the "Reindex tasks" tab in Index Management instead.
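One way the request preview could be rendered is as Console-style text: a `PUT` for the destination index followed by the `POST /_reindex` call with any performance options as query parameters. This is only a sketch; `buildRequestPreview` and `PerformanceOptions` are hypothetical, and using `wait_for_completion=false` (so Elasticsearch returns a task instead of blocking) is an assumption about how the wizard would start the reindex.

```typescript
interface PerformanceOptions {
  waitForActiveShards?: string; // wait_for_active_shards
  timeout?: string; // e.g. "1m"
  requestsPerSecond?: number; // requests_per_second
}

function buildRequestPreview(
  destIndex: string,
  createIndexBody: object,
  reindexBody: object,
  perf: PerformanceOptions = {}
): string {
  const params: string[] = ["wait_for_completion=false"];
  if (perf.waitForActiveShards !== undefined) {
    params.push(`wait_for_active_shards=${perf.waitForActiveShards}`);
  }
  if (perf.timeout !== undefined) params.push(`timeout=${perf.timeout}`);
  if (perf.requestsPerSecond !== undefined) {
    params.push(`requests_per_second=${perf.requestsPerSecond}`);
  }
  return [
    `PUT /${destIndex}`,
    JSON.stringify(createIndexBody, null, 2),
    "",
    `POST /_reindex?${params.join("&")}`,
    JSON.stringify(reindexBody, null, 2),
  ].join("\n");
}
```

Because the output is plain text in Console's request format, the user can paste it into Console and edit it there.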
✂Scope-reduction opportunity: We can reduce scope by deferring "performance tweaking" functionality to a separate scope of work. We will just use the defaults out of the box. If the user really wants to get their hands dirty they can copy the JSON into Console.
Tweaking performance
Advanced users may want to control how the reindex task is executed. We'll need plenty of help text to explain the role of each of these options. These options are supported using the `wait_for_active_shards`, `timeout`, and `requests_per_second` query parameters, as well as the `size` field on the request body.
Telemetry
Per @zuketo we'll want to measure whether people are using this wizard from start to finish, or using it to build up a request which they then manually tweak and execute via Console later. It's impossible to get concrete measurements of this usage, but we can approximate this usage by tracking:
Clicking the "View task" button takes you back to the "Reindex tasks" tab with the reindex task pre-selected.
Clicking the "Stop reindexing" button will cancel the reindex. The screen will show you a "Reindex canceled" message along with the summary of the reindex configuration and the "Start reindexing" button again.
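A rough sketch of what could sit behind the progress display and the "Stop reindexing" button, assuming the reindex was started as an Elasticsearch task. The status fields mirror those the Task API reports for a reindex task, and cancellation uses `POST /_tasks/<task_id>/_cancel`. The progress formula is an approximation chosen for illustration, and both function names are hypothetical.

```typescript
// Subset of the `status` object reported for a reindex task.
interface ReindexTaskStatus {
  total: number;
  created: number;
  updated: number;
  deleted: number;
  noops: number;
  version_conflicts: number;
}

// Approximate progress as the share of documents already handled in any way.
function progressPercent(s: ReindexTaskStatus): number {
  if (s.total === 0) return 0;
  const processed = s.created + s.updated + s.deleted + s.noops + s.version_conflicts;
  return Math.min(100, Math.round((processed / s.total) * 100));
}

// Request the server side would issue when the user clicks "Stop reindexing".
function cancelRequest(taskId: string): { method: string; path: string } {
  return { method: "POST", path: `/_tasks/${taskId}/_cancel` };
}
```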