
SOR: downward-compatible bulkUpdate #165434

Closed
pgayvallet opened this issue Sep 1, 2023 · 11 comments · Fixed by #171245
Assignees
Labels
Epic:ZDTmigrations Zero downtime migrations Feature:Saved Objects Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@pgayvallet
Contributor

pgayvallet commented Sep 1, 2023

In #152807 / #161822 we implemented the BWC update algorithm on the single-object update API.

We now need to do the same for bulkUpdate.

Ideally, the logic should be factorized and reused between the two APIs as much as possible (similar to what we do with internalBulkResolve). This may require refactoring how we implemented update in #161822. We cannot implement single-object update on top of an internalBulkUpdate because the bulk API doesn't support upsert. However, we should share code between the two APIs wherever they perform similar/identical operations (calls to get docs, etc.).
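For illustration, here's a minimal sketch of what that factorization could look like, with single update delegating to the shared bulk path. All names here are hypothetical, not the actual SOR internals:

```typescript
// Hypothetical sketch of factorizing update/bulkUpdate: both APIs share the
// merge step, and single update is just a bulk of one. Names are illustrative.

interface UpdateTarget {
  type: string;
  id: string;
  attributes: Record<string, unknown>;
}

// Shared internal helper: merge the requested changes into the current docs.
function applyUpdates(
  current: Map<string, Record<string, unknown>>,
  targets: UpdateTarget[]
): Array<{ key: string; doc: Record<string, unknown> }> {
  return targets.map(({ type, id, attributes }) => {
    const key = `${type}:${id}`;
    const existing = current.get(key) ?? {};
    return { key, doc: { ...existing, ...attributes } };
  });
}

// Single-object update reuses the same code path.
function applyUpdate(current: Map<string, Record<string, unknown>>, target: UpdateTarget) {
  return applyUpdates(current, [target])[0];
}
```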

@pgayvallet pgayvallet added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:Saved Objects Epic:ZDTmigrations Zero downtime migrations labels Sep 1, 2023
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@TinaHeiligers
Contributor

Roger!
Let's discuss after testing week.

@TinaHeiligers TinaHeiligers self-assigned this Sep 10, 2023
rudolf added a commit that referenced this issue Sep 26, 2023
## Summary

We've fixed the update limitation (#152807), so it now only applies to bulkUpdate (#165434).

### Checklist

Delete any items that are not applicable to this PR.

- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] Any UI touched in this PR is usable by keyboard only (learn more
about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [ ] Any UI touched in this PR does not create any new axe failures
(run axe in browser:
[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),
[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This renders correctly on smaller devices using a responsive
layout. (You can test this [in your
browser](https://www.browserstack.com/guide/responsive-testing-on-local-server))
- [ ] This was checked for [cross-browser
compatibility](https://www.elastic.co/support/matrix#matrix_browsers)


### Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to
identify risks that should be tested prior to the change/feature
release.

When forming the risk matrix, consider some of the following examples
and how they may potentially impact the change:

| Risk | Probability | Severity | Mitigation/Notes |
|------|-------------|----------|------------------|
| Multiple Spaces—unexpected behavior in non-default Kibana Space. | Low | High | Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces. |
| Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. | High | Low | Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure. |
| Code should gracefully handle cases when feature X or plugin Y are disabled. | Medium | High | Unit tests will verify that any feature flag or plugin combination still results in our service operational. |
| [See more potential risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) | | | |


### For maintainers

- [ ] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
TinaHeiligers added a commit that referenced this issue Oct 16, 2023
Part of #165434
Moves the unit tests to their own file

Co-authored-by: Kibana Machine <[email protected]>
dej611 pushed a commit to dej611/kibana that referenced this issue Oct 17, 2023
Part of elastic#165434
Moves the unit tests to their own file

Co-authored-by: Kibana Machine <[email protected]>
@TinaHeiligers
Contributor

@pgayvallet bulkUpdate seems to only use _index. Does that mean that every saved object will be upserted if it doesn't exist when using bulkUpdate?
I'm working through the changes needed to invoke bulkUpdate for updating a single object.

@pgayvallet
Contributor Author

pgayvallet commented Oct 26, 2023

bulkUpdate seems to only use _index

We're using the bulk API with update operations:

```typescript
bulkUpdateParams.push(
  {
    update: {
      _id: serializer.generateRawId(getNamespaceId(objectNamespace), type, id),
      _index: commonHelper.getIndexForType(type),
      ...versionProperties,
    },
  },
  {
    doc: {
      ...documentToSave,
      [type]: await encryptionHelper.optionallyEncryptAttributes(
        type,
        id,
        objectNamespace || namespace,
        documentToSave[type]
      ),
    },
  }
);
```

(The bulk API is a bit particular: each operation consists of two parts in the payload, the first being the action metadata (operation, _id, _index...) and the second being the actual doc the operation has to be performed on.)
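To make that two-part shape concrete, here's a minimal sketch of assembling such a _bulk body. The metadata fields follow the Elasticsearch bulk API; the helper name and input shape are made up for illustration:

```typescript
// Each update contributes two consecutive entries to the _bulk body:
// first the action metadata, then the partial document to merge.
type BulkEntry =
  | { update: { _id: string; _index: string } }
  | { doc: Record<string, unknown> };

function buildBulkUpdateBody(
  updates: Array<{ rawId: string; index: string; doc: Record<string, unknown> }>
): BulkEntry[] {
  const body: BulkEntry[] = [];
  for (const { rawId, index, doc } of updates) {
    body.push({ update: { _id: rawId, _index: index } }); // action metadata
    body.push({ doc }); // the doc the operation applies to
  }
  return body;
}
```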

I'm working through the changes needed to invoke bulkUpdate for updating a single object.

Great! Please ping me if I can be of any help 🙂

@TinaHeiligers
Contributor

We're using the bulk API with update operations

Yea, I realized later I was looking at the mget request params.

@TinaHeiligers
Contributor

@pgayvallet I've worked on a few approaches and thought I'd double check with you before attempting yet another.

  1. Why do you suggest using bulkUpdate for updating a single object? The underlying es API may be more performant but we're only dealing with a single object. We'd also need to interpret the result from the bulk request response before responding to the caller. I'm not against it, but I'd like to hear your rationale on that approach first.
  2. I'm interpreting bulkUpdate to do the client-side object updates in memory. We'd need to restrict the API to only allow a certain payload size or number of docs to update, depending on the number of fields requested to change. Alternatively, should we consider restricting the payload size?
  3. How would we handle situations where Kibana suddenly shuts down or loses connection with ES? We might be halfway through client-side updates and lose all of it. I guess this isn't necessarily a data loss but it's not a great UX.
  4. How do we handle a situation where several updates to the same object come from different browsers? Last in first out or first in last out?

I'm sure more questions are gonna come to me but for now, this is what comes to mind.

@pgayvallet
Contributor Author

pgayvallet commented Nov 3, 2023

Why do you suggest using bulkUpdate for updating a single object? The underlying es API may be more performant but we're only dealing with a single object. We'd also need to interpret the result from the bulk request response before responding to the caller. I'm not against it, but I'd like to hear your rationale on that approach first.

Code factorization, mostly. I assumed the overall logic of single update and bulk update was close enough that factorizing, by having the two APIs execute the same underlying logic, would be the best approach in terms of maintainability.

Now, if there are things I overlooked that make this factorization harder than I thought, I'm totally fine keeping the two implementations dissociated, as is the case today. If so, I'd just like to understand the reasons.

If it's "only" about the additional effort of refactoring update (once again), then I would also be totally fine keeping those two tasks (implementing BWC bulk update AND refactoring the BWC update to use the same code path) separate and doing the latter as a follow-up.

I'm interpreting bulkUpdate to do the client-side object updates in memory. We'd need to restrict the API to only allow a certain payload size or number of docs to update, depending on the number of fields requested to change. Alternatively, should we consider restricting the payload size?

This is a very good point, and we absolutely need to put a safeguard in place somewhere.

However, given the current usages of the bulkUpdate API, which AFAIK is only used to update a few documents (a few dozen at most), I don't think the risks are very high, so I think it would be acceptable to keep that as a follow-up enhancement (but please tell me if you think I'm wrong).
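Purely as an illustration of what such a safeguard could look like (the limit, function name, and placement are all hypothetical, not part of the actual SOR):

```typescript
// Hypothetical safeguard (illustrative only): reject oversized bulkUpdate
// calls before doing any client-side migration work.
const MAX_BULK_UPDATE_OBJECTS = 100; // illustrative limit, not a real config key

function assertBulkUpdateWithinLimit(
  objectCount: number,
  max: number = MAX_BULK_UPDATE_OBJECTS
): void {
  if (objectCount > max) {
    throw new Error(
      `bulkUpdate supports at most ${max} objects per call, received ${objectCount}`
    );
  }
}
```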

How would we handle situations where Kibana suddenly shuts down or loses connection with ES? We might be halfway through client-side updates and lose all of it. I guess this isn't necessarily a data loss but it's not a great UX.

As long as we're performing a single _bulk call to update all documents "atomically", I don't think it would be any different from any other scenario where that kind of outage could occur?

Unless you're thinking of something I missed?

How do we handle a situation where several updates to the same object come from different browsers? Last in first out or first in last out?

Not sure I follow. We don't have a "queue" of operations to perform from the repository. If multiple update calls are performed concurrently, they will be executed concurrently too.

We just need to make sure we're properly using the version properties in the update payload to protect in case of conflicts, as it is done today in bulkUpdate:

```typescript
{
  update: {
    _id: serializer.generateRawId(getNamespaceId(objectNamespace), type, id),
    _index: commonHelper.getIndexForType(type),
    ...versionProperties,
  },
},
```

and as we did in update:

```typescript
// using version from the source doc if not provided as option to avoid erasing changes in case of concurrent calls
...decodeRequestVersion(version || migrated!.version),
```

Note that you made me doubt myself, so I checked whether the index bulk operation supports version fields as the update bulk operation does, and if I trust the client typings, it does.
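For context, here's a simplified sketch of how those version properties provide conflict protection: the saved object version string decodes to a seq_no/primary_term pair, and passing those as if_seq_no/if_primary_term makes ES reject the write if the document changed in between. This is a simplification of the real encodeVersion/decodeRequestVersion helpers, so treat the exact encoding as an assumption:

```typescript
// Simplified sketch (assumed encoding): a saved object version is roughly
// base64(JSON([_seq_no, _primary_term])). Decoding it yields the
// if_seq_no / if_primary_term params used for optimistic concurrency control.

function encodeVersion(seqNo: number, primaryTerm: number): string {
  return Buffer.from(JSON.stringify([seqNo, primaryTerm])).toString("base64");
}

function decodeRequestVersion(version: string): {
  if_seq_no: number;
  if_primary_term: number;
} {
  const [seqNo, primaryTerm] = JSON.parse(
    Buffer.from(version, "base64").toString("utf8")
  ) as [number, number];
  return { if_seq_no: seqNo, if_primary_term: primaryTerm };
}
```

With these params on an update (or bulk update) operation, ES fails the write with a version conflict if the doc's current seq_no no longer matches.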

@TinaHeiligers
Contributor

Now, if there are things I overlooked that make this factorization somehow harder than I thought, I'm totally fine keeping the two implementations dissociated as it is the case today. If so, I'd just like to understand the reasons.

  1. bulkUpdate doesn't appear to support the upsert action, which we need for update. Now I may be overlooking some hidden option somewhere but it's not obvious to me. We would need to extend the options to handle upsert for each object to support a single doc update, which should be doable.
  2. bulkUpdate also doesn't appear to check if an object is shared, at least not obviously either.

These were the initial questions I had. I'll follow up on your other responses shortly.

@pgayvallet
Contributor Author

bulkUpdate doesn't appear to support the upsert action, which we need for update

I absolutely overlooked that, and it seems like a very good reason not to factorize the implementation of the two APIs.

@TinaHeiligers
Contributor

I absolutely overlooked that, and that seems like a very good reason to not want to factorize the implementation of the two APIs

It's decided then: factorize update internals only as far as needed to reuse the internal helpers in bulkUpdate, rather than trying to work around bulkUpdate not supporting upsert.

@TinaHeiligers
Contributor

TinaHeiligers commented Nov 20, 2023

Progress update:

Based on the discussions in this issue, #171245 only required a few minor changes to ensure BWC for the SO bulkUpdate API.

The main changes are:

  • Fetch all documents, regardless of the namespace type
  • Migrate them to the client version
  • Update each document client-side
  • Migrate the updated documents to the version the server is on
  • Index the updated docs
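The steps above can be sketched as a small pipeline. The doc shape and the migrateDown/migrateUp helpers are stand-ins for the real migrator and ES client, so treat all names as assumptions:

```typescript
// Sketch of the BWC bulkUpdate flow: docs are fetched upstream, migrated down
// to the request (client) version, updated in memory, migrated back up, then
// indexed by the caller. Types and helper names are illustrative.

type Doc = { id: string; attributes: Record<string, unknown> };

function bwcBulkUpdate(
  fetched: Doc[],
  changes: Map<string, Record<string, unknown>>,
  migrateDown: (d: Doc) => Doc, // to the request (client) version
  migrateUp: (d: Doc) => Doc // back to the server's version
): Doc[] {
  return fetched
    .map(migrateDown) // migrate to the client version
    .map((doc) => ({
      ...doc,
      // apply the requested changes client-side
      attributes: { ...doc.attributes, ...(changes.get(doc.id) ?? {}) },
    }))
    .map(migrateUp); // migrate back up; the caller indexes the result
}
```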

Other minor changes include type updates and adapting unit test helpers to take the additional changes into account.

Test status for these changes to bulkUpdate is as follows:

  • bulkUpdate unit tests refactored & passing. The PR includes additional tests for migrations

  • src/core/server/integration_tests/saved_objects/routes/bulk_update.test.ts passing locally

  • src/core/server/integration_tests/saved_objects/service/lib/bulk_update.test.ts added (version changes)

  • test/api_integration/apis/saved_objects/bulk_update.ts

  • x-pack/test/saved_object_api_integration/security_and_spaces (basic & trial license configs)

  • x-pack/test/saved_object_api_integration/spaces_only

  • (TODO): Ensure Kibana runs and updates documents in all supported environments

Open questions:

  • How can we monitor how much memory is consumed by client-side updates?
  • How can we prevent (or minimize) the chance of data loss if Kibana can't index updated documents (e.g. during temporary connectivity issues or ES being unavailable)?
  • ...other unknowns
