fix(migrations): split redis migrations into up and teardown #12983
Conversation
Successfully created cherry-pick PR for
Successfully created backport PR for
Successfully created backport PR for
@nowNick @locao @hanshuebner @samugi, I do want to double check whether this is re-entrant or not. I feel it is not. E.g. upgrade to the latest version, add a couple of plugins (acme, rate-limiting), and then run the migrations again. My concern is what happens if this is executed multiple times:

```sql
UPDATE plugins
SET config =
    jsonb_set(
      config,
      '{storage_config,redis}',
      config #> '{storage_config, redis}'
      || jsonb_build_object(
           'password', config #> '{storage_config, redis, auth}',
           'server_name', config #> '{storage_config, redis, ssl_server_name}',
           'extra_options', jsonb_build_object(
             'scan_count', config #> '{storage_config, redis, scan_count}',
             'namespace', config #> '{storage_config, redis, namespace}'
           )
         )
    )
WHERE name = 'acme';
```

Especially when it is executed after:

```sql
UPDATE plugins
SET config =
    config
    #- '{storage_config,redis,auth}'
    #- '{storage_config,redis,ssl_server_name}'
    #- '{storage_config,redis,scan_count}'
    #- '{storage_config,redis,namespace}'
WHERE name = 'acme';
```

(To be honest, I do not fully understand why we even migrate this, as shorthands can do it without migrations. I guess this is a good example of things starting to go wrong when trying to add migrations, and of why we avoid running migrations on these JSON blobs in general.)
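For illustration only, one way to make the copy re-entrant would be to key on a legacy field as a "not yet migrated" marker and skip rows that no longer carry it. This is a hedged sketch against the Postgres strategy, not the PR's actual code; using `auth` as the marker is an assumption:

```sql
-- Illustrative guard (not in the PR): only copy when the legacy 'auth' key
-- is still present, so a second run becomes a no-op instead of overwriting
-- the already-migrated values with SQL NULLs.
UPDATE plugins
SET config = jsonb_set(
      config,
      '{storage_config,redis}',
      (config #> '{storage_config,redis}')
      || jsonb_build_object(
           'password',    config #> '{storage_config,redis,auth}',
           'server_name', config #> '{storage_config,redis,ssl_server_name}',
           'extra_options', jsonb_build_object(
             'scan_count', config #> '{storage_config,redis,scan_count}',
             'namespace',  config #> '{storage_config,redis,namespace}'
           )
         )
    )
WHERE name = 'acme'
  AND (config #> '{storage_config,redis}') ? 'auth';  -- skip already-migrated rows
```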
BTW, is it ok for new nodes to see the old fields in the config object? It will happen when the teardown is not done.
Just from a quick pass through the code, this PR follows the official guideline of splitting add and update operations into up and teardown respectively. However, it does not resolve the timeout issues reported in #12978. Nowadays it is quite common for customers to have hundreds of rate-limiting plugin instances. Maybe we should adopt @bungle's suggestion in kong/kong/plugins/session/schema.lua (line 216 in 49aa233).
#12761 addresses the timeout issue in the sense that it makes the environment variable
Can you elaborate how
@bungle explained: We should generally avoid data migrations if possible, as they are scary and failure prone. In this case, the new version can work with the old configuration values and
It is perfectly fine. New nodes will remove unknown fields on read (which doesn't matter much, as new nodes have no code that uses the old fields; but on read we sanitize the data according to the schemas). And with shorthands they should be able to migrate old to new too (which has the benefit of new (and old) nodes working correctly even when the finish migrations have not run yet).
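(For reference, an illustrative query an operator could use to spot rows the teardown has not yet cleaned up; the field list is assumed from the statements above:)

```sql
-- Illustrative check: list acme plugin rows that still carry any of the
-- legacy redis fields, i.e. rows where the teardown has not run yet.
SELECT id, name
FROM plugins
WHERE name = 'acme'
  AND (config #> '{storage_config,redis}')
      ?| array['auth', 'ssl_server_name', 'scan_count', 'namespace'];
```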
Yes, this is a bit debatable, as some people feel that leaving old data in the db is some kind of cruft that we need to get rid of, or that it will cause some other problems. The old values will stay there only as long as the entity is not updated. I do also agree that we need to get rid of this "cruft", but perhaps that is best done with a major version. When we get rid of cruft, like here in finish:

It is a destructive operation. Now the only way to get back to an older Kong version is to restore the db. If we didn't do this at all, it would just be a matter of switching the version back. And it would work even if the new fields already had data copied over, because, again, old nodes remove unknown fields on read. We could also add a new command for this (to remove the old fields, essentially we can just loop through the entities and re-save each one).
(I agree, this fights against "let's keep everything clean", but it is quite practical - aka it enables more ways to recover from things going wrong - and frankly I have hardly ever heard of "the cruft" in data causing us issues. So I'm just weighing the benefits of destructive operations in minor versions; even in majors, we used to do these destructive things separated across two versions, aka in steps.)
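To make the "more ways to recover" point concrete, a hypothetical manual undo could rebuild a legacy key from its migrated counterpart. This sketch covers a single field, is not an actual supported rollback, and assumes the Postgres strategy:

```sql
-- Hypothetical recovery sketch: restore the legacy 'auth' key from the
-- migrated 'password' key on rows where the teardown already removed it.
UPDATE plugins
SET config = jsonb_set(
      config,
      '{storage_config,redis,auth}',
      config #> '{storage_config,redis,password}'
    )
WHERE name = 'acme'
  AND (config #> '{storage_config,redis}') ? 'password'
  AND NOT ((config #> '{storage_config,redis}') ? 'auth');
```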
Migrations can go wrong in many unexpected ways, especially data migrations. We have seen timeouts, locks harming production machines, processes getting killed mid-run, network issues, and Postgres version issues (a migration works on my machine, but not on the customer's version of pg). We have also seen that we needed to rewrite migrations because of bugs, and then we needed to somehow support the people who had already run the buggy migration, if possible, etc. The biggest issue for us is that we don't know the data: it can be anything from a couple of entities to perhaps millions of entities.
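One general technique for bounding lock time on tables of unknown size (not part of this PR) is to run the copy in batches and repeat until no rows match. A hedged sketch, using the absence of the new `password` key as the not-yet-copied marker so the loop terminates; the batch size and single-field copy are illustrative:

```sql
-- Illustrative batched copy: each statement touches at most 500 rows,
-- so row locks are held briefly; re-run until it reports 0 rows updated.
UPDATE plugins
SET config = jsonb_set(
      config,
      '{storage_config,redis}',
      (config #> '{storage_config,redis}')
      || jsonb_build_object('password', config #> '{storage_config,redis,auth}')
    )
WHERE id IN (
  SELECT id
  FROM plugins
  WHERE name = 'acme'
    AND (config #> '{storage_config,redis}') ? 'auth'          -- legacy field present
    AND NOT ((config #> '{storage_config,redis}') ? 'password') -- not yet copied
  LIMIT 500
);
```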
Summary
"UP" phase should only contain non-destructive operations. "TEARDOWN" (or "FINISH") phase should be used to change/delete data.
Checklist
- Changelog file created under changelog/unreleased/kong, or skip-changelog label added on the PR if a changelog is unnecessary (see README.md).
- There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE

Issue reference
KAG-4419
#12978