Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(vault): apply secret rotation error log throttling #13449

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
message: |
Applied an error log throttling on the secret rotation for each secret. Errors happened during secret rotation will only be logged once. This will prevent the log from being flooded with the same error message.
type: bugfix
scope: Core
16 changes: 15 additions & 1 deletion kong/pdk/vault.lua
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@ local decode_json = cjson.decode

local NEGATIVELY_CACHED_VALUE = "\0"
local ROTATION_INTERVAL = tonumber(os.getenv("KONG_VAULT_ROTATION_INTERVAL"), 10) or 60
local SECRETS_ROTATION_ERROR_LOGGING_FLAG_PREFIX = "retry_logging:"
local DAO_MAX_TTL = constants.DATABASE.DAO_MAX_TTL


Expand Down Expand Up @@ -1238,6 +1239,10 @@ local function new(self)
-- @treturn true|nil `true` after successfully rotating a secret, otherwise `nil`
-- @treturn string|nil a string describing an error if there was one
local function rotate_secret(old_cache_key, caching_strategy)
if old_cache_key:sub(1, #SECRETS_ROTATION_ERROR_LOGGING_FLAG_PREFIX) == SECRETS_ROTATION_ERROR_LOGGING_FLAG_PREFIX then
return true
end

local reference, err = parse_cache_key(old_cache_key)
if not reference then
-- invalid cache keys are removed (in general should never happen)
Expand Down Expand Up @@ -1299,8 +1304,17 @@ local function new(self)
yield(true, phase)

local ok, err = rotate_secret(cache_key, caching_strategy)
local error_logging_flag_key = SECRETS_ROTATION_ERROR_LOGGING_FLAG_PREFIX .. cache_key
if not ok then
self.log.warn(err)
-- throttling the error logging for secrets rotation
local add_ok, _ = SECRETS_CACHE:add(error_logging_flag_key, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Is it worth using cache to decrease num of log entries?

if add_ok then
self.log.warn(err, ". To prevent log flooding, the error during rotation will not be logged again for this secret")
end

else
-- reset the error logging count
SECRETS_CACHE:delete(error_logging_flag_key)
end
end

Expand Down
4 changes: 4 additions & 0 deletions spec/02-integration/13-vaults/07-resurrect_spec.lua
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,7 @@ describe("vault resurrect_ttl and rotation (#" .. strategy .. ") #" .. vault.nam
lazy_setup(function()
helpers.setenv("KONG_LUA_PATH_OVERRIDE", LUA_PATH)
helpers.setenv("KONG_VAULT_ROTATION_INTERVAL", "1")
helpers.setenv("SECRETS_ROTATION_ERROR_LOGGING_MAX_COUNT", "1")

vault:setup()
vault:create_secret(secret, "init")
Expand Down Expand Up @@ -226,8 +227,11 @@ describe("vault resurrect_ttl and rotation (#" .. strategy .. ") #" .. vault.nam
check_plugin_secret("old", 5)
vault:delete_secret(secret)
ngx.sleep(2.5)
assert.logfile().has.line(fmt([[could not retrieve value for reference .*vault.*%s\/%s.*]], vault.prefix, secret), false, 4)
helpers.clean_logfile()
check_plugin_secret("old", 5)
check_plugin_secret("", 5)
assert.logfile().has.no.line(fmt([[could not retrieve value for reference .*vault.*%s\/%s.*]], vault.prefix, secret), false, 4)
end)
end)

Expand Down
Loading