Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: dynamically change klog level through socket #3146

Merged
merged 2 commits into from
Oct 13, 2023

Conversation

xiao-jay
Copy link
Contributor

@xiao-jay xiao-jay commented Sep 29, 2023

fix #2322
docs #3111

Why not change klog level throuth change configmap

Volcano is a commercial project. Volcano is deployed in a customer cluster. The configmap cannot be modified easily, but curl can be executed. I thought of using curl to pass in the klog value that you want to modify through the scoket method.

Implementation Introduction

The vc-scheduler pod has an internal socket service to monitor changes in the klog.scok file. Because the socket service in the pod cannot be directly accessed in the k8s Node, the klog.sock file needs to be mounted from the pod to the cluster. Use The curl command modifies the klog.sock file.

This machine is a mac, use minikube to simulate the k8s cluster, enter mimikube, execute curl to modify the /tmp/socks/klog.sock file through the socket

 sudo curl --unix-socket /tmp/klog-socks/klog-klog.sock "http://localhost/setlevel?level=5&duration=60s"

image

Klog has been set to 5. This log can only be printed at level 5.
klog.V(5).Info("only klog v 5 can print this log ")
not using sudo will curl: (7) Couldn't connect to server
image

example image:xiaojie99999/vc-scheduler:1.1.7
@wangyang0616 @william-wang @Monokaix @lowang-bh

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 29, 2023
@xiao-jay xiao-jay force-pushed the dynamic-conf-socket branch 2 times, most recently from c654e7f to 58df5f7 Compare September 29, 2023 14:00
pkg/util/socket.go Fixed Show fixed Hide fixed
pkg/util/socket.go Fixed Show resolved Hide resolved
pkg/util/socket.go Fixed Show resolved Hide resolved
@xiao-jay xiao-jay force-pushed the dynamic-conf-socket branch 3 times, most recently from 2fd5105 to 7a60952 Compare September 29, 2023 14:34
pkg/util/socket.go Fixed Show fixed Hide fixed
pkg/util/socket.go Fixed Show fixed Hide fixed
pkg/util/socket.go Fixed Show fixed Hide fixed
@xiao-jay xiao-jay force-pushed the dynamic-conf-socket branch 3 times, most recently from fff5bd3 to 01c421c Compare September 29, 2023 16:10
var loglevel klog.Level
mutex.Lock()
defer mutex.Unlock()
if err := loglevel.Set(startupLogLevel); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about record an error log here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good suggestion,done

// Therefore, put reset function in mutex range.
go reset(prevCtx, duration)
responseOk(&w, fmt.Sprintf("Change klog log level to %s successfully and for %v\n", currentLogLevel, duration))
mutex.RUnlock()
Copy link
Member

@lowang-bh lowang-bh Oct 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the read lock, there is also an write lock requirement in gorutime, is that as your expect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand what you meant. Can you provide a more detailed scenario?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that you have locked with a readlock, and then start a goruntine in which a write lock is required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wirte lock is in func modifyLoglevel,using to protect prevCtx.

func modifyLoglevel(newLogLevel string) error {
	mutex.Lock()
	defer mutex.Unlock()

	// Change klog log level to new value
	var loglevel klog.Level
	if err := loglevel.Set(newLogLevel); err != nil {
		return err
	}
	currentLogLevel = newLogLevel

	// Cancel the previous timer.
	if prevCtxCancelFunc != nil {
		prevCtxCancelFunc()
	}
	prevCtx, prevCtxCancelFunc = context.WithCancel(context.Background())
	return nil
}

Copy link
Member

@lowang-bh lowang-bh Oct 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also write lock in the reset gorutine function. In the scenario previous timer is timeup and the select case goes into the write lock require, but at the same time you are doing modifyLoglevel and have acquired the lock. Now the modifyLoglevel runs the cacel func of previous context. What will happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You considered the scenario so carefully👍
if modifyLoglevel get the wirte lock first,modifyLoglevel will set klog and the reset will set klog after, it is abnormal.
if reset get the write lock first ,it will reset klog and the modifyLoglevel will set klog ,it will be normal.

This usage scenario is to temporarily increase the klog level during debugging without considering concurrency, so there should be no need to deal with the low-probability events you mentioned.
what do you think?
@lowang-bh

@xiao-jay xiao-jay force-pushed the dynamic-conf-socket branch 2 times, most recently from 4f18105 to 67c11c0 Compare October 7, 2023 15:20
Signed-off-by: 晓杰 <[email protected]>
@@ -0,0 +1,210 @@
package util
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add the copy right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: 晓杰 <[email protected]>
Copy link
Member

@william-wang william-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 13, 2023
@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: william-wang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 13, 2023
@volcano-sh-bot volcano-sh-bot merged commit 1be5aa7 into volcano-sh:master Oct 13, 2023
13 checks passed
@hwdef
Copy link
Member

hwdef commented Oct 13, 2023

The docs is not merged, and the code is merged.
This is ridiculous.

Totally disagree with this PR code.
image

@william-wang @lowang-bh

@Monokaix
Copy link
Member

Monokaix commented Oct 13, 2023

The docs is not merged, and the code is merged. This is ridiculous.

Totally disagree with this PR code. image

@william-wang @lowang-bh

As described in doc in Possible actual use scenarios part, this pr change klog level in a hot-update way, and do not need to restart volcano.

@hwdef
Copy link
Member

hwdef commented Oct 13, 2023

@Monokaix
Did you read what was in my screenshot?
Again, I'm against this approach.

@xiao-jay
Copy link
Contributor Author

@Monokaix Did you read what was in my screenshot? Again, I'm against this approach.

Sometimes we encounter error when scheduling, we need to temporarily increase the log level to see more error messages, if the restart will make it more difficult to locate the problem.So we need dynamic change klog level way.

@Monokaix
Copy link
Member

Monokaix commented Oct 13, 2023

@Monokaix Did you read what was in my screenshot? Again, I'm against this approach.

There is some misunderstanding.
When we locate a problem of volcano, usually the existing log level is not enough and the problem cannot be located, however, after increasing the log level, the problem context is lost, making it impossible to continue locating. Therefore, it is convenient to update the log level without restarting volcano to identify the problem.

@hwdef
Copy link
Member

hwdef commented Oct 13, 2023

There are many ways to do this, for example you can persist the log to a file. I don't think this pr is implemented in a good way. There is also something wrong with the code merge process.

But the code has been merged and I won't comment anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dynamic log level adjustment without restarting volcano component
6 participants