fix: [dynamic config] Fix deadlock caused by recursive read locking in config_flow #212
Merged
Please provide issue(s) of this PR:
Fixes #183
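Background
When a program fetches a large number of config files serially at startup, it is prone to deadlock: the call to `GetConfigFile` for one of the config files hangs forever. Adding a sleep between two consecutive config file calls, so that the caller yields, mitigates the problem. A similar report: #183
Reproduction
Pseudocode
Problem analysis
Goroutine 1, `GetConfigFile`, the code that acquires the write lock: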
polaris-go/pkg/flow/configuration/config_flow.go
Lines 105 to 125 in 435b87f
Goroutine 2, the periodic task, acquires the read lock twice.
Flow:
polaris-go/pkg/flow/configuration/config_flow.go
Lines 245 to 252 in 435b87f
polaris-go/pkg/flow/configuration/config_flow.go
Lines 375 to 392 in 435b87f
polaris-go/pkg/flow/configuration/config_flow.go
Lines 400 to 408 in 435b87f
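Root cause
To keep writers from starving, an RWMutex that receives a write-lock request waits for the goroutines currently holding read locks to release them, and blocks any new read-lock requests in the meantime.
In the flow above, goroutine 1 (the business call to `GetConfigFile`) requests the write lock after goroutine 2 (the periodic task) has taken the read lock for the first time (step 2). Goroutine 2's second read-lock request (step 3) is then blocked. Goroutine 1 is waiting for goroutine 2 to release its first read lock, goroutine 2 is waiting for goroutine 1's pending write lock, and neither side yields, so the wait is circular.
The RWMutex documentation also warns against recursive read locking: https://pkg.go.dev/sync#RWMutex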
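The interaction described above can be observed in isolation with a small standalone program. This is only a sketch, not code from polaris-go: the function name `nestedReadBlocked` is made up for illustration, and it uses `TryRLock` (available since Go 1.18) in place of a real nested `RLock` so that the demo reports the condition without actually hanging.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// nestedReadBlocked reports whether a nested read lock would block while the
// same goroutine already holds a read lock and a writer is waiting.
// Hypothetical helper for illustration; it probes with TryRLock so the demo
// itself never deadlocks.
func nestedReadBlocked() bool {
	var mu sync.RWMutex

	mu.RLock() // the periodic task takes the outer read lock

	writerDone := make(chan struct{})
	go func() {
		mu.Lock() // the business goroutine requests the write lock and waits
		mu.Unlock()
		close(writerDone)
	}()

	// Poll until the writer is queued. From that point on, new read-lock
	// requests are refused, even from the goroutine that already holds one.
	blocked := false
	for i := 0; i < 1000 && !blocked; i++ {
		if mu.TryRLock() {
			mu.RUnlock()
			time.Sleep(time.Millisecond)
			continue
		}
		blocked = true // a real RLock here would wait forever
	}

	mu.RUnlock() // release the outer read lock so the writer can finish
	<-writerDone
	return blocked
}

func main() {
	fmt.Println("nested RLock would block:", nestedReadBlocked())
}
```

Replacing the `TryRLock` probe with a plain `RLock` would turn this into the deadlock itself: the writer waits for the outer read lock, and the nested reader waits for the writer.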
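Fix
`getConfigFileNotifiedVersion` needs both a lock-free and a locking variant, so that it does not take the lock again when the outer caller already holds the read lock. This PR adds a `locking` parameter that lets the caller decide whether to lock.
There are more elegant ways to write this logic, but for the deadlock itself, adding a parameter is the smallest change and the quickest fix. Restructuring the code can be considered at a future milestone.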
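As a rough illustration of the `locking` parameter pattern, here is a minimal standalone sketch. It is not the actual polaris-go code: the `versionStore` type and the methods `notifiedVersion`, `snapshot`, and `setVersion` are hypothetical stand-ins, assumed only to show how a caller that already holds the read lock can skip the inner lock.

```go
package main

import (
	"fmt"
	"sync"
)

// versionStore is a hypothetical stand-in for the state guarded by the
// config flow's RWMutex.
type versionStore struct {
	mu       sync.RWMutex
	versions map[string]uint64
}

// notifiedVersion returns the recorded version for key.
// With locking == false the caller must already hold mu.
func (s *versionStore) notifiedVersion(key string, locking bool) uint64 {
	if locking {
		s.mu.RLock()
		defer s.mu.RUnlock()
	}
	return s.versions[key]
}

// snapshot reads every version under one read lock. It passes
// locking == false so the read lock is never taken recursively.
func (s *versionStore) snapshot() map[string]uint64 {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make(map[string]uint64, len(s.versions))
	for key := range s.versions {
		out[key] = s.notifiedVersion(key, false)
	}
	return out
}

// setVersion records a version under the write lock.
func (s *versionStore) setVersion(key string, v uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.versions[key] = v
}

func main() {
	s := &versionStore{versions: map[string]uint64{}}
	s.setVersion("app.yaml", 3)
	fmt.Println(s.notifiedVersion("app.yaml", true)) // standalone caller locks
	fmt.Println(s.snapshot())                        // outer lock held once
}
```

The trade-off is that a boolean flag puts the locking contract on the caller. A common alternative is to split the function into an exported locking wrapper and an unexported lock-free helper, which is one possible shape for the later restructuring mentioned above.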