unstable->3.5 (#2397)
* fix: codis-dashboard uses 100% CPU (#2332) (#2393)

Co-authored-by: liuchengyu <[email protected]>

* fix: The role displayed on the first Server in the Group area of the codis-fe is incorrect (#2350) (#2387)

Co-authored-by: liuchengyu <[email protected]>

* fix: automatically repair the master-slave replication relationship after the master or slave service restarts (#2373, #2038, #1950, #1967, #2351) (#2386)

Co-authored-by: liuchengyu <[email protected]>

* feat: add 3.5.3 changelog (#2395)

* add 3.5.3 changelog
---------

Co-authored-by: chejinge <[email protected]>

---------

Co-authored-by: Chengyu Liu <[email protected]>
Co-authored-by: liuchengyu <[email protected]>
Co-authored-by: chejinge <[email protected]>
4 people committed Feb 7, 2024
1 parent e93028f commit f61f49a
Showing 15 changed files with 658 additions and 277 deletions.
89 changes: 89 additions & 0 deletions CHANGELOG.MD
@@ -1,3 +1,92 @@
# v3.5.3

## New features

- Pika supports ACL [#2013](https://github.com/OpenAtomFoundation/pika/pull/2013) @[lqxhub](https://github.com/lqxhub)

- Codis dashboard automatically recovers service when a goroutine panics [#2349](https://github.com/OpenAtomFoundation/pika/pull/2349) @[chengyu-l](https://github.com/chengyu-l)

- During full replication, a Pika slave node does not accept read requests [#2197](https://github.com/OpenAtomFoundation/pika/pull/2197) @[tedli](https://github.com/tedli)

- Pika cache adds the bimap data type [#2197](https://github.com/OpenAtomFoundation/pika/pull/2197) @[chejinge](https://github.com/chejinge)

- Remove the Slot remnants of Sharding mode: Pika now has only DBs, and one Pika instance can host multiple DBs [#2251](https://github.com/OpenAtomFoundation/pika/pull/2251) @[Mixficsol](https://github.com/Mixficsol)

- Pika exporter exposes cache-related metrics [#2318](https://github.com/OpenAtomFoundation/pika/pull/2318) @[dingxiaoshuai](https://github.com/dingxiaoshuai123)

- Pika supports separating fast and slow commands [#2162](https://github.com/OpenAtomFoundation/pika/pull/2162) @[dingxiaoshuai](https://github.com/dingxiaoshuai123)

- Pika retains the Unix timepoint after executing bgsave [#2167](https://github.com/OpenAtomFoundation/pika/pull/2167) @[hero-heng](https://github.com/hero-heng)

- Pika supports dynamically configuring the disable_auto_compactions parameter [#2257](https://github.com/OpenAtomFoundation/pika/pull/2257) @[hero-heng](https://github.com/hero-heng)

- Pika supports Redis Stream [#1955](https://github.com/OpenAtomFoundation/pika/pull/1955) @[KKorpse](https://github.com/KKorpse)

- Pika adds a large-key analysis tool [#2195](https://github.com/OpenAtomFoundation/pika/pull/2195) @[sjcsjc123](https://github.com/sjcsjc123)

- Pika supports dynamically adjusting Pika cache parameters [#2197](https://github.com/OpenAtomFoundation/pika/pull/2197) @[chejinge](https://github.com/chejinge)

- Update the Pika benchmark tool to stress-test more interfaces [#2222](https://github.com/OpenAtomFoundation/pika/pull/2222) @[wangshao1](https://github.com/wangshao1)

- Pika Operator supports automatic scale-out of Pika clusters [#2121](https://github.com/OpenAtomFoundation/pika/pull/2121) @[machinly](https://github.com/machinly/)

- Add the CompactRange command to compact keys within a specified range [#2163](https://github.com/OpenAtomFoundation/pika/pull/2163) @[u6th9d](https://github.com/u6th9d)

- Add a compaction policy that reduces compaction time [#2172](https://github.com/OpenAtomFoundation/pika/pull/2172) @[u6th9d](https://github.com/u6th9d)

- Upgrade RocksDB to v8.7.3 [#2157](https://github.com/OpenAtomFoundation/pika/pull/2157) @[JasirVoriya](https://github.com/JasirVoriya)

- Pika distributed cluster: Codis proxy adds new observability metrics [#2199](https://github.com/OpenAtomFoundation/pika/pull/2199) @[dingxiaoshuai](https://github.com/dingxiaoshuai123)

- Pika distributed cluster supports automatic failover [#2386](https://github.com/OpenAtomFoundation/pika/pull/2386) @[chengyu-l](https://github.com/chengyu-l)

## bugfix

- Fixed an issue where Pika could mistakenly delete dump files while a slave node was performing full replication [#2377](https://github.com/OpenAtomFoundation/pika/pull/2377) @[wangshao1](https://github.com/wangshao1)

- Fixed how a slave node handles an abnormal response packet from the master during master-slave replication [#2319](https://github.com/OpenAtomFoundation/pika/pull/2319) @[wangshao1](https://github.com/wangshao1)

- Disable compaction when Pika executes the shutdown command so that the process exits faster [#2345](https://github.com/OpenAtomFoundation/pika/pull/2345) @[panlei-coder](https://github.com/panlei-coder)

- Fixed inaccurate Redis Memory values in Codis-dashboard [#2337](https://github.com/OpenAtomFoundation/pika/pull/2337) @[Mixficsol](https://github.com/Mixficsol)

- Optimized the time-consuming INFO command by reducing how often it checks the disk [#2197](https://github.com/OpenAtomFoundation/pika/pull/2197) @[chejinge](https://github.com/chejinge)

- Fixed an issue where rsync used an incorrect path when deleting temporary files, so the deletion failed and RocksDB could not open [#2186](https://github.com/OpenAtomFoundation/pika/pull/2186) @[wangshao1](https://github.com/wangshao1)

- Fixed the compact, bgsave, and info keyspace commands not specifying the DB name, which caused some commands to coredump [#2194](https://github.com/OpenAtomFoundation/pika/pull/2194) @[u6th9d](https://github.com/u6th9d)

- Codis dashboard now uses info replication instead of info to look up the master IP, reducing the performance impact on Pika [#2198](https://github.com/OpenAtomFoundation/pika/pull/2198) @[chenbt-hz](https://github.com/chenbt-hz)

- Fixed edge cases in Pika cache usage that left cache and DB data inconsistent in some scenarios [#2225](https://github.com/OpenAtomFoundation/pika/pull/2225) @[chejinge](https://github.com/chejinge)

- Fixed a segmentation fault at startup when the dump folder is empty [#2265](https://github.com/OpenAtomFoundation/pika/pull/2265) @[chenbt-hz](https://github.com/chenbt-hz)

- Fixed some command caches not taking effect because of flag calculation errors [#2217](https://github.com/OpenAtomFoundation/pika/pull/2217) @[lqxhub](https://github.com/lqxhub)

- Fixed a deadlock in master-slave replication mode that made the slave instance inaccessible after the master executed flushdb [#2249](https://github.com/OpenAtomFoundation/pika/pull/2249) @[ForestLH](https://github.com/ForestLH)

- Fixed some commands not checking RocksDB return values [#2187](https://github.com/OpenAtomFoundation/pika/pull/2187) @[callme-taota](https://github.com/callme-taota)

- Fixed info keyspace returning incorrect results [#2369](https://github.com/OpenAtomFoundation/pika/pull/2369) @[Mixficsol](https://github.com/Mixficsol)

- Standardized function return values and initial values [#2176](https://github.com/OpenAtomFoundation/pika/pull/2176) @[Mixficsol](https://github.com/Mixficsol)

- Fixed inaccurate network monitoring metrics [#2234](https://github.com/OpenAtomFoundation/pika/pull/2234) @[chengyu-l](https://github.com/chengyu-l)

- Fixed some parameters being loaded incorrectly from the configuration file [#2218](https://github.com/OpenAtomFoundation/pika/pull/2218) @[jettcc](https://github.com/jettcc)

- Fixed Codis dashboard using 100% CPU [#2393](https://github.com/OpenAtomFoundation/pika/pull/2393) @[chengyu-l](https://github.com/chengyu-l)

- Fixed incorrect display of Pika master and slave roles in Codis fe [#2387](https://github.com/OpenAtomFoundation/pika/pull/2387) @[chengyu-l](https://github.com/chengyu-l)


# v3.5.2

## New features
81 changes: 81 additions & 0 deletions CHANGELOG_CN.MD
@@ -1,3 +1,84 @@
# v3.5.3

## New features

- Pika supports ACL [#2013](https://github.com/OpenAtomFoundation/pika/pull/2013) @[lqxhub](https://github.com/lqxhub)

- Automatically recover service when a Codis dashboard goroutine panics [#2349](https://github.com/OpenAtomFoundation/pika/pull/2349) @[chengyu-l](https://github.com/chengyu-l)

- During full replication, the slave node of the Pika service does not accept read traffic [#2197](https://github.com/OpenAtomFoundation/pika/pull/2197) @[tedli](https://github.com/tedli)

- Pika cache adds the bimap data type [#2197](https://github.com/OpenAtomFoundation/pika/pull/2197) @[chejinge](https://github.com/chejinge)

- Remove the Slot remnants of Sharding mode: Pika has only DBs, and one Pika instance hosts multiple DBs [#2251](https://github.com/OpenAtomFoundation/pika/pull/2251) @[Mixficsol](https://github.com/Mixficsol)

- Pika exporter exposes cache-related metrics [#2318](https://github.com/OpenAtomFoundation/pika/pull/2318) @[dingxiaoshuai](https://github.com/dingxiaoshuai123)

- Pika supports separating fast and slow commands [#2162](https://github.com/OpenAtomFoundation/pika/pull/2162) @[dingxiaoshuai](https://github.com/dingxiaoshuai123)

- After Pika finishes executing Bgsave, it retains the unix timepoint [#2167](https://github.com/OpenAtomFoundation/pika/pull/2167) @[hero-heng](https://github.com/hero-heng)

- Pika supports dynamically configuring the disable_auto_compactions parameter [#2257](https://github.com/OpenAtomFoundation/pika/pull/2257) @[hero-heng](https://github.com/hero-heng)

- Pika supports Redis Stream [#1955](https://github.com/OpenAtomFoundation/pika/pull/1955) @[KKorpse](https://github.com/KKorpse)

- Pika supports a large-key analysis tool [#2195](https://github.com/OpenAtomFoundation/pika/pull/2195) @[sjcsjc123](https://github.com/sjcsjc123)

- Pika supports dynamically adjusting Pika cache parameters [#2197](https://github.com/OpenAtomFoundation/pika/pull/2197) @[chejinge](https://github.com/chejinge)

- Update the Pika benchmark tool to support stress-testing more interfaces [#2222](https://github.com/OpenAtomFoundation/pika/pull/2222) @[wangshao1](https://github.com/wangshao1)

- Pika Operator supports automatic scale-out of Pika clusters [#2121](https://github.com/OpenAtomFoundation/pika/pull/2121) @[machinly](https://github.com/machinly/)

- Add the CompactRange command to compact keys within a given range [#2163](https://github.com/OpenAtomFoundation/pika/pull/2163) @[u6th9d](https://github.com/u6th9d)

- Speed up compaction and reduce compaction time [#2172](https://github.com/OpenAtomFoundation/pika/pull/2172) @[u6th9d](https://github.com/u6th9d)

- Upgrade RocksDB to v8.7.3 [#2157](https://github.com/OpenAtomFoundation/pika/pull/2157) @[JasirVoriya](https://github.com/JasirVoriya)

- Pika distributed cluster: Codis proxy adds new observability metrics [#2199](https://github.com/OpenAtomFoundation/pika/pull/2199) @[dingxiaoshuai](https://github.com/dingxiaoshuai123)

- Pika distributed cluster supports automatic failover [#2386](https://github.com/OpenAtomFoundation/pika/pull/2386) @[chengyu-l](https://github.com/chengyu-l)

## bugfix

- Fix Pika mistakenly deleting dump files while a slave node is performing full replication [#2377](https://github.com/OpenAtomFoundation/pika/pull/2377) @[wangshao1](https://github.com/wangshao1)

- Fix the handling logic after the slave node receives an abnormal response packet from the master during master-slave replication [#2319](https://github.com/OpenAtomFoundation/pika/pull/2319) @[wangshao1](https://github.com/wangshao1)

- Call disable compaction when Pika executes the shutdown command, to speed up process exit [#2345](https://github.com/OpenAtomFoundation/pika/pull/2345) @[panlei-coder](https://github.com/panlei-coder)

- Fix inaccurate Codis-dashboard Redis Memory values [#2337](https://github.com/OpenAtomFoundation/pika/pull/2337) @[Mixficsol](https://github.com/Mixficsol)

- Optimize INFO command latency by checking the disk less often [#2197](https://github.com/OpenAtomFoundation/pika/pull/2197) @[chejinge](https://github.com/chejinge)

- Fix Rsync using the wrong path when deleting temporary files, so the deletion failed and RocksDB failed to open [#2186](https://github.com/OpenAtomFoundation/pika/pull/2186) @[wangshao1](https://github.com/wangshao1)

- Fix the Compact, Bgsave, and Info keyspace commands not specifying the db name, which caused some commands to coredump [#2194](https://github.com/OpenAtomFoundation/pika/pull/2194) @[u6th9d](https://github.com/u6th9d)

- Codis dashboard uses info replication instead of the info command to look up the master ip, reducing the performance impact on Pika [#2198](https://github.com/OpenAtomFoundation/pika/pull/2198) @[chenbt-hz](https://github.com/chenbt-hz)

- Fix edge cases in Pika cache usage, resolving cache and DB data inconsistency in some scenarios [#2225](https://github.com/OpenAtomFoundation/pika/pull/2225) @[chejinge](https://github.com/chejinge)

- Fix a Segmentation fault reported at startup when the dump folder is empty [#2265](https://github.com/OpenAtomFoundation/pika/pull/2265) @[chenbt-hz](https://github.com/chenbt-hz)

- Fix some command caches not taking effect because of a flag calculation error [#2217](https://github.com/OpenAtomFoundation/pika/pull/2217) @[lqxhub](https://github.com/lqxhub)

- Fix the slave instance becoming inaccessible because of a deadlock after the master instance runs flushdb in master-slave replication mode [#2249](https://github.com/OpenAtomFoundation/pika/pull/2249) @[ForestLH](https://github.com/ForestLH)

- Fix some commands not checking RocksDB return values [#2187](https://github.com/OpenAtomFoundation/pika/pull/2187) @[callme-taota](https://github.com/callme-taota)

- Standardize function return values and initial values [#2176](https://github.com/OpenAtomFoundation/pika/pull/2176) @[Mixficsol](https://github.com/Mixficsol)

- Fix inaccurate network monitoring metrics [#2234](https://github.com/OpenAtomFoundation/pika/pull/2234) @[chengyu-l](https://github.com/chengyu-l)

- Fix some parameters loading abnormally from the configuration file [#2218](https://github.com/OpenAtomFoundation/pika/pull/2218) @[jettcc](https://github.com/jettcc)

- Fix Codis dashboard CPU usage reaching 100% [#2393](https://github.com/OpenAtomFoundation/pika/pull/2393) @[chengyu-l](https://github.com/chengyu-l)

- Fix abnormal display of Pika master/slave roles in Codis fe [#2387](https://github.com/OpenAtomFoundation/pika/pull/2387) @[chengyu-l](https://github.com/chengyu-l)


# v3.5.2

## 新特性
7 changes: 4 additions & 3 deletions codis/config/dashboard.toml
@@ -33,9 +33,10 @@ migration_async_numkeys = 500
migration_timeout = "30s"

# Set configs for redis sentinel.
sentinel_check_server_state_interval = "5s"
sentinel_check_master_failover_interval = "1s"
sentinel_master_dead_check_times = 5
sentinel_check_server_state_interval = "10s"
sentinel_check_master_failover_interval = "2s"
sentinel_master_dead_check_times = 10
sentinel_check_offline_server_interval = "2s"
sentinel_client_timeout = "10s"
sentinel_quorum = 2
sentinel_parallel_syncs = 1
3 changes: 3 additions & 0 deletions codis/pkg/models/action.go
@@ -11,4 +11,7 @@ const (
ActionMigrating = "migrating"
ActionFinished = "finished"
ActionSyncing = "syncing"
ActionSynced = "synced"

ActionSyncedFailed = "synced_failed"
)
45 changes: 43 additions & 2 deletions codis/pkg/models/group.go
@@ -25,6 +25,38 @@ func (g *Group) GetServersMap() map[string]*GroupServer {
return results
}

// SelectNewMaster chooses a new master node in the group
func (g *Group) SelectNewMaster() (string, int) {
var newMasterServer *GroupServer
var newMasterIndex = -1

for index, server := range g.Servers {
if index == 0 || server.State != GroupServerStateNormal {
continue
}

if newMasterServer == nil {
newMasterServer = server
newMasterIndex = index
} else if server.DbBinlogFileNum > newMasterServer.DbBinlogFileNum {
// Select the slave node with the latest offset as the master node
newMasterServer = server
newMasterIndex = index
} else if server.DbBinlogFileNum == newMasterServer.DbBinlogFileNum {
if server.DbBinlogOffset > newMasterServer.DbBinlogOffset {
newMasterServer = server
newMasterIndex = index
}
}
}

if newMasterServer == nil {
return "", newMasterIndex
}

return newMasterServer.Addr, newMasterIndex
}

type GroupServerState int8

const (
@@ -33,6 +65,13 @@ const (
GroupServerStateOffline
)

type GroupServerRole string

const (
RoleMaster GroupServerRole = "master"
RoleSlave GroupServerRole = "slave"
)

type GroupServer struct {
Addr string `json:"server"`
DataCenter string `json:"datacenter"`
@@ -43,9 +82,11 @@ type GroupServer struct {
} `json:"action"`

// master or slave
Role string `json:"role"`
Role GroupServerRole `json:"role"`
// If it is a master node, take the master_repl_offset field, otherwise take the slave_repl_offset field
ReplyOffset int `json:"reply_offset"`
DbBinlogFileNum uint64 `json:"binlog_file_num"` // db0
DbBinlogOffset uint64 `json:"binlog_offset"` // db0

// Monitoring status, 0 normal, 1 subjective offline, 2 actual offline
// If marked as 2 , no service is provided
State GroupServerState `json:"state"`
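`SelectNewMaster` skips index 0 (the current master) and every server whose `State` is not `GroupServerStateNormal`, then picks the slave with the highest `(DbBinlogFileNum, DbBinlogOffset)` pair, keeping the earlier server on ties. The standalone sketch below restates that rule with a trimmed-down, hypothetical `server` type so it can run outside Codis; the state constant names are stand-ins (the comment in the diff only says 0 is normal, 1 subjective offline, 2 actual offline).

```go
package main

import "fmt"

// serverState is a hypothetical stand-in for GroupServerState.
type serverState int8

const (
	stateNormal            serverState = iota // 0: normal
	stateSubjectiveOffline                    // 1: subjectively offline
	stateOffline                              // 2: actually offline
)

// server keeps only the GroupServer fields the selection rule reads.
type server struct {
	Addr          string
	State         serverState
	BinlogFileNum uint64 // DbBinlogFileNum (db0)
	BinlogOffset  uint64 // DbBinlogOffset (db0)
}

// newer reports whether a's binlog position is strictly ahead of b's,
// comparing the file number first and the offset second.
func newer(a, b *server) bool {
	if a.BinlogFileNum != b.BinlogFileNum {
		return a.BinlogFileNum > b.BinlogFileNum
	}
	return a.BinlogOffset > b.BinlogOffset
}

// selectNewMaster returns the address and index of the most up-to-date
// healthy slave, or ("", -1) if there is none. Index 0 is the current master.
func selectNewMaster(servers []*server) (string, int) {
	best := -1
	for i, s := range servers {
		if i == 0 || s.State != stateNormal {
			continue
		}
		if best == -1 || newer(s, servers[best]) {
			best = i
		}
	}
	if best == -1 {
		return "", -1
	}
	return servers[best].Addr, best
}

func main() {
	servers := []*server{
		{Addr: "10.0.0.1:9221", State: stateOffline, BinlogFileNum: 9, BinlogOffset: 900}, // current master, always skipped
		{Addr: "10.0.0.2:9221", State: stateNormal, BinlogFileNum: 3, BinlogOffset: 500},
		{Addr: "10.0.0.3:9221", State: stateNormal, BinlogFileNum: 3, BinlogOffset: 800},
		{Addr: "10.0.0.4:9221", State: stateSubjectiveOffline, BinlogFileNum: 4, BinlogOffset: 0}, // not normal, skipped
	}
	addr, idx := selectNewMaster(servers)
	fmt.Println(addr, idx) // 10.0.0.3:9221 2
}
```

Because index 0 is skipped unconditionally, a group that contains only its master yields `("", -1)`, which matches the `newMasterIndex = -1` default in the diff.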
8 changes: 5 additions & 3 deletions codis/pkg/topom/config.go
@@ -50,9 +50,10 @@ migration_async_numkeys = 500
migration_timeout = "30s"
# Set configs for redis sentinel.
sentinel_check_server_state_interval = "5s"
sentinel_check_master_failover_interval = "1s"
sentinel_master_dead_check_times = 5
sentinel_check_server_state_interval = "10s"
sentinel_check_master_failover_interval = "2s"
sentinel_master_dead_check_times = 10
sentinel_check_offline_server_interval = "2s"
sentinel_client_timeout = "10s"
sentinel_quorum = 2
sentinel_parallel_syncs = 1
@@ -86,6 +87,7 @@ type Config struct {
SentinelCheckServerStateInterval timesize.Duration `toml:"sentinel_check_server_state_interval" json:"sentinel_client_timeout"`
SentinelCheckMasterFailoverInterval timesize.Duration `toml:"sentinel_check_master_failover_interval" json:"sentinel_check_master_failover_interval"`
SentinelMasterDeadCheckTimes int8 `toml:"sentinel_master_dead_check_times" json:"sentinel_master_dead_check_times"`
SentinelCheckOfflineServerInterval timesize.Duration `toml:"sentinel_check_offline_server_interval" json:"sentinel_check_offline_server_interval"`
SentinelClientTimeout timesize.Duration `toml:"sentinel_client_timeout" json:"sentinel_client_timeout"`
SentinelQuorum int `toml:"sentinel_quorum" json:"sentinel_quorum"`
SentinelParallelSyncs int `toml:"sentinel_parallel_syncs" json:"sentinel_parallel_syncs"`
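In the context lines above, `SentinelCheckServerStateInterval` carries the JSON tag `sentinel_client_timeout`, the same name used by `SentinelClientTimeout`. Under `encoding/json`'s rules, when two fields at the same depth are both tagged with the same name, both are silently ignored. Assuming `Config` is ever serialized with `encoding/json` (not shown in this diff), neither value would appear in the output. A minimal sketch with hypothetical trimmed structs shows the effect and the one-tag fix:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// badTags reproduces the tagging in the context line: two fields share the
// JSON name "sentinel_client_timeout". Field names are hypothetical.
type badTags struct {
	CheckServerStateInterval string `json:"sentinel_client_timeout"`
	ClientTimeout            string `json:"sentinel_client_timeout"`
	Quorum                   int    `json:"sentinel_quorum"`
}

// fixedTags gives each field its own name, matching its toml key.
type fixedTags struct {
	CheckServerStateInterval string `json:"sentinel_check_server_state_interval"`
	ClientTimeout            string `json:"sentinel_client_timeout"`
	Quorum                   int    `json:"sentinel_quorum"`
}

// encode marshals v and panics on error, to keep the example short.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Both conflicting fields are dropped and no error is returned:
	// {"sentinel_quorum":2}
	fmt.Println(encode(badTags{"10s", "10s", 2}))
	// {"sentinel_check_server_state_interval":"10s","sentinel_client_timeout":"10s","sentinel_quorum":2}
	fmt.Println(encode(fixedTags{"10s", "10s", 2}))
}
```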
2 changes: 1 addition & 1 deletion codis/pkg/topom/context.go
@@ -40,7 +40,7 @@ func (ctx *context) getSlotMapping(sid int) (*models.SlotMapping, error) {
}

func (ctx *context) getSlotMappingsByGroupId(gid int) []*models.SlotMapping {
var slots = []*models.SlotMapping{}
var slots []*models.SlotMapping
for _, m := range ctx.slots {
if m.GroupId == gid || m.Action.TargetId == gid {
slots = append(slots, m)
18 changes: 16 additions & 2 deletions codis/pkg/topom/topom.go
@@ -210,12 +210,12 @@ func (s *Topom) Start(routines bool) error {
}
}, nil, true, 0)

// Check the status of the pre-offline master every 1 second
// Check the status of the pre-offline master every 2 seconds
// to determine whether to automatically switch master and slave
gxruntime.GoUnterminated(func() {
for !s.IsClosed() {
if s.IsOnline() {
w, _ := s.CheckPreOffineMastersState(5 * time.Second)
w, _ := s.CheckPreOfflineMastersState(5 * time.Second)
if w != nil {
w.Wait()
}
@@ -224,6 +224,20 @@ func (s *Topom) Start(routines bool) error {
}
}, nil, true, 0)

// Periodically (every SentinelCheckOfflineServerInterval) check the status of offline masters and slaves
// to determine whether to automatically restore the correct master-slave replication relationship
gxruntime.GoUnterminated(func() {
for !s.IsClosed() {
if s.IsOnline() {
w, _ := s.CheckOfflineMastersAndSlavesState(5 * time.Second)
if w != nil {
w.Wait()
}
}
time.Sleep(s.Config().SentinelCheckOfflineServerInterval.Duration())
}
}, nil, true, 0)

gxruntime.GoUnterminated(func() {
for !s.IsClosed() {
if s.IsOnline() {