
Sync performance issue when the source is a cluster with very many shards #693

Open

AtomTu opened this issue Oct 17, 2023 · 5 comments · Fixed by #753

Labels: type: question (Further information is requested)

Comments

AtomTu commented Oct 17, 2023

Issue Description

The source is a cluster of 74 masters with 74 replicas; the destination is 3 masters with 3 replicas.

ncpu is set to 64 and each RDB file is about 4 GB. In testing, "syncing rdb" performance is very poor: the full sync is projected to take around 200 hours. Where is the bottleneck, and are there parameters that should be tuned?

  • RedisShake Version: 4.0.2
  • Redis Source Version: 5.0.7
  • Redis Destination Version: 5.0.7
  • Redis Deployment (standalone/cluster/sentinel): cluster
  • Deployed on Cloud Provider: no

Logs

{"level":"info","time":"2023-10-17T01:06:52+08:00","message":"read_count=[1606902], read_ops=[2705.38], write_count=[7887], write_ops=[13.01], src-68, syncing rdb, size=[4.0 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:06:53+08:00","message":"read_count=[1609415], read_ops=[2513.19], write_count=[7905], write_ops=[18.00], src-69, syncing rdb, size=[4.1 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:06:54+08:00","message":"read_count=[1611984], read_ops=[2513.19], write_count=[7919], write_ops=[18.00], src-70, syncing rdb, size=[4.8 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:06:55+08:00","message":"read_count=[1614765], read_ops=[2781.03], write_count=[7936], write_ops=[17.00], src-71, syncing rdb, size=[4.4 MiB/4.2 GiB]"}
{"level":"info","time":"2023-10-17T01:06:56+08:00","message":"read_count=[1617324], read_ops=[2559.25], write_count=[7954], write_ops=[18.00], src-72, syncing rdb, size=[3.6 MiB/4.2 GiB]"}
{"level":"info","time":"2023-10-17T01:06:57+08:00","message":"read_count=[1619926], read_ops=[2601.42], write_count=[7967], write_ops=[13.00], src-73, syncing rdb, size=[4.9 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:06:58+08:00","message":"read_count=[1622464], read_ops=[2537.93], write_count=[7981], write_ops=[14.00], src-0, syncing rdb, size=[4.2 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:06:59+08:00","message":"read_count=[1625118], read_ops=[2654.25], write_count=[7990], write_ops=[9.00], src-1, syncing rdb, size=[5.8 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:07:00+08:00","message":"read_count=[1627592], read_ops=[2473.63], write_count=[7995], write_ops=[5.00], src-2, syncing rdb, size=[4.2 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:07:01+08:00","message":"read_count=[1630554], read_ops=[2962.89], write_count=[8002], write_ops=[7.00], src-3, syncing rdb, size=[4.4 MiB/4.2 GiB]"}
{"level":"info","time":"2023-10-17T01:07:02+08:00","message":"read_count=[1633064], read_ops=[2962.89], write_count=[8015], write_ops=[7.00], src-4, syncing rdb, size=[5.9 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:07:03+08:00","message":"read_count=[1635635], read_ops=[2571.57], write_count=[8032], write_ops=[17.00], src-5, syncing rdb, size=[4.8 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:07:04+08:00","message":"read_count=[1638241], read_ops=[2571.57], write_count=[8047], write_ops=[17.00], src-6, syncing rdb, size=[4.3 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:07:05+08:00","message":"read_count=[1640634], read_ops=[2391.95], write_count=[8060], write_ops=[13.00], src-7, syncing rdb, size=[4.4 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:07:06+08:00","message":"read_count=[1643249], read_ops=[2614.97], write_count=[8075], write_ops=[15.00], src-8, syncing rdb, size=[5.9 MiB/4.1 GiB]"}

Additional Information

function = """
-- Only replicate keys whose name starts with "mlpSummary:"; drop everything else.
local prefix = "mlpSummary:"
local prefix_len = #prefix
if KEYS[1] == nil or KEYS[1] == "" then
  return -- no key to inspect, skip this command
end
if string.sub(KEYS[1], 1, prefix_len) ~= prefix then
  return -- key does not match the prefix, skip this command
end
shake.call(DB, ARGV) -- forward the original command to the target
"""

[sync_reader]
cluster = true            # set to true if source is a redis cluster
address = "100.30.6.141:6379" # when cluster is true, set address to one of the cluster nodes
username = ""              # keep empty if not using ACL
password = ""              # keep empty if no authentication is required
tls = false
sync_rdb = true # set to false if you don't want to sync rdb
sync_aof = true # set to false if you don't want to sync aof

# [scan_reader]
# cluster = true            # set to true if source is a redis cluster
# address = "127.0.0.1:6379" # when cluster is true, set address to one of the cluster nodes
# username = ""              # keep empty if not using ACL
# password = ""              # keep empty if no authentication is required
# ksn = false                # set to true to enable Redis keyspace notifications (KSN) subscription
# tls = false

# [rdb_reader]
# filepath = "/tmp/dump.rdb"

[redis_writer]
cluster = true            # set to true if target is a redis cluster
address = "100.30.12.195:6379" # when cluster is true, set address to one of the cluster nodes
username = ""              # keep empty if not using ACL
password = ""              # keep empty if no authentication is required
tls = false


[advanced]
dir = "data"
ncpu = 128        # runtime.GOMAXPROCS, 0 means use runtime.NumCPU() cpu cores
pprof_port = 6479  # pprof port, 0 means disable
status_port = 6579 # status port, 0 means disable

# log
log_file = "shake.log"
log_level = "info"     # debug, info or warn
log_interval = 1       # in seconds

# redis-shake gets each key and value from the rdb file, and uses the RESTORE
# command to create the key in the target redis. Redis RESTORE returns a
# "Target key name is busy" error when the key already exists. You can use this
# configuration item to change the default behavior of restore:
# panic:   redis-shake will stop when it meets a "Target key name is busy" error.
# rewrite: redis-shake will replace the key with the new value.
# ignore:  redis-shake will skip restoring the key when it meets a "Target key name is busy" error.
rdb_restore_command_behavior = "ignore" # panic, rewrite or ignore

# redis-shake uses pipeline to improve sending performance.
# This item limits the maximum number of commands in a pipeline.
pipeline_count_limit = 40960

# Client query buffers accumulate new commands. They are limited to a fixed
# amount by default. This amount is normally 1gb.
target_redis_client_max_querybuf_len = 1024_000_000

# In the Redis protocol, bulk requests, that is, elements representing single
# strings, are normally limited to 512 MB.
target_redis_proto_max_bulk_len = 512_000_000

# If the source is Elasticache or MemoryDB, you can set this item.
aws_psync = "" # example: aws_psync = "10.0.0.1:6379@nmfu2sl5osync,10.0.0.1:6379@xhma21xfkssync"

[module]
# The data format for BF.LOADCHUNK is not compatible in different versions. v2.6.3 <=> 20603
#target_mbbloom_version = 20603
AtomTu added the "type: question" (Further information is requested) label Oct 17, 2023
AtomTu (Author) commented Oct 17, 2023

I tested it: with the Lua prefix-filter script enabled versus disabled, performance differs by tens of times. Is there any way to optimize this?

suxb201 (Member) commented Oct 17, 2023

This needs optimization for sources with many shards: right now a single goroutine handles all of it, which is slow.
As a temporary workaround, you can start 74 shake instances, each responsible for syncing only one source shard.
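The per-shard workaround above can be sketched as a config: run one RedisShake process per source shard, point its sync_reader directly at that shard's master with cluster disabled, and give each instance its own working directory, log file, and ports. This is a sketch based on the config layout shown earlier in this issue; the shard address, port values, and file names are hypothetical placeholders.

```toml
# shake-shard-00.toml -- hypothetical per-shard config (one of 74 instances)

[sync_reader]
cluster = false                  # read a single shard directly, no cluster discovery
address = "100.30.6.141:6379"    # hypothetical: the master of source shard 0
sync_rdb = true
sync_aof = true

[redis_writer]
cluster = true                   # the target is still the 3-master cluster
address = "100.30.12.195:6379"

[advanced]
dir = "data-shard-00"            # each instance needs its own working dir
pprof_port = 0                   # disable, or give every instance a unique port
status_port = 0
log_file = "shake-shard-00.log"
```

If the prefix filtering is still needed, each instance would also carry the same `function = """..."""` block from the original config.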

suxb201 changed the title from "Cluster-to-cluster sync performance issue" to "Sync performance issue when the source is a cluster with large shards" Oct 17, 2023
suxb201 changed the title from "Sync performance issue when the source is a cluster with large shards" to "Sync performance issue when the source is a cluster with very many shards" Oct 17, 2023
AtomTu (Author) commented Oct 17, 2023

I tested it: with the Lua prefix-filter script enabled versus disabled, performance differs by tens of times. Is there any way to optimize this?

Could you suggest any optimizations for this?

suxb201 (Member) commented Oct 17, 2023

No, it is just slow. We may later optimize the way Lua is invoked.

Zheaoli (Collaborator) commented Jan 2, 2024

Take a look at the latest PR #753: the Lua-related code is now several times faster.

Zheaoli reopened this Jan 2, 2024