Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

During Pika pressure testing, there was a queue in the log #2882

Open
yakexi07 opened this issue Aug 26, 2024 · 7 comments
Open

During Pika pressure testing, there was a queue in the log #2882

yakexi07 opened this issue Aug 26, 2024 · 7 comments
Assignees
Labels
☢️ Bug Something isn't working

Comments

@yakexi07
Copy link

Is this a regression?

Yes

Description

使用redis-cli进行压测
time /opt/redis/redis-src/src/redis-benchmark -a xxx -p 9221 -c 100 -r 500000 -n 1000000 -d 128 -t SET,GET,HSET -l --csv ,(redis-benchmark版本为6.26)此命令会持续进行压测,压测是想得到set、get、hset命令高iops的情况下,pika是否能支持较长时间,应对线上突发大流量,大约5分钟之后,pika日志里面会出现命令排队的情况,可以持续复现

E20240826 16:06:57.641274 2084 pika_client_conn.cc:239] ip_port: 172.16.186.22:23334, db: db0, command: "HSET" "myhash" "element:000000486796" "=JkfE^5WSViF:VA1V>6fXU_jBXFe^EHMAfnRC^NN;5ak=ccRS^mW<cRegG;km;CCQQg9P9RkhR7FWUeEh:53OS9n5ed^CI930G5Km;YG3DMJOCc4VY6oY=Q;;ii7", command_size: 161, arguments: 4, total_time(ms): 30, before_queue_time(ms): 0, queue_time(ms): 21, process_time(ms): 9, cmd_time(ms): 0

偶尔排队时间会达到60ms,麻烦帮忙看下是否配置有可优化项,或者压测模型选择的有问题?

Please provide a link to a minimal reproduction of the bug

No response

Screenshots or videos

image

Please provide the version you discovered this bug in (check about page for version information)

测试机器:阿里云ecs.c7.2xlarge,8核心16G内存。CentOS Stream 9 64位系统,200GB ESSD PL0,iops 4200(压测的时候iops大概300左右,CPU和内存使用均正常)
pika版本:3.5.5 
pika配置如下:
port : 9221
thread-num : 6
rtc-cache-read : yes
thread-pool-size: 12
slow-cmd-pool : no
slow-cmd-thread-pool-size : 2
admin-thread-pool-size : 2
slow-cmd-list : info
admin-cmd-list : info, ping, monitor
sync-thread-num : 9
sync-binlog-thread-num : 1
log-path : ./log/
log-retention-time : 7
db-path : ./db/
write-buffer-size : 512M
arena-block-size : 64M
timeout : 0
requirepass : xxx
masterauth : xxx
userpass : xxx
userblacklist : FLUSHALL, FLUSHDB, SHUTDOWN, KEYS, CONFIG
instance-mode : classic
databases : 1
replication-num : 0
consensus-level : 0
dump-prefix : backup-
dump-path : ./dump/
dump-expire : 5
pidfile : ./pika.pid
maxclients : 20000
target-file-size-base : 20M
expire-logs-days : 7
expire-logs-nums : 20
root-connection-num : 2
slowlog-write-errorlog : yes
slowlog-log-slower-than : 30000
slowlog-max-len : 128
db-sync-path : ./dbsync/
db-sync-speed : -1
slave-priority : 100
compact-cron : 03-04/30
disable_auto_compactions : false
max-subcompactions : 3 
sync-window-size : 9000
max-conn-rbuf-size : 256M 
write-binlog : yes
binlog-file-size : 500M
max-cache-statistic-keys : 5000
small-compaction-threshold : 5000
small-compaction-duration-threshold : 10000
max-write-buffer-size : 2G
max-write-buffer-num : 3
max-client-response-size : 100M
compression : snappy
max-background-jobs : 8
max-background-flushes : 4
max-background-compactions : 4
delayed-write-rate : 0
max-compaction-bytes : 5000
max-cache-files : 50000
max-bytes-for-level-multiplier : 10
slotmigrate : no
slotmigrate-thread-num : 8
thread-migrate-keys-num : 64
block-cache: 5G 
share-block-cache: yes
default-slot-num : 128
cache-index-and-filter-blocks: yes
level-compaction-dynamic-level-bytes: yes
rate-limiter-bandwidth : 1099511627776
throttle-bytes-per-second : 207200000
rsync-timeout-ms : 1000
max-rsync-parallel-num : 4
cache-num : 16
cache-model : 1
cache-type: string, set, zset, list, hash, bit
zset-cache-field-num-per-key : 512
zset-cache-start-direction : 0
cache-maxmemory : 10G
cache-maxmemory-policy : 1
cache-maxmemory-samples: 5
cache-lfu-decay-time: 1
rename-command : FLUSHDB tiflushdb
rename-command : FLUSHALL tiflushall
internal-used-unfinished-full-sync :

Anything else?

No response

@yakexi07 yakexi07 added the ☢️ Bug Something isn't working label Aug 26, 2024
@chejinge
Copy link
Collaborator

是的队列阻塞了 thread-pool-size: 12 可以调整下 你是8核 调成8就行
max-background-jobs : 5
max-background-flushes :2
max-background-compactions : 3

@yakexi07
Copy link
Author

是的队列阻塞了 thread-pool-size: 12 可以调整下 你是8核 调成8就行 max-background-jobs : 5 max-background-flushes :2 max-background-compactions : 3

尝试按照你说的调整了配置,还是会出现一样的问题。
同时采用如下配置的 336版本,测试未出现问题 ,麻烦再帮忙看看 是否新版本有特殊的设置

port : 9221
thread-num : 8
thread-pool-size : 16
sync-thread-num : 2
log-path : ./log/
db-path : ./db/
write-buffer-size : 536870912 #512M
arena-block-size : 67108864 #64M
timeout : 0
requirepass : xxx
masterauth : xxx
userpass : xxx
userblacklist : FLUSHALL, FLUSHDB, SHUTDOWN, CONFIG, KEYS, slaveof, client
instance-mode : classic
databases : 8
default-slot-num : 1024
replication-num : 0
consensus-level : 0
dump-prefix : backup-
daemonize : no
dump-path : ./dump/
dump-expire : 5
pidfile : ./pika.pid
maxclients : 20000
target-file-size-base : 52428800 #50M,level 1大小
expire-logs-days : 7
expire-logs-nums : 200
root-connection-num : 2
slowlog-write-errorlog : yes
slowlog-log-slower-than : 20000
slowlog-max-len : 128
db-sync-path : ./dbsync/
db-sync-speed : -1
slave-priority : 100
compact-cron : 03-04/30
compact-interval :
sync-window-size : 9000
max-conn-rbuf-size : 268435456
write-binlog : yes
binlog-file-size : 104857600
max-cache-statistic-keys : 5000
small-compaction-threshold : 5000
max-write-buffer-size : 10737418240 #10GB
max-write-buffer-number : 2
max-client-response-size : 1073741824
compression : snappy
max-background-flushes : 1
max-background-compactions : 2
max-cache-files : 10000
max-bytes-for-level-multiplier : 10

@yakexi07
Copy link
Author

是的队列阻塞了 thread-pool-size: 12 可以调整下 你是8核 调成8就行 max-background-jobs : 5 max-background-flushes :2 max-background-compactions : 3

355版本启动的时候,日志中有 Worker queue limit is 1766 ,这个可以说明 排队的队列最大是1766吗? 这个数值在哪儿可以修改,或者说怎么样获取的,我从你们 相关的文档

W20230921 17:13:42.295754 20438988 pika_server.cc:80] 0.0.0.0
里面看到,你们启动的时候这个数值是2万多,麻烦看看是否与这个数值有关系

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


Yes, the queue is blocked. thread-pool-size: 12. You can adjust it. You have 8 cores. Just adjust it to 8. max-background-jobs: 5 max-background-flushes: 2 max-background-compactions: 3

When version 355 is started, there is Worker queue limit is 1766 in the log. Does this mean that the maximum queue is 1766? Where can this value be modified, or how to obtain it? I found it from your relevant document

W20230921 17:13:42.295754 20438988 pika_server.cc:80] 0.0.0.0
I see that when you started, the value was more than 20,000. Please check if it has anything to do with this value.

@chejinge
Copy link
Collaborator

chejinge commented Sep 2, 2024

您可以把您当前的配置在发下吗
config get *

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


Can you post your current configuration?
config get*

@yakexi07
Copy link
Author

yakexi07 commented Sep 2, 2024

提供一些可供参考的场景,同样的压测条件压测了 redis 6.26 和pika 3.5.5 ,发现pika hset命令的qps比较低,同时日志里面大量的出现了排队的情况,怀疑是hset命令大量的写入导致了排队

image

E20240902 16:19:23.452486 91162 pika_client_conn.cc:239] ip_port: 172.16.26.50:26136, db: db0, command: "HSET" "gamex:0:profile:conditionRecords:45:0:d53dbccd-6053-4802-a7c4-184cb4adb3df" "23" "\n\b\b\xca\x01\x12\x03\b\xe3\x0c", command_size: 93, arguments: 4, total_time(ms): 54, before_queue_time(ms): 0, queue_time(ms): 38, process_time(ms): 16, cmd_time(ms): 0

您可以把您当前的配置在发下吗 config get *

127.0.0.1:9221> config get *

  1. "port"
  2. "9221"
  3. "thread-num"
  4. "12"
  5. "thread-pool-size"
  6. "16"
  7. "slow-cmd-thread-pool-size"
  8. "1"
  9. "admin-thread-pool-size"
  10. "2"
  11. "userblacklist"
  12. "flushall, flushdb, shutdown, keys, config"
  13. "slow-cmd-list"
  14. ""
  15. "admin-cmd-list"
  16. "monitor,ping,info"
  17. "sync-thread-num"
  18. "8"
  19. "sync-binlog-thread-num"
  20. "1"
  21. "log-path"
  22. "./log/"
  23. "db-path"
  24. "./db/"
  25. "maxmemory"
  26. "268435456"
  27. "write-buffer-size"
  28. "268435456"
  29. "arena-block-size"
  30. "33554432"
  31. "max-write-buffer-num"
  32. "2"
  33. "timeout"
  34. "0"
  35. "requirepass"
  36. "xxx"
  37. "masterauth"
  38. "xxx"
  39. "userpass"
  40. "xxx"
  41. "instance-mode"
  42. "classic"
  43. "databases"
  44. "1"
  45. "daemonize"
  46. "no"
  47. "slotmigrate"
  48. "no"
  49. "slow-cmd-pool"
  50. "no"
  51. "slotmigrate-thread-num"
  52. "8"
  53. "thread-migrate-keys-num"
  54. "64"
  55. "dump-path"
  56. "./dump/"
  57. "dump-expire"
  58. "5"
  59. "dump-prefix"
  60. "backup-"
  61. "pidfile"
  62. "./pika.pid"
  63. "maxclients"
  64. "20000"
  65. "target-file-size-base"
  66. "52428800"
  67. "max-cache-statistic-keys"
  68. "2000"
  69. "small-compaction-threshold"
  70. "5000"
  71. "small-compaction-duration-threshold"
  72. "10000"
  73. "max-background-flushes"
  74. "4"
  75. "max-background-compactions"
  76. "4"
  77. "max-background-jobs"
  78. "8"
  79. "max-cache-files"
  80. "5000"
  81. "max-bytes-for-level-multiplier"
  82. "10"
  83. "block-size"
  84. "4096"
  85. "block-cache"
  86. "15"
  87. "share-block-cache"
  88. "yes"
  89. "cache-index-and-filter-blocks"
  90. "yes"
  91. "optimize-filters-for-hits"
  92. "no"
  93. "level-compaction-dynamic-level-bytes"
  94. "yes"
  95. "expire-logs-days"
  96. "7"
  97. "expire-logs-nums"
  98. "20"
  99. "root-connection-num"
  100. "2"
  101. "slowlog-write-errorlog"
  102. "yes"
  103. "slowlog-log-slower-than"
  104. "30000"
  105. "slowlog-max-len"
  106. "128"
  107. "write-binlog"
  108. "yes"
  109. "binlog-file-size"
  110. "524288000"
  111. "max-write-buffer-size"
  112. "6442450944"
  113. "max-client-response-size"
  114. "104857600"
  115. "compression"
  116. "snappy"
  117. "db-sync-path"
  118. "./dbsync/"
  119. "db-sync-speed"
  120. "1024"
  121. "compact-cron"
  122. "03-04/30"
  123. "compact-interval"
  124. ""
  125. "disable_auto_compactions"
  126. "false"
  127. "network-interface"
  128. ""
  129. "slaveof"
  130. ""
  131. "slave-priority"
  132. "100"
  133. "save"
  134. ""
  135. "appendonly"
  136. "no"
  137. "sync-window-size"
  138. "9000"
  139. "max-conn-rbuf-size"
  140. "67108864"
  141. "replication-num"
  142. "0"
  143. "consensus-level"
  144. "0"
  145. "rate-limiter-mode"
  146. "1"
  147. "rate-limiter-bandwidth"
  148. "1099511627776"
  149. "delayed-write-rate"
  150. "0"
  151. "max-compaction-bytes"
  152. "1310720000"
  153. "rate-limiter-refill-period-us"
  154. "100000"
  155. "rate-limiter-fairness"
  156. "10"
  157. "rate-limiter-auto-tuned"
  158. "yes"
  159. "run-id"
  160. "e96f227e6214adcbec36ea0817271c802b35df85"
  161. "blob-cache"
  162. "0"
  163. "blob-compression-type"
  164. "none"
  165. "blob-file-size"
  166. "268435456"
  167. "blob-garbage-collection-age-cutoff"
  168. "0.250000"
  169. "blob-garbage-collection-force-threshold"
  170. "1.000000"
  171. "blob-num-shard-bits"
  172. "0"
  173. "compression-per-level"
  174. ""
  175. "default-slot-num"
  176. "128"
  177. "enable-blob-files"
  178. "no"
  179. "enable-blob-garbage-collection"
  180. "no"
  181. "loglevel"
  182. ""
  183. "min-blob-size"
  184. "4096"
  185. "pin_l0_filter_and_index_blocks_in_cache"
  186. "no"
  187. "slave-read-only"
  188. "yes"
  189. "throttle-bytes-per-second"
  190. "207200000"
  191. "max-rsync-parallel-num"
  192. "4"
  193. "replication-id"
  194. ""
  195. "cache-num"
  196. "16"
  197. "cache-model"
  198. "0"
  199. "cache-type"
  200. ""
  201. "zset-cache-start-direction"
  202. "0"
  203. "zset-cache-field-num-per-key"
  204. "512"
  205. "cache-maxmemory"
  206. "10737418240"
  207. "cache-maxmemory-policy"
  208. "1"
  209. "cache-maxmemory-samples"
  210. "5"
  211. "cache-lfu-decay-time"
  212. "1"
  213. "acl-pubsub-default"
  214. "resetchannels"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
☢️ Bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants
@yakexi07 @chejinge @Issues-translate-bot and others