Introduce bgsave cancel #757

ranshid · 2024-07-08T13:54:54Z

In some cases bgsave child process can run for a long time exhausting system resources. Although it is possible to kill the bgsave child process from the system shell, sometimes it is not possible allowing OS level access.

This PR adds a new subcommand to the BGSAVE command.
When user will issue BGSAVE CANCEL
It will do one of the 2:

In case a bgsave child process is currently running, the child process would be immediately killed thus terminating any save/replication full sync process.
In case a bgsave child process is SCHEDULED to run, the scheduled execution will be cancelled.

In some cases bgsave child process can run for a long time exhausting system resources. Although it is possible to kill the bgsave child process from the system shell, sometimes it is not possible allowing OS level access. This bgsave extention provides an option to cancel the currently running bgsave process. Signed-off-by: ranshid <[email protected]>

Signed-off-by: ranshid <[email protected]>

codecov · 2024-07-08T14:17:01Z

Codecov Report

Attention: Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 70.40%. Comparing base (5f0ccf1) to head (55a1541).
Report is 223 commits behind head on unstable.

Files with missing lines	Patch %	Lines
src/rdb.c	83.33%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable     #757      +/-   ##
============================================
+ Coverage     70.13%   70.40%   +0.27%     
============================================
  Files           111      112       +1     
  Lines         60300    61476    +1176     
============================================
+ Hits          42292    43283     +991     
- Misses        18008    18193     +185

Files with missing lines	Coverage Δ
src/commands.def	`100.00% <ø> (ø)`
src/rdb.c	`76.47% <83.33%> (+0.48%)`	⬆️

... and 35 files with indirect coverage changes

enjoy-binbin

Seems like a good idea, LGTM.

src/rdb.c

tests/integration/rdb.tcl

Co-authored-by: Binbin <[email protected]> Signed-off-by: ranshid <[email protected]>

src/rdb.c

tests/integration/rdb.tcl

Signed-off-by: ranshid <[email protected]>

src/rdb.c

Signed-off-by: ranshid <[email protected]>

src/rdb.c

Signed-off-by: ranshid <[email protected]>

enjoy-binbin · 2024-07-16T06:35:01Z

It occurred to me that maybe we also want to cancel server.rdb_bgsave_scheduled?

Noted it will also cancel the replication child process.

Also does AOFRW need the same cancel? seem to be the same case.

ranshid · 2024-07-17T14:20:33Z

It occurred to me that maybe we also want to cancel server.rdb_bgsave_scheduled?

I also thought about that during the implementation. I think the main goal was to kill the current running bgsave. Do you think this way it is currently implemented is confusing?

enjoy-binbin · 2024-07-17T14:43:18Z

Looks like it could be a complement to SCHEDULE. I am ok with both.

Another question here, kill the current running bgsave, does it include the disk-based replication or the diskless replication intentional or unintentional?

ranshid · 2024-07-17T14:53:54Z

Looks like it could be a complement to SCHEDULE. I am ok with both.

Another question here, kill the current running bgsave, does it include the disk-based replication or the diskless replication intentional or unintentional?

Yes it should kill both brands

ranshid · 2024-07-17T15:36:34Z

I think I can add the following logic:

if only bgsave is running kill it and ALSO cancel scheduled bgsave
if no bgsave is running BUT there is a scheduled bgsave - unschedule the scheduled BGSAVE and return O.K
if no bgsave is running OR scheduled - return an error
@enjoy-binbin WDYT?

enjoy-binbin · 2024-07-17T15:56:01Z

look good to me. That's also what i thought. Let's also make sure we won't kill the replication fork? Since it is a bgsave command, i think we should let replication out or at least add a new arg for it.

ranshid · 2024-07-17T17:05:01Z

look good to me. That's also what i thought. Let's also make sure we won't kill the replication fork? Since it is a bgsave command, i think we should let replication out or at least add a new arg for it.

In AWS we encountered many cases of replication save which is taking too long and yield the system to a bad state. Sometimes the user would prefer to kill it and flush data then retry it to make the system back in good state. I do not mind adding explicit parameter to target save or replication, but I think killing the replication save is also needed

PingXie · 2024-07-17T20:52:31Z

couple of comments

Fun time again on naming :) have we considered other options? bgsave abort, bgsave stop, bgsave kill, etc? cancel seems to indicate some sort of gracefulness but based on the implementation it is just a process kill. So I was wondering if abort would be a better name?
I think @enjoy-binbin has a good point. It might not be how AWS uses it but if the goal is to "stop the bleeding" quickly, I think it makes sense in general to cancel the scheduled one too.
Can you document exactly what this action entails as part of the PR description at the top?

Since we are adding a new (sub)command, I consider this is a major decision so adding @valkey-io/core-team too

ranshid · 2024-07-18T04:14:51Z

couple of comments

Fun time again on naming :) have we considered other options? bgsave abort, bgsave stop, bgsave kill, etc? cancel seems to indicate some sort of gracefulness but based on the implementation it is just a process kill. So I was wondering if abort would be a better name?

I thought about other naming as well. IMO the kill/stop/abort is better in case we are not cancelling the scheduled save.
In case we do cancell the scheduled save maybe cancel still has a good rational? I can also make separate sub-commands for each like:
bgsave cancel - will cancel scheduled save
bgsave kill/abort - will only stop a running bgsave

I think @enjoy-binbin has a good point. It might not be how AWS uses it but if the goal is to "stop the bleeding" quickly, I think it makes sense in general to cancel the scheduled one too.

I agree. we are in full consensus here. will add it ASAP

Can you document exactly what this action entails as part of the PR description at the top?

Definitely.

Since we are adding a new (sub)command, I consider this is a major decision so adding @valkey-io/core-team too

2. introduce bgsave cancel sub command 3. introduce bgsave kill sub command 4. add test for bgsave cancel Signed-off-by: Ran Shidlansik <[email protected]>

ranshid · 2024-07-18T16:44:57Z

@enjoy-binbin @PingXie when you have the time please take a look at the last update.
Since we cannot have both arguments and subcommands the previous change actually broke 'bgsave schedule' :)
I had to change the format of bgsave so basically this is now a breaking change IMO

Signed-off-by: ranshid <[email protected]>

tests/integration/rdb.tcl

Signed-off-by: Binbin <[email protected]>

enjoy-binbin

LGTM.

src/rdb.c

Signed-off-by: Ran Shidlansik <[email protected]>

src/rdb.c

madolson · 2024-09-30T15:30:15Z

@valkey-io/core-team Please vote on adding this new argument to cancel on ongoing save. This is useful for admins.

zuiderkwast · 2024-10-21T09:57:28Z

@ranshid Do you want to prepare a doc PR, or at least create some issue for it so it's not forgotten?

src/commands/bgsave.json

ranshid · 2024-10-21T13:00:09Z

@ranshid Do you want to prepare a doc PR, or at least create some issue for it so it's not forgotten?

@zuiderkwast Sure will create a doc PR

ranshid · 2024-10-21T13:32:50Z

@ranshid Do you want to prepare a doc PR, or at least create some issue for it so it's not forgotten?

@zuiderkwast Sure will create a doc PR

@zuiderkwast - valkey-io/valkey-doc#182

Fixes wrong "since" and "history" introduced in #757. --------- Signed-off-by: Viktor Söderqvist <[email protected]>

Document bgsave cancel option added in valkey-io/valkey#757 --------- Signed-off-by: Ran Shidlansik <[email protected]> Signed-off-by: ranshid <[email protected]> Signed-off-by: Viktor Söderqvist <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]>

commit 3c32ee1 Author: Madelyn Olson <[email protected]> Date: Mon Nov 4 12:36:20 2024 -0800 Add a filter option to drop all cluster packets (valkey-io#1252) A minor debugging change that helped in the investigation of valkey-io#1251. Basically there are some edge cases where we want to fully isolate a note from receiving packets, but can't suspend the process because we need it to continue sending outbound traffic. So, added a filter for that. Signed-off-by: Madelyn Olson <[email protected]> commit a102852 Author: Binbin <[email protected]> Date: Sat Nov 2 19:51:14 2024 +0800 Fix timing issue in cluster-shards tests (valkey-io#1243) The cluster-node-timeout is 3000 in our tests, the timing test wasn't succeeding, so extending the wait_for made them much more reliable. Signed-off-by: Binbin <[email protected]> commit 0d7b234 Author: Jim Brunner <[email protected]> Date: Fri Nov 1 15:16:18 2024 -0700 correct type internal to kvstore (minor) (valkey-io#1246) All of the internal variables related to number of dicts in the kvstore are type `int`. Not sure why these 2 items were declared as `long long`. Signed-off-by: Jim Brunner <[email protected]> commit e985ead Author: zhenwei pi <[email protected]> Date: Fri Nov 1 20:28:09 2024 +0800 RDMA: Prevent IO for child process (valkey-io#1244) RDMA MR (memory region) is not forkable, the VMA (virtual memory area) of a MR gets empty in a child process. Prevent IO for child process to avoid server crash. In the check for whether read and write is allowed in an RDMA connection, a check that if we're in a child process is added. If we are, the function returns an error, which will cause the RDMA client to be disconnected. Suggested-by: Viktor Söderqvist <[email protected]> Signed-off-by: zhenwei pi <[email protected]> commit 1c222f7 Author: Madelyn Olson <[email protected]> Date: Thu Oct 31 11:37:53 2024 -0700 Improve performance of sdssplitargs (valkey-io#1230) The current implementation of `sdssplitargs` does repeated `sdscatlen` to build the parsed arguments, which isn't very efficient because it does a lot of extra reallocations and moves through the sds code a lot. It also typically results in memory overhead, because `sdscatlen` over-allocates, which is usually not needed since args are usually not modified after being created. The new implementation of sdssplitargs does two passes, the first to parse the argument to figure out the final length and the second to actually copy the string. It's generally about 2x faster for larger strings (~100 bytes), and about 20% faster for small strings (~10 bytes). This is generally faster since as long as everything is in the CPU cache, it's going to be fast. There are a couple of sanity tests, none existed before, as well as some fuzzying which was used to find some bugs and also to do the benchmarking. The original benchmarking code can be seen valkey-io@6576aeb. ``` test_sdssplitargs_benchmark - unit/test_sds.c:530] Using random seed: 1729883235 [test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 56.44%, new:13039us, old:29930us [test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 56.58%, new:12057us, old:27771us [test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 59.18%, new:9048us, old:22165us [test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 54.61%, new:12381us, old:27278us [test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 51.17%, new:16012us, old:32793us [test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 49.18%, new:16041us, old:31563us [test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 58.40%, new:12450us, old:29930us [test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 56.49%, new:13066us, old:30031us [test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 58.75%, new:12744us, old:30894us [test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 52.44%, new:16885us, old:35504us [test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 62.57%, new:8107us, old:21659us [test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 62.12%, new:8320us, old:21966us [test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 45.23%, new:13960us, old:25487us [test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 57.95%, new:9188us, old:21849us ``` --------- Signed-off-by: Madelyn Olson <[email protected]> commit 91cbf77 Author: Masahiro Ide <[email protected]> Date: Fri Nov 1 03:30:05 2024 +0900 Eliminate snprintf usage at setDeferredAggregateLen (valkey-io#1234) to align with how we encode the length at `_addReplyLongLongWithPrefix` Signed-off-by: Masahiro Ide <[email protected]> Co-authored-by: Masahiro Ide <[email protected]> commit ab98f37 Author: zhenwei pi <[email protected]> Date: Wed Oct 30 18:12:42 2024 +0800 RDMA: Delete keepalive timer on closing (valkey-io#1237) Typically, RDMA connection gets closed by client side, the server side handles diconnected CM event, and delete keepalive timer correctly. However, the server side may close connection voluntarily, for example the maxium connections exceed. Handle this case to avoid invalid memory access. Signed-off-by: zhenwei pi <[email protected]> commit 789a73b Author: Binbin <[email protected]> Date: Wed Oct 30 10:25:50 2024 +0800 Minor fix to debug logging in replicationFeedStreamFromPrimaryStream (valkey-io#1235) We should only print logs when hide-user-data-from-log is off. Signed-off-by: Binbin <[email protected]> commit 13f5f66 Author: Shivshankar <[email protected]> Date: Tue Oct 29 19:19:56 2024 -0400 Update the argument of clusterNodeGetReplica declaration (valkey-io#1239) clusterNodeGetReplica agrumnets are missed to migrate during the slave to replication migration so updated the argument slave to replica. Signed-off-by: Shivshankar-Reddy <[email protected]> commit 5a4c064 Author: Madelyn Olson <[email protected]> Date: Tue Oct 29 14:26:17 2024 -0700 Mark main and serverAssert as weak symbols to be overridden (valkey-io#1232) At some point unit tests stopped building on MacOS because of duplicate symbols. I had originally solved this problem by using a flag that overrides symbols, but the much better solution is to mark the duplicate symbols as weak and they can be overridden during linking. (Symbols by default are strong, strong symbols override weak symbols) I also added macos unit build to the CI, so that this doesn't silently break in the future again. --------- Signed-off-by: Madelyn Olson <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]> commit 8ee7a58 Author: zixuan zhao <[email protected]> Date: Tue Oct 29 06:13:30 2024 -0400 Document log format configs in valkey.conf (valkey-io#1233) Add config options for log format and timestamp format introduced by Related to valkey-io#1225 This change adds two new configs into valkey.conf: log-format log-timestamp-format --------- Signed-off-by: azuredream <[email protected]> commit c21f1dc Author: Lipeng Zhu <[email protected]> Date: Mon Oct 28 13:43:23 2024 +0800 Increase the IO_THREADS_MAX_NUM. (valkey-io#1220) This patch try to increase the max number of io-threads from 16(128) to 256 for below reasons: 1. The core number increases a lot in the modern server processors, for example, the [Sierra Forest](https://en.wikipedia.org/wiki/Sierra_Forest) processors are targeted towards with up to **288** cores. Due to limitation of **_io-threads_** number (16 and 128 ), benchmark like https://openbenchmarking.org/test/pts/valkey even cannot run on a high core count server. 2. For some workloads, the bottleneck could be main thread, but for the other workloads, big key/value which caused heavy io, the bottleneck could be the io-threads, for example benchmark `memtier_benchmark -s 127.0.0.1 -p 9001 "--data-size" "20000" --ratio 1:0 --key-pattern P:P --key-minimum=1 --key-maximum 1000000 --test-time 180 -c 50 -t 16 --hide-histogram`. The QPS is still scalable after 16 io-threads. ![image](https://github.com/user-attachments/assets/e980f805-a162-44be-b03e-ab37a9c489cf) **Fig 1. QPS Scale factor with io-threads number grows.** Signed-off-by: Lipeng Zhu <[email protected]> Co-authored-by: Wangyang Guo <[email protected]> commit 5d2ff85 Author: Binbin <[email protected]> Date: Sun Oct 27 15:23:00 2024 +0800 Fix minor repldbfd leak in updateReplicasWaitingBgsave if fstat fails (valkey-io#1226) In the old code, if fstat fails, replica->repldbfd will hold the fd and we are doing a free client. And in freeClient, we check and close only if repl_state == REPLICA_STATE_SEND_BULK. So if fstat fails, we will leak the fd. We can also extend freeClient to handle REPLICA_STATE_WAIT_BGSAVE_END as well, but here seems to be a more friendly (and safer) way. Signed-off-by: Binbin <[email protected]> commit 4be09e4 Author: Shivshankar <[email protected]> Date: Fri Oct 25 08:03:59 2024 -0400 Fix typo in valkey.conf file's shutdown section (valkey-io#1224) Found typo "exists" ==> "exits" in valkey.conf in shutdown section. Signed-off-by: Shivshankar-Reddy <[email protected]> commit 9c60fcd Author: Lipeng Zhu <[email protected]> Date: Fri Oct 25 17:13:28 2024 +0800 Do security attack check only when command not found to reduce the critical path (valkey-io#1212) When explored the cycles distribution for main thread with io-threads enabled. We found this security attack check takes significant time in main thread, **~3%** cycles were used to do the commands security check in main thread. This patch try to completely avoid doing it in the hot path. We can do it only after we looked up the command and it wasn't found, just before we call commandCheckExistence. --------- Signed-off-by: Lipeng Zhu <[email protected]> Co-authored-by: Wangyang Guo <[email protected]> commit 55bbbe0 Author: zixuan zhao <[email protected]> Date: Thu Oct 24 18:36:32 2024 -0400 Configurable log and timestamp formats (logfmt, ISO8601) (valkey-io#1022) Add ability to configure log output format and timestamp format in the logs. This change adds two new configs: * `log-format`: Either legacy or logfmt (See https://brandur.org/logfmt) * `log-timestamp-format`: legacy, iso8601 or milliseconds (since the eppch). Related to valkey-io#1006. Example: ``` $ ./valkey-server /home/zhaoz12/git/valkey/valkey/valkey.conf pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=warning message="WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see jemalloc/jemalloc#1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect." pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo" pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="Valkey version=255.255.255, bits=64, commit=affbea5d, modified=1, pid=109463, just started" pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="Configuration loaded" pid=109463 role=master timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="monotonic clock: POSIX clock_gettime" pid=109463 role=master timestamp="2024-09-10T20:37:25.739-04:00" level=warning message="Failed to write PID file: Permission denied" ``` --------- Signed-off-by: azuredream <[email protected]> commit 2956367 Author: Binbin <[email protected]> Date: Thu Oct 24 21:53:05 2024 +0800 Maintain return value of rdbSaveDb after writing slot-info aux (valkey-io#1222) All other places written in this function are maintained it, although the caller of rdbSaveDb does not reply on it, it is maintained to be consistent with other places, is its duty. Signed-off-by: Binbin <[email protected]> commit a21fe71 Author: Binbin <[email protected]> Date: Thu Oct 24 16:38:47 2024 +0800 Limit CLUSTER_CANT_FAILOVER_DATA_AGE log to 10 times period (valkey-io#1189) If a replica is step into data_age too old stage, it can not trigger the failover and currently it can not be automatically recovered and we will print a log every CLUSTER_CANT_FAILOVER_RELOG_PERIOD, which is every second. If the primary has not recovered or there is no manual failover, this log will flood the log file. In this case, limit its frequency to 10 times period, which is 10 seconds in our code. Also in this data_age too old stage, the repeated logs also can stand for the progress of the failover. See also valkey-io#780 for more details about it. Signed-off-by: Binbin <[email protected]> Co-authored-by: Ping Xie <[email protected]> commit c419524 Author: muelstefamzn <[email protected]> Date: Wed Oct 23 16:56:32 2024 -0700 Trim free space from inline command argument strings to avoid excess memory usage (valkey-io#1213) The command argument strings created while parsing inline commands (see `processInlineBuffer()`) can contain free capacity. Since some commands ,such as `SET`, store these strings in the database, that free capacity increases the memory usage. In the worst case, it could double the memory usage. This only occurs if the inline command format is used. The argument strings are built by appending character by character in `sdssplitargs()`. Regular RESP commands are not affected. This change trims the strings within `processInlineBuffer()`. this? When the command argument string is packed into an object, `trimStringObjectIfNeeded()` is called. This does only trim the string if it is larger than `PROTO_MBULK_BIG_ARG` (32kB), as only strings larger than this would ever need trimming if the command it sent using the bulk string format. We could modify this condition, but that would potentially have a performance impact on commands using the bulk format. Since those make up for the vast majority of executed commands, limiting this change to inline commands seems prudent. * 1 million `SET [key] [value]` commands * Random keys (16 bytes) * 600 bytes values Memory usage without this change: ``` used_memory:1089327888 used_memory_human:1.01G used_memory_rss:1131696128 used_memory_rss_human:1.05G used_memory_peak:1089348264 used_memory_peak_human:1.01G used_memory_peak_perc:100.00% used_memory_overhead:49302800 used_memory_startup:911808 used_memory_dataset:1040025088 used_memory_dataset_perc:95.55% ``` Memory usage with this change: ``` used_memory:705327888 used_memory_human:672.65M used_memory_rss:718802944 used_memory_rss_human:685.50M used_memory_peak:705348256 used_memory_peak_human:672.67M used_memory_peak_perc:100.00% used_memory_overhead:49302800 used_memory_startup:911808 used_memory_dataset:656025088 used_memory_dataset_perc:93.13% ``` If the same experiment is repeated using the normal RESP array of bulk string format (`*3\r\n$3\r\nSET\r\n...`) then the memory usage is 672MB with and without of this change. If a replica is attached, its memory usage is 672MB with and without this change, since the replication link never uses inline commands. Signed-off-by: Stefan Mueller <[email protected]> commit c176de4 Author: danish-mehmood <[email protected]> Date: Thu Oct 24 02:30:42 2024 +0500 Clarify the wording from dually to the more common doubly (valkey-io#1214) Clarify documentation is ziplist.c Signed-off-by: danish-mehmood <[email protected]> commit b803f7a Author: Binbin <[email protected]> Date: Wed Oct 23 17:11:42 2024 +0800 Cleaned up getSlotOrReply is return -1 instead of C_ERR (valkey-io#1211) Minor cleanup since getSlotOrReply return -1 on error, not return C_ERR. Signed-off-by: Binbin <[email protected]> commit 5d70ccd Author: Binbin <[email protected]> Date: Wed Oct 23 10:22:25 2024 +0800 Make replica CLUSTER RESET flush async based on lazyfree-lazy-user-flush (valkey-io#1190) Currently, if the replica has a lot of data, CLUSTER RESET will block for a while and report the slowlog, and it seems that there is no harm in making it async so external components can be easier when monitoring it. Signed-off-by: Binbin <[email protected]> Co-authored-by: Ping Xie <[email protected]> commit 285064b Author: Shivshankar <[email protected]> Date: Mon Oct 21 22:54:40 2024 -0400 fix typo (valkey-io#1202) Signed-off-by: Shivshankar-Reddy <[email protected]> commit 771918e Author: Shivshankar <[email protected]> Date: Mon Oct 21 16:48:29 2024 -0400 Updating command.def by running the generate-command-code.py (valkey-io#1203) Part of valkey-io#1200 PR, since feild is changed. Looks like commands.def is missed to get genereated based on the changes so that is causing CI failure on unstable. Signed-off-by: Shivshankar-Reddy <[email protected]> commit 5885dc5 Author: Viktor Söderqvist <[email protected]> Date: Mon Oct 21 16:04:47 2024 +0200 Fix BGSAVE CANCEL since and history fields (valkey-io#1200) Fixes wrong "since" and "history" introduced in valkey-io#757. --------- Signed-off-by: Viktor Söderqvist <[email protected]> commit 29b83f1 Author: ranshid <[email protected]> Date: Mon Oct 21 12:56:44 2024 +0300 Introduce bgsave cancel (valkey-io#757) In some cases bgsave child process can run for a long time exhausting system resources. Although it is possible to kill the bgsave child process from the system shell, sometimes it is not possible allowing OS level access. This PR adds a new subcommand to the BGSAVE command. When user will issue `BGSAVE CANCEL`, it will do one of the 2: 1. In case a bgsave child process is currently running, the child process would be immediately killed thus terminating any save/replication full sync process. 2. In case a bgsave child process is SCHEDULED to run, the scheduled execution will be cancelled. --------- Signed-off-by: ranshid <[email protected]> Signed-off-by: ranshid <[email protected]> Signed-off-by: Ran Shidlansik <[email protected]> Signed-off-by: Binbin <[email protected]> Co-authored-by: Binbin <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]> commit 71f8c34 Author: zhenwei pi <[email protected]> Date: Mon Oct 21 16:11:27 2024 +0800 RDMA: Fix listener priv opaque pointer (valkey-io#1194) struct connListener.priv should be used by connection type specific data, static local listener data should not use this. A RDMA config structure is going to be introduced in the next step: ``` typedef struct serverRdmaContextConfig { char *bindaddr; int bindaddr_count; int port; int rx_size; int comp_vector; ... } serverRdmaContextConfig; ``` Then a builtin RDMA will be supported. Signed-off-by: zhenwei pi <[email protected]> commit 2743b7e Author: Binbin <[email protected]> Date: Sat Oct 19 14:56:10 2024 +0800 Fix SORT GET to ignore special pattern # in cluster slot check (valkey-io#1182) This special pattern '#' is used to get the element itself, it does not actually participate in the slot check. In this case, passing `GET #` will cause '#' to participate in the slot check, causing the command to get an `pattern may be in different slots` error. Signed-off-by: Binbin <[email protected]> Signed-off-by: Jacob Murphy <[email protected]>

In some cases bgsave child process can run for a long time exhausting system resources. Although it is possible to kill the bgsave child process from the system shell, sometimes it is not possible allowing OS level access. This PR adds a new subcommand to the BGSAVE command. When user will issue `BGSAVE CANCEL`, it will do one of the 2: 1. In case a bgsave child process is currently running, the child process would be immediately killed thus terminating any save/replication full sync process. 2. In case a bgsave child process is SCHEDULED to run, the scheduled execution will be cancelled. --------- Signed-off-by: ranshid <[email protected]> Signed-off-by: ranshid <[email protected]> Signed-off-by: Ran Shidlansik <[email protected]> Signed-off-by: Binbin <[email protected]> Co-authored-by: Binbin <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]>

Fixes wrong "since" and "history" introduced in valkey-io#757. --------- Signed-off-by: Viktor Söderqvist <[email protected]>

ranshid added 2 commits July 8, 2024 13:49

clang format fix

09ac835

Signed-off-by: ranshid <[email protected]>

enjoy-binbin approved these changes Jul 9, 2024

View reviewed changes

src/rdb.c Outdated Show resolved Hide resolved

src/rdb.c Outdated Show resolved Hide resolved

tests/integration/rdb.tcl Outdated Show resolved Hide resolved

tests/integration/rdb.tcl Outdated Show resolved Hide resolved

ranshid and others added 4 commits July 9, 2024 09:55

Update src/rdb.c

7899fad

Co-authored-by: Binbin <[email protected]> Signed-off-by: ranshid <[email protected]>

Update src/rdb.c

0572c7a

Co-authored-by: Binbin <[email protected]> Signed-off-by: ranshid <[email protected]>

Update tests/integration/rdb.tcl

c25a85b

Co-authored-by: Binbin <[email protected]> Signed-off-by: ranshid <[email protected]>

Update tests/integration/rdb.tcl

9f7f3aa

Co-authored-by: Binbin <[email protected]> Signed-off-by: ranshid <[email protected]>

hpatro reviewed Jul 9, 2024

View reviewed changes

src/rdb.c Show resolved Hide resolved

tests/integration/rdb.tcl Show resolved Hide resolved

ranshid added 2 commits July 9, 2024 09:44

change bgsave cancel to be a sub-command

50bc3f0

Signed-off-by: ranshid <[email protected]>

fix bad merge-stash

8aff7ac

Signed-off-by: ranshid <[email protected]>

ranshid commented Jul 9, 2024

View reviewed changes

src/rdb.c Outdated Show resolved Hide resolved

ranshid and others added 3 commits July 9, 2024 12:50

Update src/rdb.c

c7b71ec

Signed-off-by: ranshid <[email protected]>

some minor extra fixes

6b823e3

Signed-off-by: ranshid <[email protected]>

fix rdb.tcl after error string change

63f9529

Signed-off-by: ranshid <[email protected]>

hpatro approved these changes Jul 9, 2024

View reviewed changes

src/rdb.c Outdated Show resolved Hide resolved

ranshid commented Jul 10, 2024

View reviewed changes

src/rdb.c Outdated Show resolved Hide resolved

Update src/rdb.c

1878f86

Signed-off-by: ranshid <[email protected]>

PingXie added the major-decision-pending Major decision pending by TSC team label Jul 17, 2024

1. change bgsave schedule to be a sub-command

528038f

2. introduce bgsave cancel sub command 3. introduce bgsave kill sub command 4. add test for bgsave cancel Signed-off-by: Ran Shidlansik <[email protected]>

ranshid added 4 commits July 22, 2024 15:38

Update tests/integration/rdb.tcl

ae9fff6

Signed-off-by: ranshid <[email protected]>

Update tests/integration/rdb.tcl

db9353e

Signed-off-by: ranshid <[email protected]>

Update tests/integration/rdb.tcl

6c0c855

Signed-off-by: ranshid <[email protected]>

Update src/rdb.c

0576254

Signed-off-by: ranshid <[email protected]>

zuiderkwast approved these changes Jul 23, 2024

View reviewed changes

enjoy-binbin reviewed Jul 24, 2024

View reviewed changes

tests/integration/rdb.tcl Outdated Show resolved Hide resolved

Apply suggestions from code review

71533de

Signed-off-by: Binbin <[email protected]>

enjoy-binbin approved these changes Jul 24, 2024

View reviewed changes

enjoy-binbin added release-notes This issue should get a line item in the release notes needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. labels Jul 24, 2024

enjoy-binbin reviewed Jul 24, 2024

View reviewed changes

src/rdb.c Show resolved Hide resolved

add log msg for bgsave schedule

55a1541

Signed-off-by: Ran Shidlansik <[email protected]>

enjoy-binbin reviewed Jul 29, 2024

View reviewed changes

src/rdb.c Show resolved Hide resolved

soloestoy approved these changes Oct 21, 2024

View reviewed changes

zuiderkwast added major-decision-approved Major decision approved by TSC team and removed major-decision-pending Major decision pending by TSC team labels Oct 21, 2024

zuiderkwast merged commit 29b83f1 into valkey-io:unstable Oct 21, 2024
48 checks passed

soloestoy reviewed Oct 21, 2024

View reviewed changes

src/commands/bgsave.json Show resolved Hide resolved

zuiderkwast mentioned this pull request Oct 21, 2024

Fix BGSAVE CANCEL since and history fields #1200

Merged

ranshid mentioned this pull request Oct 21, 2024

Update bgsave.md for BGSAVE CANCEL valkey-io/valkey-doc#182

Merged

zuiderkwast added a commit that referenced this pull request Oct 21, 2024

Fix BGSAVE CANCEL since and history fields (#1200)

5885dc5

Fixes wrong "since" and "history" introduced in #757. --------- Signed-off-by: Viktor Söderqvist <[email protected]>

zuiderkwast removed the needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. label Oct 21, 2024

proost pushed a commit to proost/valkey that referenced this pull request Nov 9, 2024

Fix BGSAVE CANCEL since and history fields (valkey-io#1200)

635b542

Fixes wrong "since" and "history" introduced in valkey-io#757. --------- Signed-off-by: Viktor Söderqvist <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce bgsave cancel #757

Introduce bgsave cancel #757

ranshid commented Jul 8, 2024 •

edited

Loading

codecov bot commented Jul 8, 2024 •

edited

Loading

enjoy-binbin left a comment

enjoy-binbin commented Jul 16, 2024 •

edited

Loading

ranshid commented Jul 17, 2024

enjoy-binbin commented Jul 17, 2024

ranshid commented Jul 17, 2024

ranshid commented Jul 17, 2024

enjoy-binbin commented Jul 17, 2024

ranshid commented Jul 17, 2024

PingXie commented Jul 17, 2024

ranshid commented Jul 18, 2024

ranshid commented Jul 18, 2024 •

edited

Loading

enjoy-binbin left a comment

madolson commented Sep 30, 2024

zuiderkwast commented Oct 21, 2024

ranshid commented Oct 21, 2024

ranshid commented Oct 21, 2024

Introduce bgsave cancel #757

Introduce bgsave cancel #757

Conversation

ranshid commented Jul 8, 2024 • edited Loading

codecov bot commented Jul 8, 2024 • edited Loading

Codecov Report

enjoy-binbin left a comment

Choose a reason for hiding this comment

enjoy-binbin commented Jul 16, 2024 • edited Loading

ranshid commented Jul 17, 2024

enjoy-binbin commented Jul 17, 2024

ranshid commented Jul 17, 2024

ranshid commented Jul 17, 2024

enjoy-binbin commented Jul 17, 2024

ranshid commented Jul 17, 2024

PingXie commented Jul 17, 2024

ranshid commented Jul 18, 2024

ranshid commented Jul 18, 2024 • edited Loading

enjoy-binbin left a comment

Choose a reason for hiding this comment

madolson commented Sep 30, 2024

zuiderkwast commented Oct 21, 2024

ranshid commented Oct 21, 2024

ranshid commented Oct 21, 2024

ranshid commented Jul 8, 2024 •

edited

Loading

codecov bot commented Jul 8, 2024 •

edited

Loading

enjoy-binbin commented Jul 16, 2024 •

edited

Loading

ranshid commented Jul 18, 2024 •

edited

Loading