Add network-bytes-in and network-bytes-out metric support under CLUSTER SLOT-STATS command (#20) #720

kyle-yh-kim · 2024-07-01T02:11:22Z

Adds two new metrics for per-slot statistics, network-bytes-in and network-bytes-out. The network bytes are inclusive of replication bytes but exclude other types of network traffic such as clusterbus traffic.

network-bytes-in

The metric tracks network ingress bytes under per-slot context, by reverse calculation of c->argv_len_sum and c->argc, stored under a newly introduced field c->net_input_bytes_curr_cmd.

network-bytes-out

The metric tracks network egress bytes under per-slot context, by hooking onto COB buffer mutations.
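To make "hooking onto COB buffer mutations" more concrete, here is a minimal sketch, assuming a heavily simplified client with a fixed-size output buffer. Only `net_output_bytes_curr_cmd` and `CLUSTER_SLOTS` are names taken from this PR; the struct and the `cob_*` functions are hypothetical and do not mirror the actual Valkey code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CLUSTER_SLOTS 16384
#define COB_SKETCH_SIZE 1024 /* hypothetical fixed-size output buffer */

/* Hypothetical, heavily simplified client; not the real Valkey struct. */
typedef struct cob_client {
    int slot;                           /* slot of the current command, -1 if none */
    char cob[COB_SKETCH_SIZE];          /* client output buffer (COB) */
    size_t cob_used;                    /* bytes currently held in the COB */
    uint64_t net_output_bytes_curr_cmd; /* bytes appended for the current command */
} cob_client;

/* Per-slot egress totals (hypothetical storage). */
static uint64_t cob_bytes_out[CLUSTER_SLOTS];

/* The "hook": every append to the COB also bumps the per-command counter.
 * Returns 0 on success, -1 if the reply does not fit. */
int cob_append_reply(cob_client *c, const char *buf, size_t len) {
    if (len > COB_SKETCH_SIZE - c->cob_used) return -1;
    memcpy(c->cob + c->cob_used, buf, len);
    c->cob_used += len;
    c->net_output_bytes_curr_cmd += len;
    return 0;
}

/* Once the command has finished, move the counter into the per-slot total
 * and reset it for the next command. */
void cob_flush_bytes_out(cob_client *c) {
    if (c->slot >= 0 && c->slot < CLUSTER_SLOTS) {
        cob_bytes_out[c->slot] += c->net_output_bytes_curr_cmd;
    }
    c->net_output_bytes_curr_cmd = 0;
}
```

Counting at the append site means the metric follows whatever ends up in the reply buffer, without each command having to report its own reply size.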

sample response

Both metrics are reported under the CLUSTER SLOT-STATS command.

127.0.0.1:6379> cluster slot-stats slotsrange 0 0
1) 1) (integer) 0
   2) 1) "key-count"
      2) (integer) 0
      3) "cpu-usec"
      4) (integer) 0
      5) "network-bytes-in"
      6) (integer) 0
      7) "network-bytes-out"
      8) (integer) 0

…alkey-io#20). The metric tracks network ingress bytes under per-slot context, by reverse calculation of c->argv_len_sum and c->argc, stored under a newly introduced field c->net_input_bytes_curr_cmd. Signed-off-by: Kyle Kim <[email protected]>

kyle-yh-kim · 2024-07-01T05:43:43Z

Technical design walk-through

There are two system requirements: 1) collecting network ingress bytes, and 2) grouping and aggregating the collected bytes at per-slot granularity.

1st iteration, using `nread = read()`

Initially, we considered using nread = connRead(...) within the readQueryFromClient() function. This way, we would directly measure the incoming bytes read off the socket and stay consistent with the existing c->net_input_bytes.

The problem is that, while obtaining the ingress bytes is easy, grouping and aggregating them at per-slot granularity is difficult. readQueryFromClient() executes as part of the IO thread loop, and there is no guarantee that a single read() returns exactly the bytes of one command from start to end.

Single-command granularity matters here, because we need to attribute each command's network ingress bytes to its specific slot.

2nd iteration, reverse calculation from `c->argv_len_sum`

Thankfully, two existing client input buffer processing functions, namely processInlineBuffer() and processMultibulkBuffer(), already operate at the granularity of a single command.

Thus, we calculate the network bytes (inferred from c->argv_len_sum) in these functions and store the result under c->net_input_bytes_curr_cmd. Per-slot aggregation is deferred until c->slot is parsed later within processCommand(), where the previously calculated c->net_input_bytes_curr_cmd is used to increment the per-slot counter.
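For the multibulk case, the reverse calculation can be sketched as follows. This assumes standard RESP framing (`*<argc>\r\n`, then `$<len>\r\n<payload>\r\n` per argument). The helper names are hypothetical, and the sketch takes an array of argument lengths instead of c->argv_len_sum and c->argc to stay self-contained; the PR's exact arithmetic may differ.

```c
#include <stddef.h>

/* Number of decimal digits needed to print n (n = 0 needs one digit). */
static size_t resp_digits(size_t n) {
    size_t d = 1;
    while (n >= 10) {
        n /= 10;
        d++;
    }
    return d;
}

/* Hypothetical helper: wire size of one RESP multibulk command,
 * reconstructed from the argument count and the argument lengths. */
size_t resp_multibulk_bytes(size_t argc, const size_t *argv_len) {
    size_t total = 1 + resp_digits(argc) + 2; /* "*<argc>\r\n" */
    for (size_t i = 0; i < argc; i++) {
        total += 1 + resp_digits(argv_len[i]) + 2; /* "$<len>\r\n" */
        total += argv_len[i] + 2;                  /* "<payload>\r\n" */
    }
    return total;
}
```

For `BLPOP key 0`, the argument lengths are 5, 3 and 1, and the function returns 31, which matches the byte count annotated in the BLPOP test case of tests/unit/cluster/slot-stats.tcl.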

kyle-yh-kim · 2024-07-01T19:02:26Z

tests/unit/cluster/slot-stats.tcl

+# -----------------------------------------------------------------------------
+# Test cases for CLUSTER SLOT-STATS network-bytes-in.
+# -----------------------------------------------------------------------------


Action items for integration tests

Add additional test cases, namely:

network-bytes-in for blocking commands.

network-bytes-in for transactions (MULTI/EXEC).

network-bytes-in for non-slot specific commands (ex: INFO).

network-bytes-in for pipeline using valkey-cli (valkey-cli --pipe)

network-bytes-in for sharded pub/sub.

Fix existing test cases to support the newly introduced network-bytes-in parameter.

Is this done, or is this a followup?

This is done.

FYI, "network-bytes-in for pipeline using valkey-cli" is not explicitly tested, as it follows the same code path as multibulk processing, for which a test case already exists. If an explicit test case for valkey-cli --pipe is deemed necessary, I will follow up in the next revision.

hwware · 2024-07-02T20:25:28Z

src/server.c

@@ -3869,6 +3871,9 @@ int processCommand(client *c) {
        }
    }

+    /* Now that c->slot has been parsed, accumulate the buffered network bytes-in. */
+    clusterSlotStatsAddNetworkBytesIn(c);


Could we add a server.cluster_enabled condition here? Would that make the logic easier to understand?

We definitely could, although this would incur a redundant check, as the same condition is checked again further down the call stack:

clusterSlotStatsAddNetworkBytesIn();
  |
  | calls
  |
canAddNetworkBytes(); // --> checks for server.cluster_enabled

Personally, I prefer to keep the top-level call as simple as possible and embed all conditional checks inside the function body. This way, the same function can be called from multiple locations without wrapping each call in the same conditional checks every time.
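A minimal sketch of this pattern, assuming simplified stand-ins for the server and client state. All `guard_*` names are hypothetical; only the idea of a single guard function that checks server.cluster_enabled comes from the thread.

```c
#include <stdint.h>

#define CLUSTER_SLOTS 16384

/* Hypothetical stand-ins for the server and client state. */
typedef struct guard_server {
    int cluster_enabled;
    uint64_t network_bytes_in[CLUSTER_SLOTS];
} guard_server;

typedef struct guard_client {
    int slot; /* -1 until the slot has been parsed */
    uint64_t net_input_bytes_curr_cmd;
} guard_client;

/* All preconditions live in one place. */
static int guard_can_add_network_bytes(const guard_server *s, const guard_client *c) {
    return s->cluster_enabled && c->slot >= 0 && c->slot < CLUSTER_SLOTS;
}

/* Callers invoke this unconditionally; it is a no-op when the guard fails. */
void guard_add_network_bytes_in(guard_server *s, const guard_client *c) {
    if (!guard_can_add_network_bytes(s, c)) return;
    s->network_bytes_in[c->slot] += c->net_input_bytes_curr_cmd;
}
```

The trade-off is one extra function call on the non-cluster path in exchange for call sites that cannot forget the guard.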

zuiderkwast

Looks good to me in general. Great comments.

Please include the API changes in the PR description. The command response, how it looks, by example or description.

zuiderkwast · 2024-07-02T21:09:13Z

src/cluster_slot_stats.c

+} slotStat;
+
+/* Struct used for storing slot statistics, for all slots owned by the current shard. */
+struct slotStat cluster_slot_stats[CLUSTER_SLOTS];


This global is 128KB even in standalone mode. Can we store it inside the cluster struct so it only consumes this memory in cluster mode?

Agreed. The struct has been moved under clusterState.

struct clusterState {
    ...
    slotStat slot_stats[CLUSTER_SLOTS];
}
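The memory argument can be illustrated with a small sketch. It assumes a single 8-byte counter per slot, which is consistent with the 128KB figure quoted above (16384 slots x 8 bytes), and a cluster state that is heap-allocated only in cluster mode. All `mem_*` names are hypothetical simplifications, not the real Valkey structs.

```c
#include <stdint.h>
#include <stdlib.h>

#define CLUSTER_SLOTS 16384

/* Assumption: one 8-byte counter per slot, i.e. 16384 * 8 bytes = 128 KiB. */
typedef struct mem_slot_stat {
    uint64_t network_bytes_in;
} mem_slot_stat;

typedef struct mem_cluster_state {
    mem_slot_stat slot_stats[CLUSTER_SLOTS];
} mem_cluster_state;

typedef struct mem_server {
    int cluster_enabled;
    mem_cluster_state *cluster; /* stays NULL in standalone mode */
} mem_server;

/* Allocates the cluster state only when cluster mode is enabled.
 * Returns 0 on success, -1 on allocation failure. */
int mem_server_init(mem_server *s, int cluster_enabled) {
    s->cluster_enabled = cluster_enabled;
    s->cluster = NULL;
    if (!cluster_enabled) return 0;
    s->cluster = calloc(1, sizeof(*s->cluster));
    return s->cluster ? 0 : -1;
}

void mem_server_free(mem_server *s) {
    free(s->cluster);
    s->cluster = NULL;
}
```

With the array embedded in the cluster state, a standalone server pays only for a NULL pointer, whereas a file-scope global array is reserved regardless of mode.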

zuiderkwast · 2024-07-02T21:10:58Z

src/cluster_slot_stats.h

+void clusterSlotStatReset(int slot);
+void clusterSlotStatsReset(void);


These names are too similar. Can we call the second one clusterSlotStatResetAll?

zuiderkwast · 2024-07-02T21:11:17Z

src/commands/cluster-slot-stats.json

@@ -35,6 +35,9 @@
                        "properties": {
                            "key-count": {
                                "type": "integer"
+                            },
+                            "memory-bytes-in": {


network, not memory

madolson · 2024-07-02T22:39:24Z

Haven't circled all the way back around to this, but @kyle-yh-kim can you also document the performance characteristics so we can make the decision if it should be enabled by default or not?

kyle-yh-kim · 2024-07-03T17:45:01Z

Performance benchmarking summary

With just cpu-usec enabled, we note a 0.28% reduction in TPS.
I will shortly conduct the same study with both cpu-usec and network-bytes-in enabled.

| Metric | Naive | With cpu-usec | Percentage diff |
|---|---|---|---|
| p50 (ms) | 2.006 | 2.029 | 1.147% |
| p90 (ms) | 3.230 | 3.253 | 0.712% |
| p99 (ms) | 3.850 | 3.866 | 0.414% |
| TPS | 156052 | 155608 | -0.285% |

Appendix: Test setup

Server setup

1 server (r6g.xlarge), pre-filled with 3 million keys, 512 bytes each.

Traffic generator setup

8 traffic generators (m6g.large) running on separate ARM instances.
Each traffic generator runs the following command (50 clients, SET, 512 bytes), which pins the server CPU at 100%.

./valkey-benchmark -h ${TARGET_IP} -c 50 -r 3000000 -n 100000000 -t set -d 514

hwware · 2024-07-03T18:14:10Z

I think you posted the benchmark results in the wrong thread; maybe they belong in #712.

- Added config guard, "cluster-slot-stats-enabled", with default value true.
- Added more test-cases.
- Added network-bytes-in accumulation for replicated commands.
- Added network-bytes-in accumulation for sharded pub/sub.
- Added network-bytes-in accumulation for MULTI's RESP.
- Moved slot_stats array under clusterState.
- Fixed network-bytes-in accumulation for blocking commands.
- Fixed c->argc accumulation by deferring its calculation.

Signed-off-by: Kyle Kim <[email protected]>

kyle-yh-kim · 2024-07-11T01:18:47Z

Currently, the PR aggregates network-bytes-in from three different sources:

User client's incoming command.
Primary client's incoming replication stream.
Incoming message due to sharded pubsub subscription.

If this feels too confusing, there are two alternatives:

Drop the replication stream and sharded pubsub accounting altogether, and strictly track only user client’s and valkey-server’s interaction.

Split the network-bytes-in metric into three sub-components;

127.0.0.1:6379> cluster slot-stats slotsrange 12539 12539
1) 1) (integer) 12539
  2) 1) "key-count"
     2) (integer) 1
     3) "network-bytes-in-user"
     4) (integer) 10
     5) "network-bytes-in-replication"
     6) (integer) 20
     7) "network-bytes-in-sharded-pubsub"
     8) (integer) 30

Personally, I would be opposed to dropping the replication stream and sharded pubsub requirements altogether, as the inclusion of all three components better represents the actual ingress bytes. Curious to hear the core team’s thoughts on this.

madolson · 2024-07-14T17:37:13Z

I will shortly conduct the same study with both cpu-usec and network-bytes-in enabled.

Did you do this? Based on the performance of < ~1% CPU, it seems like maybe we should just leave this on without a config. We already have some optimizations on the cluster path for valkey 8, so I would expect it to be a net performance benefit to move to Valkey 8 and you also get the free observability metrics.

kyle-yh-kim · 2024-07-15T03:30:08Z

Yes, benchmarking result for both cpu-usec and network-bytes-in has been shared here; #20 (comment).

To summarize, we can expect a TPS reduction of ~0.7%. As per your recommendation, I agree that it is fine to enable it by default without a configuration.

hpatro · 2024-07-16T23:54:28Z

src/cluster_slot_stats.h

+void clusterSlotStatReset(int slot);
+void clusterSlotStatResetAll(void);
+void clusterSlotStatsAddNetworkBytesIn(client *c);
+void clusterSlotStatsAddNetworkBytesInForShardedPubSub(robj *channel, robj *message);


Can't this API also account for network bytes based on the client args?

hpatro · 2024-07-17T00:13:02Z

src/server.c

+
+    /* Now that c->slot has been parsed, and command has been executed,
+     * accumulate the buffered network bytes-in. */
+    clusterSlotStatsAddNetworkBytesIn(c);


We can maybe move this under afterCommand.

I think we would want to accumulate it on unblocking cases, so afterCommand makes sense.

Both clusterSlotStatsAddNetworkBytesInForUserClient() and clusterSlotStatsAddNetworkBytesInForUserClient() have been migrated under afterCommand().

For the case of MULTI, since afterCommand() is not reached upon queuing a command, the same call is made explicitly under queueMultiCommand() to accumulate its network ingress bytes.

madolson · 2024-07-17T02:36:00Z

src/server.c

+
+    /* Now that c->slot has been parsed, and command has been executed,
+     * accumulate the buffered network bytes-in. */
+    clusterSlotStatsAddNetworkBytesIn(c);


I think we would want to accumulate it on unblocking cases, so afterCommand makes sense.

madolson · 2024-07-17T02:36:27Z

tests/unit/cluster/slot-stats.tcl

+# -----------------------------------------------------------------------------
+# Test cases for CLUSTER SLOT-STATS network-bytes-in.
+# -----------------------------------------------------------------------------


Is this done, or is this a followup?

- Updated the aggregation to be stateful. First, cluster msg length is recorded under pubsubState. Once the slot is parsed, we then accumulate the previously captured length. This way, we bypass the redundant keyHashSlot() calls. Signed-off-by: Kyle Kim <[email protected]>

codecov · 2024-07-23T19:25:18Z

Codecov Report

Attention: Patch coverage is 97.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 70.38%. Comparing base (5000c05) to head (0454ed4).
Report is 8 commits behind head on unstable.

Additional details and impacted files

@@            Coverage Diff             @@
##           unstable     #720    +/-   ##
==========================================
  Coverage     70.37%   70.38%            
==========================================
  Files           112      112            
  Lines         61308    61458   +150     
==========================================
+ Hits          43146    43257   +111     
- Misses        18162    18201    +39

Files	Coverage Δ
src/db.c	`88.42% <100.00%> (+0.01%)`	⬆️
src/networking.c	`88.71% <100.00%> (-0.08%)`	⬇️
src/pubsub.c	`97.19% <100.00%> (+<0.01%)`	⬆️
src/replication.c	`87.13% <100.00%> (-0.28%)`	⬇️
src/server.c	`88.56% <100.00%> (+<0.01%)`	⬆️
src/server.h	`100.00% <ø> (ø)`
src/cluster_slot_stats.c	`93.82% <95.83%> (+3.50%)`	⬆️

... and 14 files with indirect coverage changes

…alkey-io#20). The metric tracks network egress bytes under per-slot context, by hooking onto COB buffer mutations. The metric can be viewed by calling the CLUSTER SLOT-STATS command, with sample response attached below;

```
127.0.0.1:6379> cluster slot-stats slotsrange 0 0
1) 1) (integer) 0
   2) 1) "key-count"
      2) (integer) 0
      3) "cpu-usec"
      4) (integer) 0
      5) "network-bytes-in"
      6) (integer) 0
      7) "network-bytes-out"
      8) (integer) 0
```

Signed-off-by: Kyle Kim <[email protected]>

madolson

Still looking at tests, but partial review.

madolson · 2024-07-24T21:18:21Z

tests/unit/cluster/slot-stats.tcl

+    set key_slot [R 0 cluster keyslot $key]
+    set metrics_to_assert [list network-bytes-in]
+
+    test "CLUSTER SLOT-STATS network-bytes-in, multi bulk buffer processing." {


We also have total net-in and net-out metrics in CLIENT LIST, https://valkey.io/commands/client-list/. We could validate that they match. For these cases we expect them all to be the same.

madolson · 2024-07-24T21:19:07Z

tests/unit/cluster/slot-stats.tcl

+        # *3\r\n$5\r\nblpop\r\n$3\r\nkey\r\n$1\r\n0\r\n --> 31 bytes.
+        $rd BLPOP $key 0
+        wait_for_blocked_clients_count 1
+


We can check the intermediary state here.

madolson · 2024-07-24T21:21:19Z

tests/unit/cluster/slot-stats.tcl

+    R 0 CONFIG RESETSTAT
+    R 0 FLUSHALL
+
+    test "CLUSTER SLOT-STATS network-bytes-out, for slot specific commands." {


Since we merged the PRs together, I feel like a lot of these tests are testing the same thing, one for in and one for out. Can we not just merge each pair of tests together? It seems like it would be a lot shorter.

I've looked into this a bit more.

Ultimately, network-bytes-in and network-bytes-out are two completely separate metrics with distinct aggregation logic. Since their implementations do not cross paths, there are still many test cases that cannot be merged, or that do not make sense to merge.

For example, multi bulk buffer processing and in-line buffer processing are specific to network-bytes-in, and make no difference to network-bytes-out aggregation. Another example is MULTI/EXEC. There's no special testing of MULTI/EXEC for per-slot network-bytes-out, since the way the COB is managed is identical inside and outside a transaction. There's also SPUBLISH, which has special aggregation logic only for network-bytes-out, not for network-bytes-in. While these test cases could be merged, we would be merging them for the sake of merging, and not meaningfully testing their unique behaviours.

So, even with the merger effort, some test classes are better kept separate. By merging the two test classes together, we would end up with three classes: 1) network-bytes-in only, 2) network-bytes-out only, and 3) both. An effort that started out to reduce confusion and code duplication would end up more confusing for developers.

For this reason, I have a slight preference for keeping them under two separate test classes. Two test classes seem simpler than three, and a failing test in the "both" test class would require further triaging to assess whether the failure came from network-bytes-in or network-bytes-out. This ambiguity is removed when no test cases are shared between multiple metrics.

That said, this is just a slight preference, and I ultimately don't have a strong objection to going either way. I will commit to the final decision made by the core member.

Discussed offline with Madelyn. Given rc1 timeline on the immediate horizon, for now, we will keep the integration tests separated.

That said, after the rc1 merge, we will do a fast follow-up to address a problem with our current testing strategy: white-box testing. For example, network-bytes-out currently does not have a MULTI/EXEC test case, since we know from its implementation that COB mutation does not differ between the transaction and non-transaction cases. Obviously, there's no guarantee that this promise will be upheld forever. Our proposed fast follow-up will disregard the implementation knowledge and add a MULTI/EXEC test case in which all four metrics' expectations are asserted.

This way, we improve both our share of black-box tests and our confidence in catching future bugs.

We also note that not all test cases should be black-boxed, as the potential test cases are limitless. With reasonable bound, we will maintain a healthy balance between black-box / white-box tests, prioritizing black-box testing when applicable.

- Removed sharded pubsub aggregation, except for network-bytes-out internal propagation.
- Added network-bytes-out in reply schema.
- Added missing assertions in tcl integration tests.

Signed-off-by: Kyle Kim <[email protected]>

madolson · 2024-07-26T00:28:08Z

src/cluster_slot_stats.c

+    } else if (stat_type == NETWORK_BYTES_IN) {
+        slot_stat = server.cluster->slot_stats[slot].network_bytes_in;


Suggested change

-    } else if (stat_type == NETWORK_BYTES_IN) {
-        slot_stat = server.cluster->slot_stats[slot].network_bytes_in;
+    } else if (stat_type == NETWORK_BYTES_IN) {
+        slot_stat = server.cluster->slot_stats[slot].network_bytes_in;
+    } else if (stat_type == NETWORK_BYTES_OUT) {
+        slot_stat = server.cluster->slot_stats[slot].network_bytes_out;

We might use a switch statement here so that the compiler throws a warning if we miss a type.

madolson · 2024-07-26T00:30:00Z

tests/unit/cluster/slot-stats.tcl

@@ -511,6 +839,10 @@ start_cluster 1 0 {tags {external:skip cluster}} {
        # When cluster-slot-stats-enabled config is disabled, you cannot sort using advanced metrics.
        set orderby "cpu-usec"
        assert_error "ERR*" {R 0 CLUSTER SLOT-STATS ORDERBY $orderby}
+        set orderby "network-bytes-in"
+        assert_error "ERR*" {R 0 CLUSTER SLOT-STATS ORDERBY $orderby}
+        set orderby "network-bytes-out"


I think we're missing tests here to make sure we can order by network bytes in/out

- Bug fix on c->net_output_bytes_curr_cmd for replicationSendAck(). This value must be manually reset to zero.
- Bug fix on slot stats sorting.
- Added more tcl integration test cases.
- Moved network-bytes-in aggregation from afterCommand() to commandProcessed()

Signed-off-by: Kyle Kim <[email protected]>

kyle-yh-kim · 2024-07-26T04:56:39Z

The failing test case seems flaky. The latest commit does not alter the slot migration logic. I confirmed that the identical test passes on my dev machine:

(24-07-26 4:51:20) <130> [~/workplace/valkey]
dev-dsk-kimkyle-1d-357bf1a8 % ./runtest --single unit/cluster/slot-migration
Cleanup: may take some time... OK
Starting test server at port 21079
[ready]: 2971
Testing unit/cluster/slot-migration
[ready]: 2973
[ready]: 2975
[ready]: 2979
[ready]: 2982
[ready]: 2985
[ready]: 2988
[ready]: 2991
[ready]: 2994
[ready]: 3000
[ready]: 3003
[ready]: 3006
[ready]: 2997
[ready]: 3012
[ready]: 3014
[ready]: 3009
[ok]: Slot migration states are replicated (7 ms)
[ok]: Migration target is auto-updated after failover in target shard (3254 ms)
[ok]: Migration source is auto-updated after failover in source shard (3307 ms)
[ok]: Replica redirects key access in migrating slots (1 ms)
[ok]: Replica of migrating node returns ASK redirect after READONLY (1 ms)
[ok]: Replica of migrating node returns TRYAGAIN after READONLY (1 ms)
[ok]: Replica of importing node returns TRYAGAIN after READONLY and ASKING (1 ms)
[ok]: New replica inherits migrating slot (291 ms)
[ok]: New replica inherits importing slot (300 ms)
[ok]: Empty-shard migration replicates slot importing states (6 ms)
[ok]: Empty-shard migration target is auto-updated after failover in target shard (3272 ms)
[ok]: Empty-shard migration source is auto-updated after failover in source shard (3274 ms)
[ok]: Multiple slot migration states are replicated (10 ms)
[ok]: New replica inherits multiple migrating slots (296 ms)
[ok]: Slot finalization succeeds on both primary and replicas (9 ms)
[ok]: Slot is auto-claimed by target after source relinquishes ownership (1005 ms)
[ok]: CLUSTER SETSLOT with invalid timeouts (0 ms)
[ok]: CLUSTER SETSLOT with an explicit timeout (3058 ms)
[ok]: Client blocked on XREADGROUP while stream's slot is migrated (4 ms)
[1/1 done]: unit/cluster/slot-migration (91 seconds)

                   The End

Execution time of different units:
  91 seconds - unit/cluster/slot-migration

\o/ All tests passed without errors!

Cleanup: may take some time... OK

Signed-off-by: Madelyn Olson <[email protected]>

madolson · 2024-07-26T17:19:02Z

src/cluster_slot_stats.c

+    default: /* SLOT_STAT_COUNT, INVALID */
+        serverPanic("Invalid slot stat type %d was found.", stat_type);


Suggested change

-    default: /* SLOT_STAT_COUNT, INVALID */
-        serverPanic("Invalid slot stat type %d was found.", stat_type);
+    case SLOT_STAT_COUNT:
+    case INVALID:
+        serverPanic("Invalid slot stat type %d was found.", stat_type);

This has the benefit that if you miss a case, the compiler will throw a warning.
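A minimal sketch of the suggestion, assuming the enum also carries KEY_COUNT and CPU_USEC members (only NETWORK_BYTES_IN, NETWORK_BYTES_OUT, SLOT_STAT_COUNT and INVALID appear in this thread). The `sw_*` names are hypothetical, and abort() stands in for serverPanic(). With GCC or Clang and -Wall, a switch over an enum that has no default label triggers -Wswitch when an enumerator is left unhandled.

```c
#include <stdint.h>
#include <stdlib.h>

/* KEY_COUNT and CPU_USEC are assumed members; the rest appear in the thread. */
typedef enum sw_stat_type {
    KEY_COUNT,
    CPU_USEC,
    NETWORK_BYTES_IN,
    NETWORK_BYTES_OUT,
    SLOT_STAT_COUNT,
    INVALID
} sw_stat_type;

/* Hypothetical per-slot record. */
typedef struct sw_slot_stat {
    uint64_t key_count;
    uint64_t cpu_usec;
    uint64_t network_bytes_in;
    uint64_t network_bytes_out;
} sw_slot_stat;

uint64_t sw_get_slot_stat(const sw_slot_stat *stat, sw_stat_type type) {
    /* No default label: adding a new enumerator without a matching case
     * makes the compiler warn at this switch. */
    switch (type) {
    case KEY_COUNT: return stat->key_count;
    case CPU_USEC: return stat->cpu_usec;
    case NETWORK_BYTES_IN: return stat->network_bytes_in;
    case NETWORK_BYTES_OUT: return stat->network_bytes_out;
    case SLOT_STAT_COUNT:
    case INVALID:
        break;
    }
    abort(); /* stands in for serverPanic("Invalid slot stat type ...") */
}
```

Listing SLOT_STAT_COUNT and INVALID explicitly keeps the invalid-value handling while preserving the compile-time check that a default label would suppress.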

- Updated minor comments. Signed-off-by: Kyle Kim <[email protected]>

madolson

LGTM. I'm going to kick off some tests and create some follow-up issues before merging, though.

madolson · 2024-07-26T21:37:41Z

Doc PR is here: valkey-io/valkey-doc#150

madolson · 2024-07-26T21:48:48Z

Pending test run before merge: https://github.com/valkey-io/valkey/actions/runs/10118033123

Docs for CLUSTER SLOT-STATS, with key-count, cpu-usec, network-bytes-in, and network-bytes-out metrics. - valkey-io/valkey#351 - valkey-io/valkey#712 - valkey-io/valkey#720 --------- Signed-off-by: Kyle Kim <[email protected]> Signed-off-by: Madelyn Olson <[email protected]> Signed-off-by: Viktor Söderqvist <[email protected]> Co-authored-by: Madelyn Olson <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]>

kyle-yh-kim force-pushed the 20-network-in branch from 94ec40a to 018d698 Compare July 1, 2024 04:56

kyle-yh-kim mentioned this pull request Jul 1, 2024

[NEW] Introduce slot level metrics to Valkey cluster #20

Closed

kyle-yh-kim commented Jul 1, 2024

hwware reviewed Jul 2, 2024

zuiderkwast reviewed Jul 2, 2024

hwware reviewed Jul 3, 2024

kyle-yh-kim mentioned this pull request Jul 4, 2024

Introduce commandlog heavytraffic to record big response packet. #336

Open

kyle-yh-kim force-pushed the 20-network-in branch from 123182e to b0f1009 Compare July 9, 2024 21:07

kyle-yh-kim force-pushed the 20-network-in branch from b0f1009 to 07174ee Compare July 9, 2024 21:19

kyle-yh-kim commented Jul 9, 2024

kyle-yh-kim mentioned this pull request Jul 11, 2024

Add cpu-usec metric support under CLUSTER SLOT-STATS command (#20). #712

Merged

hpatro reviewed Jul 17, 2024

madolson reviewed Jul 17, 2024

kyle-yh-kim added 2 commits July 23, 2024 18:15

Merge branch 'unstable' into 20-network-in

40eb5c6

kyle-yh-kim force-pushed the 20-network-in branch 4 times, most recently from b90f537 to 5755699 Compare July 24, 2024 16:05

kyle-yh-kim mentioned this pull request Jul 24, 2024

Add network-bytes-out metric support for CLUSTER SLOT-STATS command (#20) #771

Closed

kyle-yh-kim force-pushed the 20-network-in branch from 5755699 to 8e09eb8 Compare July 24, 2024 19:04

kyle-yh-kim force-pushed the 20-network-in branch from 8e09eb8 to df9619a Compare July 24, 2024 19:11

madolson reviewed Jul 24, 2024

madolson reviewed Jul 24, 2024

Minor revision.

ffe40e2

madolson reviewed Jul 26, 2024

kyle-yh-kim commented Jul 26, 2024

madolson reviewed Jul 26, 2024

Update src/cluster_slot_stats.c

ad22128

Signed-off-by: Madelyn Olson <[email protected]>

madolson reviewed Jul 26, 2024

madolson reviewed Jul 26, 2024

Minor revision.

0454ed4

madolson approved these changes Jul 26, 2024

madolson changed the title ~~Add network-bytes-in metric support under CLUSTER SLOT-STATS command (#20)~~ Add network-bytes-in and network-bytes-out metric support under CLUSTER SLOT-STATS command (#20) Jul 26, 2024

madolson added release-notes This issue should get a line item in the release notes needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. labels Jul 26, 2024

madolson removed the needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. label Jul 26, 2024

madolson merged commit e1d936b into valkey-io:unstable Jul 26, 2024
48 checks passed

hpatro mentioned this pull request Jul 29, 2024

Handle underflow condition of network out slot stats metric #840

Merged

kyle-yh-kim mentioned this pull request Aug 2, 2024

Add CLUSTER SLOT-STATS document. valkey-io/valkey-doc#150

Merged
