Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable smp_message_queue metrics by default #2507

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ballard26
Copy link
Contributor

These metrics were originally disabled for overloading the collectd servers many years back. However, these days they seem like a drop in the bucket relative to number of metrics seastar generates. Beyond this they would be useful to have when debugging issues related to cross-shard traffic.

These metrics were originally disabled for overloading the collectd
servers many years back. However, these days they seem like a drop in
the bucket relative to number of metrics seastar generates. Beyond this
they would be useful to have when debugging issues related to
cross-shard traffic.
@tchaikov
Copy link
Contributor

tchaikov commented Oct 19, 2024

@ballard26 hi Brandon, i think your PR actually fixes #2375 which could be controversial. because, @amnonh thinks that

Those metrics are quadratic in their nature and were disabled

IIUC, by "quadratic", it means we have n x n metrics if the seastar is configured with n cores. guess that's why they were disabled. but i think, despite that they are always disabled, these metrics are still added to the metric subsystem, even they are not added to the metrics map and are not exposed.

does it make sense to add a member to smp_options to at least allow user to enable them using smp::configure() ? so that if one suspects that the system is suffering from too many cross shard traffic/calls, they can enable this and restart the seastar application, to verify the hypothesis as long as the performance issue is reproducible.

@xemul @amnonh what do you think?

@amnonh
Copy link
Contributor

amnonh commented Oct 19, 2024

I vote against such a configuration. We already have enough options to control metrics from API and files

@tchaikov
Copy link
Contributor

tchaikov commented Oct 19, 2024

I vote against such a configuration. We already have enough options to control metrics from API and files

@amnonh hi Amnon, do you mean that we are able to utilize these options to enable a given disabled metric ?

@amnonh
Copy link
Contributor

amnonh commented Oct 20, 2024

@tchaikov yes, you can use the file or the API to to enable them, for example, I've run two nodes (docker) cluster,
I've set all metrics using the following config:

$cat relabel/conf.yaml 
relabel_configs:
  - source_labels: [__name__]
    action: keep
    regex: (.*)

Using the browser: http://172.17.0.3:9180/metrics?__name__=smp*

# HELP scylla_smp_complete_batch_queue_length Current complete batch queue length
# TYPE scylla_smp_complete_batch_queue_length gauge
scylla_smp_complete_batch_queue_length{shard="0-1"} 0.000000
scylla_smp_complete_batch_queue_length{shard="1-0"} 0.000000
# HELP scylla_smp_receive_batch_queue_length Current receive batch queue length
# TYPE scylla_smp_receive_batch_queue_length gauge
scylla_smp_receive_batch_queue_length{shard="0-1"} 0.000000
scylla_smp_receive_batch_queue_length{shard="1-0"} 0.000000
# HELP scylla_smp_send_batch_queue_length Current send batch queue length
# TYPE scylla_smp_send_batch_queue_length gauge
scylla_smp_send_batch_queue_length{shard="0-1"} 1.000000
scylla_smp_send_batch_queue_length{shard="1-0"} 1.000000
# HELP scylla_smp_send_queue_length Current send queue length
# TYPE scylla_smp_send_queue_length gauge
scylla_smp_send_queue_length{shard="0-1"} 0.000000
scylla_smp_send_queue_length{shard="1-0"} 0.000000
# HELP scylla_smp_total_completed_messages Total number of messages completed
# TYPE scylla_smp_total_completed_messages counter
scylla_smp_total_completed_messages{shard="0-1"} 3883
scylla_smp_total_completed_messages{shard="1-0"} 6889
# HELP scylla_smp_total_received_messages Total number of received messages
# TYPE scylla_smp_total_received_messages counter
scylla_smp_total_received_messages{shard="0-1"} 3883
scylla_smp_total_received_messages{shard="1-0"} 6889
# HELP scylla_smp_total_sent_messages Total number of sent messages
# TYPE scylla_smp_total_sent_messages counter
scylla_smp_total_sent_messages{shard="0-1"} 3883
scylla_smp_total_sent_messages{shard="1-0"} 6889

If you have access to the API, you can do the same in runtime using the http://localhost:10000/v2/metrics-config/ GET and POST APIs

@ballard26
Copy link
Contributor Author

ballard26 commented Oct 24, 2024

IIUC, by "quadratic", it means we have n x n metrics if the seastar is configured with n cores. guess that's why they were disabled. but i think, despite that they are always disabled, these metrics are still added to the metric subsystem, even they are not added to the metrics map and are not exposed.

Yeah, knowing this now its understandable why these metrics are not enabled by default.

Most of the time we're only ever interested in knowing the total ingress and egress of cross shard calls by shard rather than by queue. This would be a linear rather than quadratic metric relative to the number of shards and therefore something that may be more palatable to have enabled by default.

Is there any interest from you folks for having metrics like that enabled by default in seastar? If not, I believe we could achieve a similar result in our application by using set_relabel_configs(...) to split the shard="a-b" label in the existing metrics to shard="a", dst_shard="b" then set_metric_family_configs(...) to aggregate the dst_shard label.

@tchaikov
Copy link
Contributor

@tchaikov yes, you can use the file or the API to to enable them, for example, I've run two nodes (docker) cluster, I've set all metrics using the following config:
...

If you have access to the API, you can do the same in runtime using the http://localhost:10000/v2/metrics-config/ GET and POST APIs

thank you very much Amnon, didn't know that we could enable metrics at runtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants