Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update metrics and pktparse #151

Merged
merged 12 commits into from
Aug 30, 2021
Merged

Update metrics and pktparse #151

merged 12 commits into from
Aug 30, 2021

Conversation

conorbros
Copy link
Member

The timing macro in the latest version of metrics has been replaced with histogram.

pktparse needed the feature flag to be set to serde instead of derive.

@rukai
Copy link
Member

rukai commented Aug 25, 2021

I tested the metrics viewer at http://localhost:9001/metrics
Before the upgrade:
image
After the upgrade:
image

Looks like it broke something. Maybe theres a new setup step in the new version we are missing?
Can you investigate?

@conorbros conorbros changed the title Update metrics and pktparse [WIP] Update metrics and pktparse Aug 25, 2021
@rukai
Copy link
Member

rukai commented Aug 25, 2021

Oh, I should mention that to reproduce that you will need to:

  1. startup redis: redis-passthrough$ docker-compose -f docker-compose.yml up
  2. startup shotover: shotover-proxy$ cargo run
  3. send a message to shotover: redis-cli set a blah

@benbromhead
Copy link
Member

benbromhead commented Aug 26, 2021

Good catch @rukai
@conorbros do you want to add a test for "breaking metrics" e.g. can be as simple as "start shotover, push data through, make sure the a subsequent scrape from the metrics interface returns some metrics"

@rukai
Copy link
Member

rukai commented Aug 26, 2021

Note that ShotoverManager intentionally does not initialize the metrics webserver in the hopes of one day allowing better parallelization of tests.
https://github.com/shotover/shotover-proxy/blob/main/shotover-proxy/tests/helpers/mod.rs

pub fn with_observability_interface(self) -> Result<Self> {
- we need to call this method to initialize it

I'm thinking we could change ShotoverManager::new() to take a closure that initalizes shotover and returns a JoinHandle.
This would allow us to move ShotoverManager into test-helpers and better share between benchmarks and tests.
But you dont have to take this approach, it might not work out at all!

An alternative approach is to just always initialize the observability interface

@conorbros
Copy link
Member Author

conorbros commented Aug 26, 2021

So metrics-runtime has been deprecated and is incompatible with the latest version of metrics. If we want to use metrics 17.0.x we need to use one of the existing exporters, metrics-exporter-prometheus will give us the prometheus endpoint. However the JSON exporter has been left behind in metrics-runtime.

I think we can just use the metrics-exporter-prometheus as our global recorder. Given that we definitely need to export JSON too, we could make this method public and then serialize to JSON ourselves.

Any insight before I head down this path? I have confirmed that metric-exporter-prometheus will do what we need it to, its the JSON exporting I am concerned about.

@rukai
Copy link
Member

rukai commented Aug 26, 2021

I would get the PR working without json and resolve merge conflicts, then lets wait to hear back from @benbromhead on what usecases we need to handle here.

@conorbros
Copy link
Member Author

Having some trouble getting an async test working so I have had to use a blocking HTTP library for now, will update if I get it working.

@conorbros conorbros linked an issue Aug 27, 2021 that may be closed by this pull request
shotover-proxy/Cargo.toml Outdated Show resolved Hide resolved
Copy link
Member

@rukai rukai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few bits of feedback, then we need to wait for @benbromhead to explain the history of json here

shotover-proxy/tests/admin/observability_int_tests.rs Outdated Show resolved Hide resolved
shotover-proxy/tests/admin/observability_int_tests.rs Outdated Show resolved Hide resolved
@benbromhead
Copy link
Member

So we need the JSON exporter for Instaclustrs metrics collection agent (node-agent) which is pretty much an internal use case. Choices were made to match how we do it in other parts of our ecosystem -> #70 (comment)

It's a bit odd as prometheus / open metrics used to support a json exposition format, but that's also deprecated. So for the moment we need to support the deprecated exposition format until our node-agent can support natively the open metrics format.

@benbromhead
Copy link
Member

Also @conorbros see #141 (comment)

Some tips on easy async tests

@conorbros
Copy link
Member Author

conorbros commented Aug 29, 2021

@benbromhead re async tests:

The tokio::test attribute will start a runtime for the test. In integration tests we are initialising a Runner and we can't create a runtime (shotover's runtime) from with a runtime (test runtime). A possible solution is to make the runtime optionally configurable: pass a handle to an existing runtime to Runner and use that instead of creating one.

@rukai
Copy link
Member

rukai commented Aug 30, 2021

I was trying to understand the format of this json option.

On this page https://github.com/instaclustr/cassandra-exporter I found this:

Experimental JSON output is also provided if the Accept: application/json header or ?x-accept=application/json URL parameter is specified.
The format/structure of the JSON output is subject to change.

So seems like the json output is something non-standard instaclustr came up with?

The output seems super basic.
Its just the prometheus data put into a json map of strings to integers?

{
    "shotover_available_connections{source=\"Redis Source\"}": 511,
    "shotover_chain_latency{chain=\"redis_chain\",client_details=\"127.0.0.1\"} count": 1,
    "shotover_chain_latency{chain=\"redis_chain\",client_details=\"127.0.0.1\"} max": 960511,
    "shotover_chain_latency{chain=\"redis_chain\",client_details=\"127.0.0.1\"} min": 960000,
    "shotover_chain_latency{chain=\"redis_chain\",client_details=\"127.0.0.1\"} p50": 960511,
    "shotover_chain_latency{chain=\"redis_chain\",client_details=\"127.0.0.1\"} p90": 960511,
    "shotover_chain_latency{chain=\"redis_chain\",client_details=\"127.0.0.1\"} p95": 960511,
    "shotover_chain_latency{chain=\"redis_chain\",client_details=\"127.0.0.1\"} p99": 960511,
    "shotover_chain_latency{chain=\"redis_chain\",client_details=\"127.0.0.1\"} p999": 960511,
    "shotover_chain_total{chain=\"redis_chain\"}": 1,
    "shotover_transform_latency{transform=\"CodecDestination\"} count": 1,
    "shotover_transform_latency{transform=\"CodecDestination\"} max": 695295,
    "shotover_transform_latency{transform=\"CodecDestination\"} min": 694784,
    "shotover_transform_latency{transform=\"CodecDestination\"} p50": 695295,
    "shotover_transform_latency{transform=\"CodecDestination\"} p90": 695295,
    "shotover_transform_latency{transform=\"CodecDestination\"} p95": 695295,
    "shotover_transform_latency{transform=\"CodecDestination\"} p99": 695295,
    "shotover_transform_latency{transform=\"CodecDestination\"} p999": 695295,
    "shotover_transform_total{transform=\"CodecDestination\"}": 1
}

I havent looked at our implementation much but if we have access to the metrics data it should be trivial to throw it in a HashMap and export to json via serde. Or even construct it manually ourselves if constructing a hashmap is a performance concern.

@benbromhead
Copy link
Member

cc @amrutha-shanbhag @zegelin @johndelcastillo -> We are trying to keep compat for internal Instaclustr metrics

@johndelcastillo
Copy link
Contributor

Is the question: Can we get rid of it?

@benbromhead
Copy link
Member

Ah of course.

https://docs.rs/tokio/1.10.1/tokio/runtime/struct.Handle.html#method.try_current looks relevant

The tokio::test attribute will start a runtime for the test. In integration tests we are initialising a Runner and we can't create a runtime (shotover's runtime) from with a runtime (test runtime). A possible solution is to make the runtime optionally configurable: pass a handle to an existing runtime to Runner and use that instead of creating one.

@benbromhead
Copy link
Member

Is the question: Can we get rid of it?

Yes and, if no, the format it expects is the old prometheus json format, or our own weird one?

@johndelcastillo
Copy link
Contributor

lets just get rid of it and we will re-do the metrics collector

it served its purpose for pubnub but we have no active use case anyway

@conorbros
Copy link
Member Author

conorbros commented Aug 30, 2021

@johndelcastillo Just so you know we can support it, I've almost finished the changes required for that.

However it will require merging some changes into upstream metrics library so if we are able to remove it that would be best.

@johndelcastillo
Copy link
Contributor

Yeah its ok, lets not create a maintenance issue in the future

Copy link
Member

@rukai rukai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
Issue is fixed, code looks good!

@conorbros conorbros changed the title [WIP] Update metrics and pktparse Update metrics and pktparse Aug 30, 2021
@rukai rukai merged commit cbcfa61 into shotover:main Aug 30, 2021
@conorbros conorbros deleted the update-metrics-pktparse branch September 6, 2021 04:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update dependencies
5 participants