-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate from okhttp to armeria #2653
Migrate from okhttp to armeria #2653
Conversation
this is the second time you've written elasticsearch support now. this time
you dont need it yourself! many thanks for the help
|
ps I plan to dig deeper into this over the weekend because trying to finish
javascript and it is taking all my willpower to resist other things ;)
…On Mon, Jul 8, 2019, 5:58 PM Adrian Cole ***@***.***> wrote:
this is the second time you've written elasticsearch support now. this
time you dont need it yourself! many thanks for the help
|
Updated Also for the record, want to say it was quite painful updating the |
appreciate the feedback rag. it isnt an irreversible thing (kotlin tests)
though I am not sure if intellij will help the other direction or not. in
hindsight I think more may have thought it was a nice idea in abstract than
were directly impacted in a good or bad way. you are the first to feel
impact and while it might be water under the bridge.. reversing is an
option.
…On Tue, Jul 9, 2019, 6:43 PM Anuraag Agrawal ***@***.***> wrote:
Updated zipkin-server too and this is getting better, but one WIP is
setting connect timeout, which isn't too hard but will bring back the
close method on the storage which I thought wasn't needed by using
Armeria's default clientfactory.
Also for the record, want to say it was quite painful updating the
zipkin-server tests due to Kotlin. Not only is it unfamiliar and a big
context switch, but I find it has quite strange syntax and quirks when
integrating with normal Java libraries - for the life of me I couldn't get assertThat(Throwable).satisfies
{ checks } to even compile. So another post-humous vote against the
Kotlin tests ;)
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#2653?email_source=notifications&email_token=AAAPVV7PNDPWZ4ZBQIXAXLLP6RTVZA5CNFSM4H6ZPMGKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZP4ACI#issuecomment-509591561>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAPVV5PSJNHPURV73VROB3P6RTVZANCNFSM4H6ZPMGA>
.
|
V6 integration tests are working fine now, for some reason the armeria client can't connect to V7. The build passing is deceptive, since the V7 tests got skipped due to the connection failure will look more. |
Ok V7 should be all good too |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the hard yards here.
Main last mile will be making sure zipkin-aws elastcisearch is possible to refactor over the work here
https://github.com/openzipkin/zipkin-aws/tree/master/autoconfigure/storage-elasticsearch-aws/src/main/java/zipkin/autoconfigure/storage/elasticsearch/aws
.../java/zipkin2/server/internal/elasticsearch/ZipkinElasticsearchStorageAutoConfiguration.java
Show resolved
Hide resolved
.../java/zipkin2/server/internal/elasticsearch/ZipkinElasticsearchStorageAutoConfiguration.java
Show resolved
Hide resolved
...ver/src/test/kotlin/zipkin2/elasticsearch/ZipkinElasticsearchStorageAutoConfigurationTest.kt
Show resolved
Hide resolved
var client: OkHttpClient = OkHttpClient.Builder() | ||
.addNetworkInterceptor(BasicAuthInterceptor(ZipkinElasticsearchStorageProperties(false, 0))) | ||
.build() | ||
companion object { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry.. this looks like it probably wasn't fun to figure out
} | ||
|
||
@Override | ||
public void close() { | ||
if (!shutdownClientOnClose()) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intentional to remove the shutdownClientOnClose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah without customization, we use the default ClientFactory
, and it's a no-op to close it so no need for us to worry.
zipkin-storage/elasticsearch/src/main/java/zipkin2/elasticsearch/ElasticsearchStorage.java
Show resolved
Hide resolved
* This returns a Dns provider that combines the IPv4 or IPv6 addresses from a supplied list of | ||
* urls, provided they are all http and share the same port. | ||
*/ | ||
final class PseudoAddressRecordSet { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow forgot how much code this was..
@@ -13,17 +13,20 @@ | |||
*/ | |||
package zipkin2.elasticsearch.internal.client; | |||
|
|||
import java.io.Closeable; | |||
import com.fasterxml.jackson.databind.util.ByteBufferBackedInputStream; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we delay jackson to a later pull request with some of the okio utilities?
ex does one or the other here help?
https://github.com/square/okio/pull/346/files
https://github.com/square/okio/blob/d7fcee8db94177ac4d02e205f928d8425f352d52/okio/src/jvmTest/java/okio/NioTest.java
if not, add a TODO: to follow-up to swap out okio/moshi as we shouldn't really have multiple things doing the same if we can avoid it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - fixed the bug in ReadBuffer
and used it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First Kotlin weirdness, now weirdest Armeria issue ever line/armeria#1895
I've learned my lesson to not call issues straightforward without care ;)
zipkin-storage/elasticsearch/src/main/java/zipkin2/elasticsearch/ElasticsearchStorage.java
Show resolved
Hide resolved
} | ||
|
||
@Override | ||
public void close() { | ||
if (!shutdownClientOnClose()) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah without customization, we use the default ClientFactory
, and it's a no-op to close it so no need for us to worry.
@@ -13,17 +13,20 @@ | |||
*/ | |||
package zipkin2.elasticsearch.internal.client; | |||
|
|||
import java.io.Closeable; | |||
import com.fasterxml.jackson.databind.util.ByteBufferBackedInputStream; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - fixed the bug in ReadBuffer
and used it
build is green apart from being too long which made travis kill the job |
Thanks for all the work on this, Rag. I started ES 6.8 then ran zipkin off this branch like so: $ curl -sSL https://jitpack.io/com/github/anuraaga/zipkin/zipkin-server/elasticsearch-armeria-SNAPSHOT/zipkin-server-elasticsearch-armeria-SNAPSHOT-exec.jar > zipkin.jar
$ STORAGE_TYPE=elasticsearch ES_HTTP_LOGGING=BODY java -jar zipkin.jar I then hit the health check endpoint like this: $ curl -s localhost:9411/health The health check hasn't returned a value (hung). I looked at the server logs and see the requests made (though it isn't unrolling the gzipped response into the console which iirc the old code did)
|
I added |
sorry will check! |
Thanks :) FWIW, ran a simple stress test via
|
the problem was that I looked at stats and 100% of the bulk requests failed. I'll lock in the commit and try again though. |
ok I rebuilt using the commit tag --- a/zipkin/Dockerfile
+++ b/zipkin/Dockerfile
@@ -16,7 +16,7 @@ FROM alpine
ENV USER anuraaga
ENV BRANCH elasticsearch-armeria
-ENV VERSION SNAPSHOT
+ENV VERSION 76263f9bd0-1
# Add environment settings for supported storage types
COPY zipkin/ /zipkin/
from here https://jitpack.io/com/github/anuraaga/zipkin/elasticsearch-armeria-76263f9bd0-1/build.log I'll do the test again now.. |
definitely I had an old version before.. maybe because I didn't do I'm sure you are right by the way, they were okhttp logs! Definitely the logs look different
|
so basically it is still dropping almost every span before and after I guess.. |
$ curl -s localhost:9411/metrics|jq .
{
"counter.zipkin_collector.messages.http": 46,
"counter.zipkin_collector.spans_dropped.http": 253822,
"gauge.zipkin_collector.message_bytes.http": 1857454,
"counter.zipkin_collector.bytes.http": 92070197,
"gauge.zipkin_collector.message_spans.http": 5877,
"counter.zipkin_collector.spans.http": 258104,
"counter.zipkin_collector.messages_dropped.http": 0
} |
I'm running against last release to check also.. |
to make docker not fall over, I needed to change the sender configuration to send less data at a time. This is before and after the change. I just got a clean run of before.. will do one of the after now. diff --git a/webmvc4/pom.xml b/webmvc4/pom.xml
index af9f420..79d2a2c 100755
--- a/webmvc4/pom.xml
+++ b/webmvc4/pom.xml
@@ -18,7 +18,7 @@
<spring.version>4.3.24.RELEASE</spring.version>
<log4j.version>2.11.2</log4j.version>
- <brave.version>5.6.6</brave.version>
+ <brave.version>5.6.7</brave.version>
</properties>
<dependencyManagement>
diff --git a/webmvc4/src/main/java/brave/webmvc/TracingConfiguration.java b/webmvc4/src/main/java/brave/webmvc/TracingConfiguration.java
index 095c043..38722b2 100644
--- a/webmvc4/src/main/java/brave/webmvc/TracingConfiguration.java
+++ b/webmvc4/src/main/java/brave/webmvc/TracingConfiguration.java
@@ -20,6 +20,7 @@ import org.springframework.context.annotation.Import;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurerAdapter;
import zipkin2.Span;
+import zipkin2.codec.Encoding;
import zipkin2.reporter.AsyncReporter;
import zipkin2.reporter.Sender;
import zipkin2.reporter.okhttp3.OkHttpSender;
@@ -38,7 +39,10 @@ public class TracingConfiguration extends WebMvcConfigurerAdapter {
/** Configuration for how to send spans to Zipkin */
@Bean Sender sender() {
- return OkHttpSender.create("http://127.0.0.1:9411/api/v2/spans");
+ return OkHttpSender.newBuilder()
+ .endpoint("http://127.0.0.1:9411/api/v2/spans")
+ .messageMaxBytes(4096)
+ .encoding(Encoding.PROTO3).build();
}
/** Configuration for how to buffer spans into messages for Zipkin */
|
Tweaked
|
cool... I did a run with the prior commit, but metrics weren't reported at all for spans. Some unrelated glitch I'm sure.. curl -s localhost:9411/metrics|jq .
{
"counter.zipkin_collector.messages.http": 17424,
"counter.zipkin_collector.spans_dropped.http": 0,
"gauge.zipkin_collector.message_bytes.http": 2736,
"counter.zipkin_collector.bytes.http": 69106481,
"gauge.zipkin_collector.message_spans.http": 0,
"counter.zipkin_collector.spans.http": 0,
"counter.zipkin_collector.messages_dropped.http": 0
} Anyway, I'm running off |
metrics aren't coming through, but it is only the span count.. so weird. |
I will build local as I can't see easily why the metrics are zero for spans. debugging the client shows the http response was 202 |
the problem in span stats is kindof making it hard for me to tell what's going on. it is possible something subtle during the bump to latest micrometer, but it is so weird. Ex I can manually send traffic, then a few times it will increase the number but after a while it doesn't. meanwhile the other stats seem fine. |
the above thing has to do with proto encoding. I'm switching tests back to json, just smaller messages, until it is figured out. |
ok sorry about all the chatter. a successful test scenario where any load occurs requires storage throttling as we know ES falls over especially on localhost. Ex if directly like this.. $ STORAGE_THROTTLE_ENABLED=true STORAGE_TYPE=elasticsearch java -jar ./zipkin-server/target/zipkin-server-*exec.jar Then, something unrelated to this is busted in proto3 encoding. To avoid this, don't use it :P Ex. use the following sender setup in brave-webmvc-example. It is important to override the max bytes to something less than default 5MiB, if using laptop etc, but I don't know what that value is. /** Configuration for how to send spans to Zipkin */
@Bean Sender sender() {
return OkHttpSender.newBuilder()
.endpoint("http://127.0.0.1:9411/api/v2/spans")
.messageMaxBytes(4096)
.build();
} Anyway, with storage throttling enabled and the above running directly off this branch I got a clean bench against brave-webmvc-example. zero spans dropped. I haven't tried to setup this in docker, yet but I don't have it in me to do that tonight :P |
Well that's enough for me to move this off of draft :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a great amount of work. thank you! While not critical for this PR, I want to look into the self-tracing and see why it isn't working. Otherwise when you feel ready to merge, go ahead!
@@ -85,7 +85,7 @@ public ObjectNode fetchMetricsFromMicrometer() { | |||
// Delegates the health endpoint from the Actuator to the root context path and can be deprecated | |||
// in future in favour of Actuator endpoints | |||
@Get("/health") | |||
public HttpResponse getHealth() throws JsonProcessingException { | |||
public AggregatedHttpResponse getHealth() throws JsonProcessingException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side q: will any of the other read methods end up blocking the event loop? ex getTraces etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope they all return AggregatedHttpResponse
so don't run on the event loop. Though I'm definitely interested in eventually migrating them to use enqueue
instead of execute
to be fully async :)
.../java/zipkin2/server/internal/elasticsearch/ZipkinElasticsearchStorageAutoConfiguration.java
Show resolved
Hide resolved
* Customizes the {@link HttpClientBuilder} used when connecting to ElasticSearch. Mostly for | ||
* testing. | ||
*/ | ||
public abstract Builder clientCustomizer(Consumer<HttpClientBuilder> clientCustomizer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if only for testing, we should make it package private like the above. the more we expose the more we break on next refactor :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect one or two of these are exposed for zipkin-aws, if so a comment under the javadoc saying literally what is using it will help us from accidentally breaking.
zipkin-storage/elasticsearch/src/main/java/zipkin2/elasticsearch/ElasticsearchStorage.java
Show resolved
Hide resolved
...n-storage/elasticsearch/src/main/java/zipkin2/elasticsearch/internal/IndexNameFormatter.java
Show resolved
Hide resolved
zipkin-storage/elasticsearch/src/main/java/zipkin2/elasticsearch/ElasticsearchStorage.java
Show resolved
Hide resolved
...asticsearch/src/main/java/zipkin2/elasticsearch/internal/client/RawContentLoggingClient.java
Show resolved
Hide resolved
...storage/elasticsearch/src/test/java/zipkin2/elasticsearch/ElasticsearchSpanConsumerTest.java
Show resolved
Hide resolved
self-tracing seems it was busted prior to this. we can follow-up with the small things in future pull requests. Great job @anuraaga! |
self-tracing follow-up #2663 |
thx for the loopback
…On Sun, Jul 14, 2019, 5:28 PM Anuraag Agrawal ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In
zipkin-server/src/main/java/zipkin2/server/internal/MetricsHealthController.java
<#2653 (comment)>:
> @@ -85,7 +85,7 @@ public ObjectNode fetchMetricsFromMicrometer() {
// Delegates the health endpoint from the Actuator to the root context path and can be deprecated
// in future in favour of Actuator endpoints
@get("/health")
- public HttpResponse getHealth() throws JsonProcessingException {
+ public AggregatedHttpResponse getHealth() throws JsonProcessingException {
Nope they all return AggregatedHttpResponse so don't run on the event
loop. Though I'm definitely interested in eventually migrating them to use
enqueue instead of execute to be fully async :)
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<#2653?email_source=notifications&email_token=AAAPVVYPCSODEUD2TDVDSETP7LPUXA5CNFSM4H6ZPMGKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOB6LT7JI#discussion_r303232383>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAPVV3D2YR6MENG62OWZCDP7LPUXANCNFSM4H6ZPMGA>
.
|
I ran into this while debugging why `awaitInitialEndpoints` in openzipkin/zipkin#2653 often takes seconds to resolve `localhost`. I don't think I found the real culprit yet, but it does look like when closing and re-registering a health checked endpoint group, health checks from the closed one will still continue to be sent forever.
I ran into this while debugging why `awaitInitialEndpoints` in openzipkin/zipkin#2653 often takes seconds to resolve `localhost`. I don't think I found the real culprit yet, but it does look like when closing and re-registering a health checked endpoint group, health checks from the closed one will still continue to be sent forever.
Fixes #2646
I haven't gotten to skim through this myself yet, so there's probably silly cleanups needed still, but feel free to take an initial look too if you have time. I also haven't run the integration tests yet.
In a followup PR, I'll probably migrate from moshi to jackson since we already depend on the latter through armeria / spring, so we can remove moshi and okio for a few hundred K of library savings. The fact that we don't natively use okio buffers now makes using moshi more complicated, and probably slower.