Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate from okhttp to armeria #2653

Merged
merged 27 commits into from
Jul 14, 2019

Conversation

anuraaga
Copy link
Contributor

@anuraaga anuraaga commented Jul 8, 2019

Fixes #2646

I haven't gotten to skim through this myself yet, so there's probably silly cleanups needed still, but feel free to take an initial look too if you have time. I also haven't run the integration tests yet.

In a followup PR, I'll probably migrate from moshi to jackson since we already depend on the latter through armeria / spring, so we can remove moshi and okio for a few hundred K of library savings. The fact that we don't natively use okio buffers now makes using moshi more complicated, and probably slower.

@anuraaga anuraaga requested a review from codefromthecrypt July 8, 2019 09:03
@anuraaga anuraaga changed the title Migrate from okhttp to elasticsearch Migrate from okhttp to armeria Jul 8, 2019
@codefromthecrypt
Copy link
Member

codefromthecrypt commented Jul 8, 2019 via email

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Jul 9, 2019 via email

@anuraaga
Copy link
Contributor Author

anuraaga commented Jul 9, 2019

Updated zipkin-server too and this is getting better, but one WIP is setting connect timeout, which isn't too hard but will bring back the close method on the storage which I thought wasn't needed by using Armeria's default clientfactory.

Also for the record, want to say it was quite painful updating the zipkin-server tests due to Kotlin. Not only is it unfamiliar and a big context switch, but I find it has quite strange syntax and quirks when integrating with normal Java libraries - for the life of me I couldn't get assertThat(Throwable).satisfies { checks } to even compile. So another post-humous vote against the Kotlin tests ;)

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Jul 9, 2019 via email

@anuraaga
Copy link
Contributor Author

V6 integration tests are working fine now, for some reason the armeria client can't connect to V7. The build passing is deceptive, since the V7 tests got skipped due to the connection failure will look more.

@anuraaga
Copy link
Contributor Author

Ok V7 should be all good too

Copy link
Member

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the hard yards here.

Main last mile will be making sure zipkin-aws elastcisearch is possible to refactor over the work here
https://github.com/openzipkin/zipkin-aws/tree/master/autoconfigure/storage-elasticsearch-aws/src/main/java/zipkin/autoconfigure/storage/elasticsearch/aws

var client: OkHttpClient = OkHttpClient.Builder()
.addNetworkInterceptor(BasicAuthInterceptor(ZipkinElasticsearchStorageProperties(false, 0)))
.build()
companion object {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry.. this looks like it probably wasn't fun to figure out

}

@Override
public void close() {
if (!shutdownClientOnClose()) return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intentional to remove the shutdownClientOnClose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah without customization, we use the default ClientFactory, and it's a no-op to close it so no need for us to worry.

* This returns a Dns provider that combines the IPv4 or IPv6 addresses from a supplied list of
* urls, provided they are all http and share the same port.
*/
final class PseudoAddressRecordSet {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow forgot how much code this was..

@@ -13,17 +13,20 @@
*/
package zipkin2.elasticsearch.internal.client;

import java.io.Closeable;
import com.fasterxml.jackson.databind.util.ByteBufferBackedInputStream;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we delay jackson to a later pull request with some of the okio utilities?

ex does one or the other here help?
https://github.com/square/okio/pull/346/files
https://github.com/square/okio/blob/d7fcee8db94177ac4d02e205f928d8425f352d52/okio/src/jvmTest/java/okio/NioTest.java

if not, add a TODO: to follow-up to swap out okio/moshi as we shouldn't really have multiple things doing the same if we can avoid it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - fixed the bug in ReadBuffer and used it

Copy link
Contributor Author

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First Kotlin weirdness, now weirdest Armeria issue ever line/armeria#1895

I've learned my lesson to not call issues straightforward without care ;)

}

@Override
public void close() {
if (!shutdownClientOnClose()) return;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah without customization, we use the default ClientFactory, and it's a no-op to close it so no need for us to worry.

@@ -13,17 +13,20 @@
*/
package zipkin2.elasticsearch.internal.client;

import java.io.Closeable;
import com.fasterxml.jackson.databind.util.ByteBufferBackedInputStream;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - fixed the bug in ReadBuffer and used it

@codefromthecrypt
Copy link
Member

build is green apart from being too long which made travis kill the job

@codefromthecrypt
Copy link
Member

Thanks for all the work on this, Rag.

I started ES 6.8 then ran zipkin off this branch like so:

$ curl -sSL https://jitpack.io/com/github/anuraaga/zipkin/zipkin-server/elasticsearch-armeria-SNAPSHOT/zipkin-server-elasticsearch-armeria-SNAPSHOT-exec.jar > zipkin.jar
$ STORAGE_TYPE=elasticsearch ES_HTTP_LOGGING=BODY java -jar zipkin.jar

I then hit the health check endpoint like this:

$ curl -s localhost:9411/health

The health check hasn't returned a value (hung). I looked at the server logs and see the requests made (though it isn't unrolling the gzipped response into the console which iirc the old code did)

2019-07-12 09:31:11.045  INFO 34688 --- [-worker-nio-2-5] c.l.a.c.l.LoggingClient                  : Request: {startTime=2019-07-12T00:31:11.035Z(1562891471035865), length=0B, duration=7805µs(7805101ns), scheme=none+h1c, headers=[:method=GET, :path=/_cluster/health/zipkin*span-*, :authority=localhost:9200, :scheme=http, accept-encoding=gzip,deflate, user-agent=armeria/0.88.0]}
2019-07-12 09:31:11.071  INFO 34688 --- [-worker-nio-2-5] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-12T00:31:11.070Z(1562891471070979), length=234B, duration=372µs(372805ns), headers=[:status=200, content-type=application/json; charset=UTF-8, content-encoding=gzip, content-length=234]}

@codefromthecrypt
Copy link
Member

I added --no-cache to the docker-compose instructions just in case.. I will do it with the exact commit id if we think somehow jitpack is stale..

@codefromthecrypt
Copy link
Member

sorry will check!

@anuraaga
Copy link
Contributor Author

anuraaga commented Jul 13, 2019

Thanks :) FWIW, ran a simple stress test via ITElasticStorage (hence log4j2.properties is different from normal server) on master and compared the logs look closer to those ones which must be OKHTTP logs

INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (3970ms, 249038-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (3977ms, 248851-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4031ms, 248631-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (3956ms, 248455-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4043ms, 248522-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4064ms, 249038-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4028ms, 249038-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4072ms, 249038-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4138ms, 249038-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4297ms, 248647-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4180ms, 248849-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4334ms, 248827-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4341ms, 248816-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4246ms, 248712-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)

@codefromthecrypt
Copy link
Member

the problem was that I looked at stats and 100% of the bulk requests failed. I'll lock in the commit and try again though.

@codefromthecrypt
Copy link
Member

ok I rebuilt using the commit tag

--- a/zipkin/Dockerfile
+++ b/zipkin/Dockerfile
@@ -16,7 +16,7 @@ FROM alpine
 
 ENV USER anuraaga
 ENV BRANCH elasticsearch-armeria
-ENV VERSION SNAPSHOT
+ENV VERSION 76263f9bd0-1
 
 # Add environment settings for supported storage types
 COPY zipkin/ /zipkin/

from here https://jitpack.io/com/github/anuraaga/zipkin/elasticsearch-armeria-76263f9bd0-1/build.log

I'll do the test again now..

@codefromthecrypt
Copy link
Member

definitely I had an old version before.. maybe because I didn't do --no-cache at first, but anyway I will only test with commit IDs now to reduce distraction.

I'm sure you are right by the way, they were okhttp logs!

Definitely the logs look different

zipkin                      | 2019-07-13 11:34:10.080  INFO 1 --- [orker-epoll-4-6] c.l.a.c.l.LoggingClient                  : Request: {startTime=2019-07-13T11:34:10.074Z(1563017650074736), length=3739KiB(3829669B), duration=5635?s(5635700ns), scheme=none+h1c, headers=[]}
zipkin                      | 2019-07-13 11:34:10.313  INFO 1 --- [orker-epoll-4-7] c.l.a.c.l.LoggingClient                  : Request: {startTime=2019-07-13T11:34:10.311Z(1563017650311478), length=3397KiB(3479449B), duration=1783?s(1783500ns), scheme=none+h1c, headers=[]}
zipkin                      | 2019-07-13 11:34:12.416  WARN 1 --- [orker-epoll-4-1] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:12.416Z(1563017652416394), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 
zipkin                      | 2019-07-13 11:34:13.197  WARN 1 --- [orker-epoll-4-8] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:13.197Z(1563017653197137), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 
zipkin                      | 2019-07-13 11:34:15.379  WARN 1 --- [orker-epoll-4-2] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:15.378Z(1563017655378782), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 
zipkin                      | 2019-07-13 11:34:17.530  WARN 1 --- [orker-epoll-4-3] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:17.530Z(1563017657530053), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 
zipkin                      | 2019-07-13 11:34:18.613  WARN 1 --- [orker-epoll-4-4] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:18.613Z(1563017658613401), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 
zipkin                      | 2019-07-13 11:34:18.815  WARN 1 --- [orker-epoll-4-5] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:18.815Z(1563017658815107), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 
zipkin                      | 2019-07-13 11:34:20.228  WARN 1 --- [orker-epoll-4-6] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:20.228Z(1563017660228235), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 
zipkin                      | 2019-07-13 11:34:20.304  WARN 1 --- [orker-epoll-4-7] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:20.304Z(1563017660304298), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 

@codefromthecrypt
Copy link
Member

so basically it is still dropping almost every span before and after I guess..

@codefromthecrypt
Copy link
Member

$ curl -s localhost:9411/metrics|jq .
{
  "counter.zipkin_collector.messages.http": 46,
  "counter.zipkin_collector.spans_dropped.http": 253822,
  "gauge.zipkin_collector.message_bytes.http": 1857454,
  "counter.zipkin_collector.bytes.http": 92070197,
  "gauge.zipkin_collector.message_spans.http": 5877,
  "counter.zipkin_collector.spans.http": 258104,
  "counter.zipkin_collector.messages_dropped.http": 0
}

@codefromthecrypt
Copy link
Member

I'm running against last release to check also..

@codefromthecrypt
Copy link
Member

to make docker not fall over, I needed to change the sender configuration to send less data at a time. This is before and after the change. I just got a clean run of before.. will do one of the after now.

diff --git a/webmvc4/pom.xml b/webmvc4/pom.xml
index af9f420..79d2a2c 100755
--- a/webmvc4/pom.xml
+++ b/webmvc4/pom.xml
@@ -18,7 +18,7 @@
 
     <spring.version>4.3.24.RELEASE</spring.version>
     <log4j.version>2.11.2</log4j.version>
-    <brave.version>5.6.6</brave.version>
+    <brave.version>5.6.7</brave.version>
   </properties>
 
   <dependencyManagement>
diff --git a/webmvc4/src/main/java/brave/webmvc/TracingConfiguration.java b/webmvc4/src/main/java/brave/webmvc/TracingConfiguration.java
index 095c043..38722b2 100644
--- a/webmvc4/src/main/java/brave/webmvc/TracingConfiguration.java
+++ b/webmvc4/src/main/java/brave/webmvc/TracingConfiguration.java
@@ -20,6 +20,7 @@ import org.springframework.context.annotation.Import;
 import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
 import org.springframework.web.servlet.config.annotation.WebMvcConfigurerAdapter;
 import zipkin2.Span;
+import zipkin2.codec.Encoding;
 import zipkin2.reporter.AsyncReporter;
 import zipkin2.reporter.Sender;
 import zipkin2.reporter.okhttp3.OkHttpSender;
@@ -38,7 +39,10 @@ public class TracingConfiguration extends WebMvcConfigurerAdapter {
 
   /** Configuration for how to send spans to Zipkin */
   @Bean Sender sender() {
-    return OkHttpSender.create("http://127.0.0.1:9411/api/v2/spans");
+    return OkHttpSender.newBuilder()
+        .endpoint("http://127.0.0.1:9411/api/v2/spans")
+        .messageMaxBytes(4096)
+        .encoding(Encoding.PROTO3).build();
   }
 
   /** Configuration for how to buffer spans into messages for Zipkin */

@anuraaga
Copy link
Contributor Author

Tweaked toString to get this now

{"status":"UP","zipkin":{"status":"UP","details":{"ElasticsearchStorage{clientCustomizer=[TimeoutCustomizer{timeout=10000ms}, LoggingCustomizer{httpLogging=BODY}], clientFactoryCustomizer=[TimeoutCustomizer{timeout=10000ms}], hostsSupplier=[http://localhost:9200], pipeline=null, flushOnWrites=false, maxRequests=64, strictTraceId=true, searchEnabled=true, autocompleteKeys=[], autocompleteTtl=3600000, autocompleteCardinality=20000, indexShards=5, indexReplicas=1, indexNameFormatter=IndexNameFormatter{index=zipkin, dateSeparator=-, dateFormat=yyyy-MM-dd}, namesLookback=86400000}":{"status":"UP"}}}}

@codefromthecrypt
Copy link
Member

cool... I did a run with the prior commit, but metrics weren't reported at all for spans. Some unrelated glitch I'm sure..

curl -s localhost:9411/metrics|jq .
{
  "counter.zipkin_collector.messages.http": 17424,
  "counter.zipkin_collector.spans_dropped.http": 0,
  "gauge.zipkin_collector.message_bytes.http": 2736,
  "counter.zipkin_collector.bytes.http": 69106481,
  "gauge.zipkin_collector.message_spans.http": 0,
  "counter.zipkin_collector.spans.http": 0,
  "counter.zipkin_collector.messages_dropped.http": 0
}

Anyway, I'm running off ENV VERSION c1d3eac893-1 and will try again

@codefromthecrypt
Copy link
Member

metrics aren't coming through, but it is only the span count.. so weird.

@codefromthecrypt
Copy link
Member

I will build local as I can't see easily why the metrics are zero for spans. debugging the client shows the http response was 202

@codefromthecrypt
Copy link
Member

the problem in span stats is kindof making it hard for me to tell what's going on. it is possible something subtle during the bump to latest micrometer, but it is so weird. Ex I can manually send traffic, then a few times it will increase the number but after a while it doesn't. meanwhile the other stats seem fine.

@codefromthecrypt
Copy link
Member

Screenshot 2019-07-13 at 9 41 10 PM

ok something's odd

@codefromthecrypt
Copy link
Member

the above thing has to do with proto encoding. I'm switching tests back to json, just smaller messages, until it is figured out.

@codefromthecrypt
Copy link
Member

ok sorry about all the chatter. a successful test scenario where any load occurs requires storage throttling as we know ES falls over especially on localhost.

Ex if directly like this..

$ STORAGE_THROTTLE_ENABLED=true STORAGE_TYPE=elasticsearch java -jar ./zipkin-server/target/zipkin-server-*exec.jar

Then, something unrelated to this is busted in proto3 encoding. To avoid this, don't use it :P Ex. use the following sender setup in brave-webmvc-example. It is important to override the max bytes to something less than default 5MiB, if using laptop etc, but I don't know what that value is.

  /** Configuration for how to send spans to Zipkin */
  @Bean Sender sender() {
    return OkHttpSender.newBuilder()
        .endpoint("http://127.0.0.1:9411/api/v2/spans")
        .messageMaxBytes(4096)
        .build();
  }

Anyway, with storage throttling enabled and the above running directly off this branch I got a clean bench against brave-webmvc-example. zero spans dropped. I haven't tried to setup this in docker, yet but I don't have it in me to do that tonight :P

@anuraaga anuraaga marked this pull request as ready for review July 13, 2019 14:01
@anuraaga
Copy link
Contributor Author

Well that's enough for me to move this off of draft :)

Copy link
Member

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a great amount of work. thank you! While not critical for this PR, I want to look into the self-tracing and see why it isn't working. Otherwise when you feel ready to merge, go ahead!

@@ -85,7 +85,7 @@ public ObjectNode fetchMetricsFromMicrometer() {
// Delegates the health endpoint from the Actuator to the root context path and can be deprecated
// in future in favour of Actuator endpoints
@Get("/health")
public HttpResponse getHealth() throws JsonProcessingException {
public AggregatedHttpResponse getHealth() throws JsonProcessingException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side q: will any of the other read methods end up blocking the event loop? ex getTraces etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope they all return AggregatedHttpResponse so don't run on the event loop. Though I'm definitely interested in eventually migrating them to use enqueue instead of execute to be fully async :)

zipkin-storage/elasticsearch/README.md Show resolved Hide resolved
* Customizes the {@link HttpClientBuilder} used when connecting to ElasticSearch. Mostly for
* testing.
*/
public abstract Builder clientCustomizer(Consumer<HttpClientBuilder> clientCustomizer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if only for testing, we should make it package private like the above. the more we expose the more we break on next refactor :P

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect one or two of these are exposed for zipkin-aws, if so a comment under the javadoc saying literally what is using it will help us from accidentally breaking.

@codefromthecrypt
Copy link
Member

self-tracing seems it was busted prior to this. we can follow-up with the small things in future pull requests. Great job @anuraaga!

@codefromthecrypt codefromthecrypt merged commit fc8cc19 into openzipkin:master Jul 14, 2019
@codefromthecrypt
Copy link
Member

self-tracing follow-up #2663

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Jul 15, 2019 via email

trustin pushed a commit to line/armeria that referenced this pull request Jul 18, 2019
I ran into this while debugging why `awaitInitialEndpoints` in openzipkin/zipkin#2653 often takes seconds to resolve `localhost`. I don't think I found the real culprit yet, but it does look like when closing and re-registering a health checked endpoint group, health checks from the closed one will still continue to be sent forever.
abesto pushed a commit to abesto/zipkin that referenced this pull request Sep 10, 2019
fmguerreiro pushed a commit to fmguerreiro/armeria that referenced this pull request Sep 19, 2020
I ran into this while debugging why `awaitInitialEndpoints` in openzipkin/zipkin#2653 often takes seconds to resolve `localhost`. I don't think I found the real culprit yet, but it does look like when closing and re-registering a health checked endpoint group, health checks from the closed one will still continue to be sent forever.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider what to do about OkHttp switching to Kotlin
3 participants