Bulk helper should instrument entire operation #105

grantr · 2015-09-10T00:18:03Z

The Bulk helper can issue multiple requests behind the scenes if requested. In this case we should instrument the entire operation as a single action in addition to the underlying scan/bulk request actions.

chrismwendt · 2015-09-10T22:37:26Z

How will the arguments to Client#instrument for the entire operation differ from the individual bulk requests? It seems appropriate to avoid ambiguity between the two different concepts by passing a different path, but there is no path associated with the entire operation since it's a concept and not an actual request to ES.

Here is the bulk request: https://github.com/github/elastomer-client/blob/master/lib/elastomer/client/bulk.rb#L41

grantr · 2015-09-10T22:46:42Z

I'm thinking we could omit the body, keep the path the same, and set the action to something synthetic, like bulk.total or bulk.block or something.

Maybe something like this:

def bulk( body = nil, params = nil )
      if block_given?
        params, body = (body || {}), nil
        instrument(path, body, params.merge(:action => "bulk.block") do
          yield bulk_obj = Bulk.new(self, params)
          bulk_obj.call
        end
      else

What do you think @TwP?

chrismwendt · 2015-09-17T05:51:02Z

delete_by_query should also be instrumented.

TwP · 2015-09-17T17:04:28Z

I'm on the fence about instrumenting an entire block of bulk operations. The resulting metric will be a weird superposition of application timing information and Elasticsearch timing information. If it were possible to aggregate just the Elasticsearch timing information and store that as a single metric, then I would be 👍💯

Diving a little bit into implementation, we could achieve this by accumulating the timing data in the bulk instance itself. At the end of the block this timing data could be instrumented. I'm not sure if ActiveSupport::Notifications support a form where you can pass in the raw timing data or not (instead of relying on the block notation).

That seems like the best path forward to me. Just my $0.02

chrismwendt · 2015-09-17T18:52:20Z

Getting the timing data from Elasticsearch would be easy using bulk_stream_responses in #112, but it doesn't look like there is a way to override the instrument block duration with your own value (see instrument, start, and finish):

http://api.rubyonrails.org/classes/ActiveSupport/Notifications/Instrumenter.html#method-i-instrument

The application timing could also affect delete_by_query if that were instrumented.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bulk helper should instrument entire operation #105

Bulk helper should instrument entire operation #105

grantr commented Sep 10, 2015

chrismwendt commented Sep 10, 2015

grantr commented Sep 10, 2015

chrismwendt commented Sep 17, 2015

TwP commented Sep 17, 2015

chrismwendt commented Sep 17, 2015

Bulk helper should instrument entire operation #105

Bulk helper should instrument entire operation #105

Comments

grantr commented Sep 10, 2015

chrismwendt commented Sep 10, 2015

grantr commented Sep 10, 2015

chrismwendt commented Sep 17, 2015

TwP commented Sep 17, 2015

chrismwendt commented Sep 17, 2015