Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk helper should instrument entire operation #105

Open
grantr opened this issue Sep 10, 2015 · 5 comments
Open

Bulk helper should instrument entire operation #105

grantr opened this issue Sep 10, 2015 · 5 comments

Comments

@grantr
Copy link
Contributor

grantr commented Sep 10, 2015

The Bulk helper can issue multiple requests behind the scenes if requested. In this case we should instrument the entire operation as a single action in addition to the underlying scan/bulk request actions.

@chrismwendt
Copy link
Contributor

How will the arguments to Client#instrument for the entire operation differ from the individual bulk requests? It seems appropriate to avoid ambiguity between the two different concepts by passing a different path, but there is no path associated with the entire operation since it's a concept and not an actual request to ES.

Here is the bulk request: https://github.com/github/elastomer-client/blob/master/lib/elastomer/client/bulk.rb#L41

@grantr
Copy link
Contributor Author

grantr commented Sep 10, 2015

I'm thinking we could omit the body, keep the path the same, and set the action to something synthetic, like bulk.total or bulk.block or something.

Maybe something like this:

def bulk( body = nil, params = nil )
      if block_given?
        params, body = (body || {}), nil
        instrument(path, body, params.merge(:action => "bulk.block") do
          yield bulk_obj = Bulk.new(self, params)
          bulk_obj.call
        end
      else

What do you think @TwP?

@chrismwendt
Copy link
Contributor

delete_by_query should also be instrumented.

@TwP
Copy link
Contributor

TwP commented Sep 17, 2015

I'm on the fence about instrumenting an entire block of bulk operations. The resulting metric will be a weird superposition of application timing information and Elasticsearch timing information. If it were possible to aggregate just the Elasticsearch timing information and store that as a single metric, then I would be 👍💯

Diving a little bit into implementation, we could achieve this by accumulating the timing data in the bulk instance itself. At the end of the block this timing data could be instrumented. I'm not sure if ActiveSupport::Notifications support a form where you can pass in the raw timing data or not (instead of relying on the block notation).

That seems like the best path forward to me. Just my $0.02

@chrismwendt
Copy link
Contributor

Getting the timing data from Elasticsearch would be easy using bulk_stream_responses in #112, but it doesn't look like there is a way to override the instrument block duration with your own value (see instrument, start, and finish):

http://api.rubyonrails.org/classes/ActiveSupport/Notifications/Instrumenter.html#method-i-instrument

The application timing could also affect delete_by_query if that were instrumented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants