Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip invalid lines #137

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@ This project adheres to [Semantic Versioning](http://semver.org/).
This CHANGELOG follows the format listed [here](https://github.com/sensu-plugins/community/blob/master/HOW_WE_CHANGELOG.md)

## [Unreleased]
### Changed
- metrics-es-node-graphite.rb: exclude non-numeric data from results (@boutetnico)

## [3.0.0] - 2018-12-17
### Breaking Changes
Expand Down
41 changes: 37 additions & 4 deletions bin/metrics-es-node-graphite.rb
Original file line number Diff line number Diff line change
Expand Up @@ -264,7 +264,9 @@ def run
node['indices'].each do |type, index|
index.each do |k, v|
# #YELLOW
unless k =~ /(_time$)/ || v =~ /\d+/
if k.end_with? 'is_throttled'
metrics["indices.#{type}.#{k}"] = true?(v) ? 1 : 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I am not sure on is what is the best thing to do here. I am leaning towards:

case true?(v)
when true
  1
when false
  0
else
  # not sure if we should raise, ok, critical, or return a `(0|1)`
  raise(true?(v))
end

I would not worry about implementing it until I have had someone else take a look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm does it make sense to have true? raise in the:

  else
    "#{obj} is not a truthy value, please open an issue with this output so we can fix it"

bit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that initially but was not sure what we wanted to do so I illustrated what could be done and this allowed us to always return something and then the handling of that could be implemented independently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on the feedback and what I was already leaning towards I think the correct solution would be to do something like:

else
    STDERR.puts "#{obj} is not a truthy value, please open an issue with this output so we can fix it"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, I see. So off hand, I'm not aware of a good way for a metric script to actually log to sensu (although the STDERR bit you mentioned might work, just worried that'll also get sent along).

Unless there's something I'm not aware of, I'd lean towards marking the current metric as unknown if possible otherwise, let's catch and raise here. I hope this condition is rare enough that this code path never has to happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, is_throttled also feels like something that might work better as a dedicated Sensu check outside of this metric script.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the change to have it print to STDERR and our assumptions and understanding of the code are correct I think the logic here would look something like this:

if k.end_with? 'is_throttled'
  state = true?(v)
  unless state.class == 'String'
    metrics["indices.#{type}.#{k}"] = true?(v) ? 1 : 0
  end
end

that would return the right value when it can and when it can't it should return nothing but print a useful message to STDERR that sensu-client will pick up. I would think that any handler would not be affected as I would be surprised if sensu client sent it as STDIN which is typically what a pipe handler will by default work with. I would love some clarification from someone at sensu.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, is_throttled also feels like something that might work better as a dedicated Sensu check outside of this metric script.

I can see that but IMHO they are separate concerns, I tend to try to conceptualize all sensu "checks" into the following broad categories:

  • state checking: this can be something against a local host, a remote host, etc and can either be looking at the current state of a system or some kind of representation of previous states. To further clarify you could search against a TSDB to get the average, min, max, standard deviation, etc of the last n number of state results. This is what I internally refer to as metric checks but avoid this terminology externally because it can confuse people.
  • data collection: these are what are commonly referred to the "metric" scripts in the sensu community. I think they should attempt to report data, bail with an error, or potentially a combination.

I think that yes it makes a lot of sensu to create a "state check" for that but the question is do you want to check against the current state or against something else.

Anyways those are just some thoughts/opinions and I doubt there is a "right" or "wrong" answer and its a matter of perspective. Perhaps we make this a CLI arg to control this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I thought about it and I think we should return a String or a (True|False)Class object and do the handling in the other function:

if k.end_with? 'is_throttled'
  state = true?(v)
  if state.class == 'String'
    STDERR.puts state
   # optionally we could additionally set the value to `nil` here but that relies on the appropriate system from handling this properly which seems less than ideal
  # to play devils advocate though we could have something at the end of doing all the metric massinging drop any metrics with a `nil` value which I think does have some real value as it means that we can do a final cleanup/sanitization of the output before we ship it.
  else
    metrics["indices.#{type}.#{k}"] = true?(v) ? 1 : 0
  end
end

I'm still not confident that this is the correct approach either but its something to consider.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned about throwing anything to stderr, as it will garble the resulting metrics output.

Sensu combines both strerr and stdout into the same stream when running a check command
See sensu-spawn for reference on that.

Also it looks to me like this check logic already silently ignores any metric key/value pair where value is not Numeric type...including booleans... Ruby true is not a Numeric. The logic skips all non-Numeric from final output. Again.. probably to make sure the output is sane metrics format.

at the end of the run() statement... metrics key/value are iterated and checked for Numeric type

metrics.each do |k, v|
      if v.is_a? Numeric
        output([config[:scheme], k].join('.'), v, timestamp)
      end
end

This is a full scrubbing of metrics for sanity.

Here's what I would do.

  1. if known truthy have true() return true => store Numeric 1 into metric key value
  2. if known falsey have true() return false => store Numeric 0 into metric key value
  3. unknown have true() return explainable String => store String into metric key value
    this will be scrubbed from metrics.
  4. if there are any scrubbed non-numeric metrics in that final output scrub set output status to 1 == WARNING. This tells us the metric logic is still sub-optimal but allows the check to operate.
  5. Optionally create new command line switch to raise an error msg listing all the scrubbed metrics if any metric key value is not Numeric for debugging purposes. Allows us to understand why status is WARNING so we can drive a fix into the logic.

elsif !(k =~ /(_time$)/ || v =~ /\d+/)
metrics["indices.#{type}.#{k}"] = v
end
end
Expand Down Expand Up @@ -306,16 +308,47 @@ def run
if fs_stats
node['fs'].each do |fs, fs_value|
unless fs =~ /(timestamp|data)/
fs_value.each do |k, v|
metrics["fs.#{fs}.#{k}"] = v
metrics_fs = hash_to_dotted_path(fs_value, "#{fs}.")
metrics_fs.each do |k, v|
metrics["fs.#{k}"] = v
end
end
end
end

metrics.each do |k, v|
output([config[:scheme], k].join('.'), v, timestamp)
if v.is_a? Numeric
output([config[:scheme], k].join('.'), v, timestamp)
end
end
ok
end
end

def hash_to_dotted_path(hash, path = '')
hash.each_with_object({}) do |(k, v), ret|
key = path + k.to_s
if v.is_a? Hash
ret.merge! hash_to_dotted_path(v, "#{key}.")
elsif v.is_a? Array
v.each do |element|
if element['device_name']
key2 = "#{key}.#{element['device_name']}"
ret.merge! hash_to_dotted_path(element, "#{key2}.")
end
end
else
ret[key] = v
end
end
end

def true?(obj)
if obj.to_s == 'true'
true
elsif obj.to_s == 'false'
false
else
"#{obj} is not a truthy value, please open an issue with this output so we can fix it"
end
end