Multiple sources without "concatenate_sources" only fingerprints last source #8
Comments
I agree with Tom. Buddy P pointed this out to me on the mailing list, so I forked the repo to debug the issue. The problem is easy to find at line #91 (note that the is_a? Array check below only applies when an individual item in the source array points to an array, not when the source itself is an array):
but I also couldn't figure out the best behavior. I originally left concatenation off because I assumed concatenating the strings would be a performance hit, but I can't think of a better way to build the hash. You could hash the previous hash plus the new field, but that seems worse for performance than simply concatenating. The current code does hash every field in the source; it just overwrites the result of the previous hash with the next one (so only the last field matters, as Tom mentioned). The concat option also adds the field name to the string, so removing that may be a better option? I would say concatenate_sources should be true by default, and an additional option such as include_field_name should be added to allow including the name in the string. At a minimum, if no behavior change is desired, the existing code should be modified to hash only the last item in a source array, as right now we are just wasting CPU cycles. The documentation should also make it clear that using a source array is useless if concatenation is not enabled. |
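To make the difference concrete, here is a minimal Ruby sketch of the two behaviours discussed above. It is not the plugin's actual code: the helper names (`hmac_sha256`, `fingerprint_last_wins`, `fingerprint_concatenated`), the plain-hash event, and the `|field|value` delimiter format are all assumptions made for illustration.

```ruby
require 'openssl'

# Hypothetical helper: keyed SHA256 digest of a single string.
def hmac_sha256(key, data)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, data.to_s)
end

# Mirrors the behaviour described above: every field is hashed,
# but each result overwrites the previous one, so only the last survives.
def fingerprint_last_wins(event, sources, key)
  result = nil
  sources.each do |field|
    result = hmac_sha256(key, event[field])
  end
  result
end

# Concatenate first, then hash once. Including the field name is optional;
# the "|field|value" layout is an assumed format, not the plugin's.
def fingerprint_concatenated(event, sources, key, include_field_name: true)
  joined = sources.map do |field|
    include_field_name ? "|#{field}|#{event[field]}" : event[field].to_s
  end.join
  hmac_sha256(key, joined)
end

event   = { 'field1' => 'test1', 'field2' => 'test2' }
sources = %w[field1 field2]

puts fingerprint_last_wins(event, sources, 'longencryptionkey')
puts fingerprint_concatenated(event, sources, 'longencryptionkey')
```

In the first function the work spent on every field except the last is thrown away, which is the wasted CPU mentioned above. The second hashes once over all fields, so a change in any field changes the fingerprint.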
I second the previous opinion. I hit this issue just yesterday and, after some debugging, found the reason for repeated fingerprints on events with different messages and the same timestamps (otherwise we would never have found it).
|
+1, this is indeed a bug. Looking at the spec suite:

    describe "Concatenate multiple values into 1" do
      config <<-CONFIG
        filter {
          fingerprint {
            source => ['field1', 'field2']
            key => "longencryptionkey"
            method => 'MD5'
          }
        }
      CONFIG
      sample("field1" => "test1", "field2" => "test2") do
        insist { subject["fingerprint"] } == "872da745e45192c2a1d4bf7c1ff8a370"
      end
    end

The test isn't actually correct: it claims that the filter will "concatenate multiple values into 1", but then checks that the result is 872da745e45192c2a1d4bf7c1ff8a370. Checking how that value is computed:

    > require 'openssl'
    => true
    jruby-1.7.23 :002 > OpenSSL::HMAC.hexdigest(OpenSSL::Digest::MD5.new, "longencryptionkey", "test2".to_s).force_encoding(Encoding::UTF_8)
    => "872da745e45192c2a1d4bf7c1ff8a370"

This clearly confirms that only the last value is used for checksumming. That said, I'm also not sure what the behaviour should be when multiple source fields are used and So I'm wondering if there's any use case where a user wants to disable concatenate_sources and use multiple source fields. If there's none, I vote for removing the option and always concatenating. |
The question remains how else should the fingerprinting work with multiple sources and concatenate_sources option set to false. |
Well, I'm questioning the use of any of those methods. If no one has a use for non concatenated multi-field sources then I'd suggest removing the option and always concatenate. |
@jsvd yes but when |
Just found this bug also. The docs say this:
The code doesn't do this, though. Should be an easy fix assuming what is documented is the desired behavior |
As it stands and with the incorrect documentation, the default behavior likely leads people to believe that multiple source fields are being fingerprinted in aggregate rather than simply the last field being fingerprinted. I'd suggest removing the |
I got bit by this! (my fault for not doing thorough enough testing) |
I agree that it is unlikely to see a use case for returning an array of hashes. However, having said that, I don't think I would change the functionality -- I would just make it do what the documentation says it should do. If fingerprint were to return an array as documented, that would be a very good indicator that something unexpected was being done, and would lead to fewer unexpected bugs. I would worry that removing concatenate_sources might break existing code and lead to backwards incompatibility. |
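As a rough illustration of the "array of hashes" behaviour discussed above, the following Ruby sketch returns one fingerprint per source field. It only shows the shape of the result; the function name `fingerprint_each`, the plain-hash event, and the choice of SHA256 are assumptions, not the plugin's implementation.

```ruby
require 'openssl'

# Hypothetical sketch: one keyed digest per source field, in source order.
def fingerprint_each(event, sources, key, digest_name = 'SHA256')
  sources.map do |field|
    OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new(digest_name), key, event[field].to_s)
  end
end

event = { 'field1' => 'test1', 'field2' => 'test2' }
fingerprints = fingerprint_each(event, %w[field1 field2], 'longencryptionkey')

# An array result makes it obvious that no aggregation took place.
puts fingerprints.length
puts fingerprints
```

A caller expecting a single string would notice the array immediately, which is the "good indicator" argument made above.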
If you put multiple sources in the "source" key with "concatenate_sources" set to false, this filter only fingerprints the value of the last source. Additionally, the test "Concatenate multiple values into 1" is incorrect: it only fingerprints "field2". Example:
filter {
  fingerprint {
    key => "secret"
    method => "SHA256"
    source => ["@timestamp", "host"]
  }
}
Outputs the same fingerprint every time. If I switch it so timestamp is last, then it changes. If this is intended, then the docs don't really make that clear.
I can make a PR for this, but I'm not sure what you'd like the intended behavior to be - perhaps if you pass multiple sources, it turns concatenate_sources to true? Or it creates multiple fingerprints?
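The symptom can be reproduced outside Logstash with a small Ruby sketch. This is an approximation under stated assumptions: `last_source_fingerprint` is a hypothetical stand-in that deliberately hashes only the last source, and the host name and timestamps are made-up sample values.

```ruby
require 'openssl'

# Hypothetical stand-in for the reported behaviour: only the last source counts.
def last_source_fingerprint(event, sources, key)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, event[sources.last].to_s)
end

host = 'web-01' # made-up host name
events = [
  { '@timestamp' => '2016-01-01T00:00:00Z', 'host' => host },
  { '@timestamp' => '2016-01-01T00:00:01Z', 'host' => host }
]

# Count distinct fingerprints for each ordering of the source list.
distinct_when_host_last = events.map { |e| last_source_fingerprint(e, ['@timestamp', 'host'], 'secret') }.uniq.size
distinct_when_time_last = events.map { |e| last_source_fingerprint(e, ['host', '@timestamp'], 'secret') }.uniq.size

puts distinct_when_host_last # 1: the timestamp is ignored
puts distinct_when_time_last # 2: the timestamp is now the only input
```

With "host" last, both events collapse to a single fingerprint; with "@timestamp" last, each event gets its own, which matches the order dependence described above.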