Sorting of source array in the fingerprint filter prevents building bidirectional IP flows #7

jordansissel · 2015-05-18T04:51:45Z

(This issue was originally filed by @nicholas-marshall at elastic/logstash#2396)

Good Day,

I am working on creating hash values for the 5-tuples of src_ip, src_port, dest_ip, dest_port, proto and then dest_ip, dest_port, src_ip, src_port, proto, in order to use these two fingerprints to build bidirectional flows out of the flow data I am collecting. However, with the following fingerprint filter:

# Fingerprint the communications flow by creating source and destination hashes over the IP and ports of the source and destination. The src_hash will be the src_ip, src_port, dest_ip, dest_port and the dest_hash will be dest_ip, dest_port, src_ip, src_port. Then joining duplex flows becomes possible.

if [src_ip] and [dest_ip] {
  fingerprint {
    concatenate_sources => true
    method => "SHA1"
    key => "KEYKEYKEY"
    source => [ "src_ip", "src_port", "dest_ip", "dest_port", "proto" ]
    target => "src_fingerprint"
  }

  fingerprint {
    concatenate_sources => true
    method => "SHA1"
    key => "KEYKEYKEY"
    source => [ "dest_ip", "dest_port", "src_ip", "src_port", "proto" ]
    target => "dest_fingerprint"
  }
}

Both src_fingerprint and dest_fingerprint come out the same. I find this very confusing, since hashing two different strings should give two different values. Digging into the Ruby code, line 63 of fingerprint.rb has @source.sort.each do |k|, which sorts the entries of source before concatenating them. Because the source list is sorted before hashing, both filters end up hashing the same string, so the two fingerprints collide instead of being unique.
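
The collision described above can be reproduced outside Logstash with a small sketch. This is not the plugin's code: model_fingerprint, sample_flow and the sample addresses are hypothetical, and the way values are joined (values only, separated by "|") is an assumption that may differ from what fingerprint.rb actually concatenates. It only illustrates how sorting the source list makes both orderings hash identically.

```ruby
require "openssl"

SRC_ORDER  = %w[src_ip src_port dest_ip dest_port proto]
DEST_ORDER = %w[dest_ip dest_port src_ip src_port proto]

# Hypothetical stand-in for the filter's concatenate-and-hash step.
# Assumption: values are joined with "|" and hashed with keyed SHA1.
def model_fingerprint(event, source, sort_names)
  names = sort_names ? source.sort : source
  data = names.map { |name| event[name].to_s }.join("|")
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA1"), "KEYKEYKEY", data)
end

# Made-up sample flow record.
def sample_flow
  { "src_ip" => "10.0.0.1", "src_port" => 51515,
    "dest_ip" => "10.0.0.2", "dest_port" => 443, "proto" => "tcp" }
end

flow = sample_flow

# Sorted: both lists collapse to the same order, so the digests match.
puts model_fingerprint(flow, SRC_ORDER, true) == model_fingerprint(flow, DEST_ORDER, true)

# Unsorted: the list order is preserved, so the digests differ.
puts model_fingerprint(flow, SRC_ORDER, false) == model_fingerprint(flow, DEST_ORDER, false)
```

Under these assumptions the first comparison prints true and the second prints false, which matches the behaviour reported here.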

I fixed it for my use case by changing @source.sort.each do |k| to @source.each do |k|. However, I suggest adding an option to the fingerprint filter to the effect of unsorted_source => true. Simply removing the sort at this point would break backwards compatibility, as existing fingerprints would suddenly change.

Sincerely,

Nicholas Marshall

sliddjur · 2020-11-17T21:32:15Z

This problem still exists for me, and it doesn't seem like 7292935 has fixed it.
I am running v3.2.2 of the Logstash fingerprint plugin.

Sample data:

"fw": { "talkers": [ "222.222.222.222", "111.111.111.111"  ] }
"fw": { "talkers": [  "111.111.111.111", "222.222.222.222" ] }

Now I run fingerprint on this value to produce a hash:

fingerprint {
  method => "MURMUR3"
  source => "[fw][talkers]"
  target => "[fw][talkers_hash]"
  concatenate_sources => true
}

And the two events don't produce the same result.

This also doesn't sort before fingerprinting. Both source fields are strings containing an IPv4 address.

fingerprint {
  method => "MURMUR3"
  source => [ "[fw][src_ip]", "[fw][dst_ip]" ]
  target => "[fw][talkers_hash2]"
  concatenate_sources => true
}

For me, the workaround is to use a ruby filter to sort the array before fingerprinting it:
ruby { code => 'event.set("[fw][talkers]", event.get("[fw][talkers]").sort)' }
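
Why sorting first helps can be shown in plain Ruby. This is only a sketch: talkers_key is a hypothetical helper, and SHA1 from the standard library stands in for MURMUR3, which Ruby does not ship. Once the array is in a canonical order, both sample events reduce to the same string and therefore the same hash.

```ruby
require "digest"

# Hypothetical helper: hash a list of talkers independent of their order.
# Assumption: SHA1 replaces MURMUR3 purely for illustration.
def talkers_key(talkers)
  Digest::SHA1.hexdigest(talkers.sort.join("|"))
end

a = ["222.222.222.222", "111.111.111.111"]
b = ["111.111.111.111", "222.222.222.222"]

# Without sorting, the joined strings differ, so any hash of them would too.
puts a.join("|") == b.join("|")

# With sorting, both orderings give the same key.
puts talkers_key(a) == talkers_key(b)
```

The sort here is a plain string sort. It does not order addresses numerically, but any consistent order is enough to make the two events agree.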

rahulsinghai · 2022-10-03T20:22:38Z

Hi @sliddjur, sorting of the fields specified in source was removed by 7292935.
It is now up to the end user to specify the order in which the fields should be considered when calculating the hash.
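
Since the order is now left to the user, one way to get a single key per bidirectional flow is to put the two endpoints into a fixed order before hashing. The following is a rough sketch of that idea, not something the plugin provides: bidirectional_flow_key is a hypothetical helper, the field names are borrowed from the original report, and SHA1 is just an example digest. In a pipeline, similar logic could run in a ruby filter ahead of fingerprint.

```ruby
require "digest"

# Hypothetical helper: direction-independent key for a flow record.
# The two (ip, port) endpoints are sorted so that A->B and B->A agree.
def bidirectional_flow_key(event)
  endpoints = [
    [event["src_ip"].to_s, event["src_port"].to_s],
    [event["dest_ip"].to_s, event["dest_port"].to_s]
  ].sort
  Digest::SHA1.hexdigest((endpoints.flatten + [event["proto"].to_s]).join("|"))
end

# Made-up sample records for the two directions of one connection.
forward = { "src_ip" => "10.0.0.1", "src_port" => 51515,
            "dest_ip" => "10.0.0.2", "dest_port" => 443, "proto" => "tcp" }
reverse = { "src_ip" => "10.0.0.2", "src_port" => 443,
            "dest_ip" => "10.0.0.1", "dest_port" => 51515, "proto" => "tcp" }

puts bidirectional_flow_key(forward) == bidirectional_flow_key(reverse)
```

Each endpoint is kept together as an (ip, port) pair while sorting, so a port never gets attached to the wrong address.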

jordansissel mentioned this issue May 18, 2015

Sorting of source array in the fingerprint filter prevents building bidirectional IP flows elastic/logstash#2396

Closed
