Sorting of source array in the fingerprint filter prevents building bidirectional IP flows #2396

nicholas-marshall · 2015-01-22T18:51:15Z

Good Day,

I am working on creating hash values for the 5-tupes of src_ip, src_port, dest_ip, dest_port, proto and then dest_ip, dest_port, src_ip, src_port, proto in order to use these two fingerprints to build bidirectional flows out of flow data I am collecting. However with the following fingerprint filter:

Fingerprint the communications flow by creating source and destination hashes over the IP and ports of the source and destination. The src_hash will be the src_ip, src_port, dest_ip, dst_port and the dest_hash will be dest_ip, dest_port, src_ip, src_port. Then joining duplex flows becomes possible.

if [src_ip] and [dest_ip] {
fingerprint {
concatenate_sources => true
method => "SHA1"
key => "KEYKEYKEY"
source => [ "src_ip", "src_port", "dest_ip", "dest_port", "proto" ]
target => "src_fingerprint"
}

fingerprint {
concatenate_sources => true
method => "SHA1"
key => "KEYKEYKEY"
source => [ "dest_ip", "dest_port", "src_ip", "src_port", "proto" ]
target => "dest_fingerprint"
}
}

Both src_fingerprint and dest_fingerprint are the same. I find this to be very confusing as a fingerprint should be unique and a hash of two strings should be different values. Digging into the ruby code of fingerprint.rb on line 63 has @source.sort.each do |k| which sorts the values in source before concatenating them. So by sorting the values of source before hashing them causes collisions and non-unique values.

I fixed it for my use-case by changing @source.sort.each do |k| to @source.each do |k|, however I suggest adding an option in the fingerprint filter to the effect of unsorted_source => true. Removing the sort part of the code at this point would break backwards compatibility as fingerprints would suddenly change.

Sincerely,

Nicholas Marshall

electrical · 2015-01-22T20:18:42Z

Thank you very much for the extensive report.
Its indeed weird for us to sort it while you should decide the order in the source array.

@jordansissel @ph @talevy your opinions ?

ph · 2015-01-26T15:45:25Z

@electrical @nicholas-marshall
I don't see any good reason why the keys are sorted for fingerprinting. I would +1 adding an option to allow predefined order of the keys but keeping the current behavior as default.

@nicholas-marshall if you want you can create a PR with your fix in the plugin repo located at: https://github.com/logstash-plugins/logstash-filter-fingerprint/

jordansissel · 2015-05-18T04:51:46Z

For Logstash 1.5.0, we've moved all plugins to individual repositories, so I have moved this issue to logstash-plugins/logstash-filter-fingerprint#7. Let's continue the discussion there! :)

jordansissel closed this as completed May 18, 2015

jordansissel mentioned this issue May 18, 2015

Sorting of source array in the fingerprint filter prevents building bidirectional IP flows logstash-plugins/logstash-filter-fingerprint#7

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sorting of source array in the fingerprint filter prevents building bidirectional IP flows #2396

Sorting of source array in the fingerprint filter prevents building bidirectional IP flows #2396

nicholas-marshall commented Jan 22, 2015

electrical commented Jan 22, 2015

ph commented Jan 26, 2015

jordansissel commented May 18, 2015

Sorting of source array in the fingerprint filter prevents building bidirectional IP flows #2396

Sorting of source array in the fingerprint filter prevents building bidirectional IP flows #2396

Comments

nicholas-marshall commented Jan 22, 2015

Fingerprint the communications flow by creating source and destination hashes over the IP and ports of the source and destination. The src_hash will be the src_ip, src_port, dest_ip, dst_port and the dest_hash will be dest_ip, dest_port, src_ip, src_port. Then joining duplex flows becomes possible.

electrical commented Jan 22, 2015

ph commented Jan 26, 2015

jordansissel commented May 18, 2015