similar
is a Unix pipeline drop-in that deduplicates similar lines.
It is inspired by Grafana's log deduplication feature and brings it to the command line. Its intended use is alongside other text utilities
like grep, sort and uniq.
$ cat /var/log/messages | grep cron | similar
$ similar -signature /var/log/messages /var/log/messages.1
$ make build
$ make install
similar [-none|-exact|-numbers|-signature] <files>
none := no dedup
exact := stripping all ISO datetimes with milliseconds
numbers := stripping all numbers, default
signature := stripping all numbers, letters and underscores
files := list of files to open, defaults to stdin
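The modes above can be read as "normalize each line, then compare the normalized forms". The following is a minimal sketch of that idea, not the tool's actual code. It assumes Go as the language, assumes that only consecutive similar lines are collapsed (as uniq does), and uses hypothetical names (`key`, `dedup`) and hypothetical regex patterns. The exact mode is left out because its datetime pattern is not given here.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// Hypothetical patterns; the real tool's expressions may differ.
var (
	numbersRe   = regexp.MustCompile(`[0-9]+`)       // runs of digits
	signatureRe = regexp.MustCompile(`[0-9A-Za-z_]+`) // digits, letters, underscores
)

// key returns the comparison key for a line under the given mode.
func key(mode, line string) string {
	switch mode {
	case "numbers":
		return numbersRe.ReplaceAllString(line, "")
	case "signature":
		return signatureRe.ReplaceAllString(line, "")
	default:
		return line
	}
}

// dedup keeps the first line of every run of consecutive lines that
// share the same key. Mode "none" returns the input unchanged.
// Assumption: only adjacent lines are compared, like uniq.
func dedup(mode string, lines []string) []string {
	if mode == "none" {
		return lines
	}
	var out []string
	prev := ""
	for i, l := range lines {
		k := key(mode, l)
		if i == 0 || k != prev {
			out = append(out, l)
		}
		prev = k
	}
	return out
}

func main() {
	// Read all lines from stdin, then print the deduplicated result.
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	for _, l := range dedup("numbers", lines) {
		fmt.Println(l)
	}
}
```

In this sketch the original first line of each run is printed, not its stripped key, so the output stays readable.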
- the filters use regex, which is pretty slow; they could be rewritten using byte operations instead
- probably more filters could be added
- build pipeline and versioning
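As a rough illustration of the first item, the numbers filter could drop digits in a single pass over the bytes instead of going through a regex. This is only a sketch under the assumption that the tool is written in Go and that "numbers" means ASCII digits; `stripDigits` is a hypothetical name.

```go
package main

import "fmt"

// stripDigits returns a copy of line with all ASCII digits removed,
// using one pass over the bytes and no regex.
// ASCII digit bytes never occur inside multi-byte UTF-8 sequences,
// so non-ASCII text passes through unchanged.
func stripDigits(line []byte) []byte {
	out := make([]byte, 0, len(line))
	for _, b := range line {
		if b < '0' || b > '9' {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	fmt.Printf("%s\n", stripDigits([]byte("cron[1234]: job 42 finished")))
}
```

Whether this is actually faster for real log files would need to be confirmed with a benchmark.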