TA-setops

Set Operations Technology Add-On for Splunk

Discrete mathematics for Splunk is here. Winner of the 2016 Developer Revolution Award, this app lets you evaluate relations and apply set operations (union, intersection, difference, etc.), and adds the 'distinctfields' and 'distinctstream' commands (https://www.youtube.com/watch?v=Z6w-VG2CpP0).

OVERVIEW

Release notes
Support and resources

INSTALLATION AND CONFIGURATION

Requirements
Installation
Configuration

USAGE

setop command
distinctfields command
distinctstream command
mvbm command

OVERVIEW

Release notes

About this release

Version 1.2.0 of TA-setops is compatible with:

Splunk Enterprise versions: 6.3+
Platforms: Platform independent
Lookup file changes: None

Fixed issues

Version 1.2.0 of TA-setops fixes the following issues:

None

Known issues

None

Support and resources

Please post questions at https://answers.splunk.com. This app is provided as is, with no warranty, implied or otherwise; please see the LICENSE document for more information. Feedback about possible improvements, and good news stories of how this app has helped your organisation, are most welcome.

INSTALLATION AND CONFIGURATION

Requirements

Hardware requirements

None

Software requirements

To function properly, TA-setops requires the following software:

Splunk Enterprise 6.3+

Installation

Install this app on your search head(s) and restart Splunk.

Configuration

No configuration is required.

USAGE

This app uses Python's set operators (https://docs.python.org/2/library/sets.html#set-objects). Provided below is a range of example searches demonstrating their operation. Please copy and paste them to see the output of the commands.

setop command

cardinality

| stats count | eval a=split("aaa aaa bbb ccc", " ") | fields - count | setop op=cardinality a
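
Because the app builds on Python's set type, the cardinality search can be mirrored in plain Python. This is a rough sketch of the underlying idea, not the app's source code. It assumes setop treats the multi-value field as a set, and the variable names are invented for illustration.

```python
# Sketch only: mirrors the SPL example in plain Python.
# The multi-value field "a" is modelled as a list of strings.
a = "aaa aaa bbb ccc".split(" ")

# Converting to a set collapses the duplicate "aaa".
distinct_values = set(a)

# Cardinality is the number of distinct members.
cardinality = len(distinct_values)
print(cardinality)  # 3
```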

operations

union:

| stats count | eval a=split("aaa bbb ccc", " ") | eval b=split("bbb aaa ddd", " ") | fields - count | setop op=union a b

intersection:

| stats count | eval a=split("aaa bbb ccc", " ") | eval b=split("bbb aaa ddd", " ") | fields - count | setop op=intersection a b

difference:

| stats count | eval a=split("aaa bbb ccc", " ") | eval b=split("bbb aaa ddd", " ") | fields - count | setop op=difference a b

symmetric difference:

| stats count | eval a=split("aaa bbb ccc", " ") | eval b=split("bbb aaa ddd", " ") | fields - count | setop op=symmetric_difference a b
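
For reference, the four operations correspond to Python's built-in set operators. The sketch below applies them to the same values as the searches in this section. It illustrates the set semantics only; how setop formats its output field is not shown here.

```python
# Sketch only: the same two multi-value fields, modelled as Python sets.
a = set("aaa bbb ccc".split(" "))
b = set("bbb aaa ddd".split(" "))

union = a | b                 # members of either set
intersection = a & b          # members of both sets
difference = a - b            # members of a that are not in b
symmetric_difference = a ^ b  # members of exactly one of the sets

for name, result in [
    ("union", union),
    ("intersection", intersection),
    ("difference", difference),
    ("symmetric_difference", symmetric_difference),
]:
    # sorted() gives a stable display order; sets themselves are unordered.
    print(name, sorted(result))
```

Note that difference is not commutative: swapping a and b in this example yields "ddd" rather than "ccc".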

relations

equal:

| stats count | eval a=split("aaa bbb ccc", " ") | eval b=split("bbb aaa ccc", " ") | fields - count | setop op=relation a b

partially disjoint:

| stats count | eval a=split("aaa bbb ccc", " ") | eval b=split("bbb aaa ddd", " ") | fields - count | setop op=relation a b

superset:

| stats count | eval a=split("aaa bbb ccc ddd", " ") | eval b=split("bbb aaa ccc", " ") | fields - count | setop op=relation a b

subset:

| stats count | eval a=split("aaa bbb", " ") | eval b=split("bbb aaa ccc", " ") | fields - count | setop op=relation a b

fully disjoint:

| stats count | eval a=split("aaa bbb", " ") | eval b=split("ccc ddd", " ") | fields - count | setop op=relation a b
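
The five relations can be told apart with Python's set comparison operators. The helper below is a hypothetical sketch: the function name and the returned labels are taken from the headings in this section and are assumptions, not necessarily the strings that setop op=relation emits.

```python
def relation(a, b):
    """Classify how two collections relate when treated as sets (hypothetical helper)."""
    a, b = set(a), set(b)
    if a == b:
        return "equal"
    if a < b:   # every member of a is in b, and b has more
        return "subset"
    if a > b:   # every member of b is in a, and a has more
        return "superset"
    if a.isdisjoint(b):  # no members in common
        return "fully disjoint"
    return "partially disjoint"  # some overlap, neither contains the other


# The same inputs as the five searches in this section.
print(relation("aaa bbb ccc".split(), "bbb aaa ccc".split()))      # equal
print(relation("aaa bbb ccc".split(), "bbb aaa ddd".split()))      # partially disjoint
print(relation("aaa bbb ccc ddd".split(), "bbb aaa ccc".split()))  # superset
print(relation("aaa bbb".split(), "bbb aaa ccc".split()))          # subset
print(relation("aaa bbb".split(), "ccc ddd".split()))              # fully disjoint
```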

distinctfields command

It is strongly recommended to put a 'table' or 'fields' command before 'distinctfields' to remove extraneous fields (including _raw, if not required), as this improves performance. This app provides a distinctfields_example lookup that can be used for testing.

Without a 'by' field:

| inputlookup distinctfields_example | distinctfields field1 field2 field3

With a 'by' field:

| inputlookup distinctfields_example | distinctfields by=field1 field2 field3

distinctstream command

The distinctstream command provides the same functionality as the distinctfields command, but as a streaming alternative. Being a streaming command, distinctstream performs much better than a non-streaming command (such as distinctfields), but its output depends on the order of the events.

With some use cases (such as detection), finding the first instance of a behaviour is important. In contrast to distinctfields, distinctstream provides the means to find the first instance of a distinct set of values.

The distinctstream command is the perfect companion to tstats for analysis at scale, given its performance and the naturally reverse order of the events. Compare the behaviour of the distinctfields examples above with those here to see how they differ.

Without a 'by' field:

| inputlookup distinctfields_example | distinctstream field1 field2 field3

With a 'by' field:

| inputlookup distinctfields_example | distinctstream by=field1 field2 field3

Reverse the event order without a 'by' field:

| inputlookup distinctfields_example | reverse | distinctstream field1 field2 field3

Reverse the event order with a 'by' field:

| inputlookup distinctfields_example | reverse | distinctstream by=field1 field2 field3
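
The exact semantics of distinctstream are not spelled out on this page, so the following is only a generic illustration of why a streaming "first seen" pass depends on event order. It does not describe the command's implementation and does not model the 'by' argument. The helper first_seen and the sample events are invented for this sketch.

```python
def first_seen(events, fields):
    """Yield each event whose combination of values in `fields` is new (hypothetical helper)."""
    seen = set()
    for event in events:
        key = tuple(event.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            yield event  # emitted immediately, without waiting for later events


# Invented sample data: events 1 and 2 share the same field values.
events = [
    {"id": 1, "field1": "x", "field2": "p"},
    {"id": 2, "field1": "x", "field2": "p"},
    {"id": 3, "field1": "y", "field2": "q"},
]

forward = [e["id"] for e in first_seen(events, ["field1", "field2"])]
backward = [e["id"] for e in first_seen(reversed(events), ["field1", "field2"])]
print(forward)   # [1, 3]
print(backward)  # [3, 2]
```

Reversing the input changes which event represents the duplicated combination (event 1 versus event 2), which is the kind of order dependence the 'reverse' examples are meant to expose.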

mvbm command

The multi-value binary matrix (mvbm) command isn't really a set operation; it is a streaming command that one-hot encodes a multi-value field to produce a binary matrix for use as features with machine learning algorithms. It exists because, if you feed a multi-value field directly to an MLTK algorithm, all the values are merged into a single space-delimited one-hot encoded field, which is erroneous.

Pass the command one 'field' argument and it adds to each event a field, with the value 1, for each value seen in that field. To populate the remaining empty fields, I suggest piping the output to a fillnull command.

Note: the fields that mvbm produces are named <fieldname>_mvbm_<fieldvalue>. Depending on your needs, you may like to rename them en masse with a rename command such as: ... | rename <fieldname>_mvbm_* AS *

Here's a trivial usage example:

| stats count | eval example="aaa bbb ccc" | makemv tokenizer="([A-Za-z]+)" example | mvbm field=example | fillnull | table example_mvbm_*
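
To make the encoding concrete, here is a minimal Python sketch of the idea, assuming each event is a dict and the multi-value field is a list. It is not the app's code: the function below is a hypothetical batch version that also folds in the fillnull step (filling with 0, fillnull's default), whereas the real command is streaming and leaves null-filling to a separate command.

```python
def mvbm(events, field):
    """One-hot encode a multi-value field into <field>_mvbm_<value> columns (sketch)."""
    encoded = []
    columns = set()
    for event in events:
        row = dict(event)  # copy so the input events are left untouched
        for value in event.get(field, []):
            name = f"{field}_mvbm_{value}"
            row[name] = 1
            columns.add(name)
        encoded.append(row)
    # Stand-in for piping to fillnull: absent indicator columns become 0.
    for row in encoded:
        for name in columns:
            row.setdefault(name, 0)
    return encoded


# Invented sample events with a multi-value "example" field.
events = [
    {"example": ["aaa", "bbb"]},
    {"example": ["bbb", "ccc"]},
]
for row in mvbm(events, "example"):
    print({k: v for k, v in sorted(row.items()) if k != "example"})
```

Each output row carries one 0/1 column per distinct value seen across all events, which is the binary matrix shape that feature-based algorithms expect.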
