Skip to content
This repository has been archived by the owner on Nov 9, 2017. It is now read-only.

Probes and Instrumentation

amalloy edited this page Jan 11, 2013 · 12 revisions

Computation requires a thread of execution. In synchronous code, this thread is explicit. In asynchronous code this thread is implicit, but no less real. Event-driven programming is the process of applying structure to events until they become ordered.

But what about the inverse operation, where structured execution of code is transformed into a collection of loosely-ordered events? We do this all the time: it’s called logging.

Logging is a lossy operation; log statements lose any context that isn’t explicitly captured. While we can sometimes recreate the context by considering all the log statements collectively, this is akin to archaeology; we’re trying to create a detailed picture of what happened using sparse, unconnected details and educated guesswork.

At the same time, logging is a necessary tool. It lets us understand our code without having to fundamentally change its structure, because the stream of events it generates exists outside our thread of execution.

Lamina gives us powerful tools for transforming, aggregating, and analyzing streams of events. Therefore, if we can transform these log streams into something Lamina can consume, we have a powerful tool for understanding the inner workings of our software.

probe-channels

A probe-channel is a permanent, grounded channel. This means that it cannot be closed or put into an error state, and that messages will never accumulate in its queue. A probe-channel is also named, which means that when we interact with probe-channel abc:def, that will always be the same channel.

Probe-channels represent a stream of events, typically from an instrumented piece of code. The easiest way to enqueue a message into a probe-channel is by using trace:

(trace :foo:bar
  (calculate-pi-to-1e6-digits))

This will send a million digits of pi into the probe-channel returned by (probe-channel :foo:bar). By convention, probe namespaces are colon-delimited. If the argument given to probe-channel is a sequence, they will be joined with colon separators. All of these function calls are equivalent:

(probe-channel :foo:bar)
(probe-channel "foo:bar")
(probe-channel [:foo :bar])
(probe-channel ["foo" :bar])

However, the body of the trace statement will only be executed if something is consuming messages from the probe-channel. Unless we do something with the output of (probe-channel :foo:bar), the above trace statement is effectively a no-op.

We can also consume collections of probe-channels using select-probes. This takes either a wildcard string or a regex, and returns a channel representing the combined output of all probe-channels whose names match that pattern.

> (use 'lamina.core 'lamina.trace)
nil
> (receive-all (select-probes "foo*") #(println "traced:" %))
true
> (trace :foo:bar 1)
traced: 1
true
> (trace :foo 2)
traced: 2
true
> (trace :bar 3)
false

Notice that trace returns true if the message was consumed, and false otherwise.

instrumenting functions

While littering the code with trace statements won’t affect its semantics, it is a bit of an eyesore. Since most of the time meaningful events correlate with function calls, there’s an easy way to automatically instrument functions.

Calling lamina.trace/instrument on a function will create probes for every time the function is called, returns, and throws an exception. These are called enter, return, and error probes respectively.

> (use 'clojure.pprint)
nil
> (defn f [x y] (+ x y))
#'f
> (def f* (instrument f {:name :instrumented}))
#'f*
> (receive-all (probe-channel :instrumented:enter) #(do (println "ENTERED") (pprint %)))
true
> (receive-all (probe-channel :instrumented:return) #(do (println "RETURNED") (pprint %)))
true
> (f* 1 2)
ENTERED
{:timestamp 1336127081901, :args [1 2], :description "instrumented"}
RETURNED
{:timestamp 1336127081901,
 :sub-tasks nil,
 :duration 27000,
 :args [1 2],
 :offset 0,
 :result 3,
 :description "instrumented"}
3

Notice that the we have defined our function with the name :instrumented, so the return probe can be accessed at :instrumented:return.

The fields emitted by the enter probe are:

description the function’s name
timestamp the timestamp, in milliseconds, of the event
args the arguments passed to the function

The additional fields emitted by the return probe are:

result the value returned by the function
duration the elapsed time, in nanoseconds, since the function was called
sub-tasks the return data for any instrumented functions called in the scope of the parent function
offset the elapsed time, in nanoseconds, from the topmost parent function’s invocation to this function’s invocation

Notice that we capture the return data for all other instrumented functions automatically. To explore this, let’s use defn-instrumented, a combination of defn and instrument.

> (defn-instrumented g [x y] (+ x y))
#'g
> (defn-instrumented f
	{:probes {:return (sink->> pprint)}}
	[x y]
	(+ (g x x) (g y y)))

In this, we have two instrumented functions, f and g. We do not need to explicitly define a :name in the function metadata, since that defaults to :the:function:namespace:the-function-name.

We have also defined a :probes map for f, which allows us to consume probes without explicitly calling probe-channel. The keys of the map are the probe suffix (in this case, just :return), and the values are a channel that will consume messages from that probe-channel. For this, we use the sink->> function.

> (f 1 2) 
{:description "user:f",
 :timestamp 1336128984479,
 :offset 0,
 :duration 87000,
 :sub-tasks
 ({:description "user:g",
   :timestamp 1336128984479,
   :offset 20000,
   :duration 26000,
   :sub-tasks nil,
   :args [1 1],
   :result 2}
  {:description "user:g",
   :timestamp 1336128984479,
   :offset 58000,
   :duration 17000,
   :sub-tasks nil,
   :args [2 2],
   :result 4}),
 :args [1 2],
 :result 6}
6

Notice that the :sub-tasks field now has two entries, one for each time g was called. Each of these has their own :args and :result, and an :offset with respect to when f was initially invoked.

Probe selector syntax

There is an experimental, work-in-progress feature which allows you to describe a series of channel
operations on a probe using a specialized “query language” when subscribing to a probe-channel
endpoint. For example, the query lamina:test.output.where(failures > 0).rate() subscribes to the
:lamina:test probe, isolates the :output key in that map, filters only those outputs whose
@:failure sub-key is positive, and finally produces a channel which aggregates such probes and
periodically emits a message saying how many there have been since the last probe.

That’s a bit much to swallow all at once, though, so let’s look in more detail at what features are
available and how it all fits together.

First, each selector starts with the name of a probe (which may include * wildcards to match
multiple probes). In our example, that was lamina:test. After that, there can be any number of
transformations, each of which looks like a . member access on the previous channel. Our first
transformation was of the simplest sort, a lookup. The transformation x.k simply takes all
messages from the channel x, looks up :k in them as a map, and forwards them to its output. Note
that there is a special lookup, x._, which simply returns x without looking anything up in
it. This is obviously not useful on its own, but can be handy in scenarios where a lookup is
required but you want to instead use the whole object.

The other transformations are more complicated, and more varied. Called “operators”, they take the
general form channel.operator(args) and produce some new channel. The set of operators is
open-ended, and can be extended by your program via a defoperator form, and thus they cannot all
be listed here. However, we will look at examples of a few of the most common operators as we look
at the different sorts of arguments that can be passed to an operator.

select

The select operator is a bit like clojure’s let form, and a bit like its select-keys
function. For example, foo.select(a: x, b: y.z) accepts data looking like {:x 1, :y {:z 10}} and
transforms it to {:a 1, :b 10}. Any keys not named in the select are dropped from the output
object. These a: x groupings are called “pairs”, and are passed without modification to the
operator; in this case select is responsible for taking the pairs {:a x} and {:b y.z} and
producing a function which does the necessary looking-up and renaming. Note that, aince the value
part of the pair is a lookup, you can use the special _ lookup to bind the entire incoming
map. For example, x.select(important-stuff: a, whole-thing: _) will read the :a key and store it
under :important-stuff, and put the entire map under :whole-thing@ in case it is needed after
all.

where

where is like SQL’s WHERE operator, or clojure’s filter function. It takes a lookup, a
comparison function (any of = < > ~=), and a value to compare to. It returns only those objects
for which the comparison of the given lookup with the value is true. For example, xs.select(age < 10) selects those messages whose age is less than ten. Comparison operators must be specified
infix. where supports multiple comparison clauses, separated by whitespace or commas.

periodic operators

There are a number of periodic operators, which consume all incoming messages without rebroadcasting
anything, but at periodic intervals emit some kind of summary of all messages in the last
period. All these operators take a period: <milliseconds> pair as an optional argument, which
dictates their period. Some particularly useful pre-defined periodic operators are: sample, which
retains only the last message from each period and re-transmits it; sum, which requires its inputs
be integers and emits the sum over each period; and rate, which emits a count of received messages
for each period.

group-by

Another periodic operator, group-by is special enough to deserve its own section, because it
supports a parameter format which we have so far not discussed: a “tuple” is denoted by square
brackets, such as [a b c]. But before discussing what a tuple means to group-by, let’s look at a
simpler example, of ch.group-by(a). It emits, every period, a map similar to those produced by
clojure.core/group-by. For example, if it receives three messages [{:a 1 :other 10}, {:a 1 :other 20}, {:a 2 :other 30}] in a period, then it will glom them all together into the single message {1 [{:a 1 :other 10} {:a 1 :other 20}], 2 [{:a 2 :other 30}]}. When given a tuple such as [a b], group-by will use (map the-message [:a :b]) as the key to group by, so the resulting maps will
have keys like [5 "dog"] for inputs like {:a 5, :b "dog", :c "woof"}.

TODO: stomp endpoints

Clone this wiki locally