Timeline with engine instrumentation #973
base: master
I spotted two typos and fixed them (entires -> entries). And a question: In the example, shouldn't the string `example` appear somewhere in the resulting log ("The log now contains there entries")?
Typo fixes, and a question
Added category to example log output (thanks for feedback @sleinen). Added functions: priority(), save(), dump() Fixed formatting (more word wrap)
Updated the timeline implementation to match the documented API. This is still a work in progress: The messages printed by the selftest all have zeros for arguments for unknown reasons; The overhead of logging has doubled for unknown reasons, possibly due to allocating with core.shm and putting a performance-critical cacheline on a 4096-aligned address that may trigger associativity problems (wild speculation, have to dig in).
I realized that when we take the timestamp with RDTSCP this is also loading the current CPU core number and NUMA node into ECX. This patch includes that register in the log dump. This seems useful so that we can account for scheduling issues from the log. Strictly speaking the value in ECX is the contents of the TSC_AUX register but it seems that we can rely on Linux to load the core ID + NUMA node there. Here is where I got the idea: http://stackoverflow.com/questions/22310028/is-there-an-x86-instruction-to-tell-which-core-the-instruction-is-being-run-on
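The core ID and NUMA node mentioned above can be separated with a shift and a mask. This is a minimal sketch in Python, assuming the usual Linux convention of storing the CPU number in the low 12 bits of TSC_AUX and the NUMA node in the bits above them; the function name `decode_tsc_aux` is hypothetical and is not part of this patch.

```python
def decode_tsc_aux(aux):
    """Split a 32-bit TSC_AUX value into (numa_node, cpu_core).

    Assumption: Linux loads TSC_AUX with (node << 12) | cpu.
    """
    cpu_core = aux & 0xFFF          # low 12 bits: CPU number
    numa_node = (aux >> 12) & 0xFFFFF  # remaining 20 bits: NUMA node
    return numa_node, cpu_core


# Example: a value read on NUMA node 1, CPU core 3.
example_aux = (1 << 12) | 3
node, core = decode_tsc_aux(example_aux)
```

If the kernel used a different encoding, only the mask and shift would need to change.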
The string pattern matching for detecting arguments ($arg) was wrong and this caused all args to be logged as 0.
Show core number and NUMA node. Write slightly better log messages for selftest. Simplify the logging loop in the selftest to simply log three messages per iteration. This way the timestamps will indicate the logging overhead. (Previous version had some funny branches that could add a few dozen cycles to the timestamps which is a bit distracting.)
Fixed an off-by-one error (braino) in choosing which timestamps to calculate the delta from. Make the cycle column wider to better accommodate large values.
The selftest function now executes a few loops to see how long it takes to log a million events that are disabled by priority. Log excerpt:
    numa core  -cycles category message
    0    3     5014856 selftest invoked many events(1000000)
    0    3     5025592 selftest invoked many events(1000000)
    0    3     5176004 selftest invoked many events(1000000)
So it looks like five cycles per event is the ballpark cost (including the loop that makes the individual FFI calls).
The events are currently always enabled. This needs to be optimized. Goal for now is simply to have interesting information in the log.
Use shm.create() instead of shm.map(). Required after update in 9fe397e.
Sample different priorities (log levels) on different breaths. Try to always include interesting details in the log but without wrapping too quickly.
end breath now reports number of packets and bits freed.
The timeline text dump now formats the log starting from the first message and continuing towards the last. The previous behavior was to print the log backwards but this was confusing and ill-considered.
Merged conflict in core/app.lua
Requires a math.max() on an argument that is otherwise FFI cdata.
🎆 Looking great! Can't wait to look at some cool plots. :-) Depending on whether #972 gets accepted, I would adapt the API of Super minor nitpick, question: When formatting optional parameters in Markdown we used Edit: I am now realizing why we used |
@eugeneia That's okay for me. Hey there are a few outstanding issues here that I would like to resolve for this release but maybe we can take on This branch actually contains a couple of hacks in this direction that are not very effective and likely should be rolled back. One is adding the env var Like I say these issues could be resolved on |
I think adding |
I notice that we already have Generally I am uncomfortable with debug modes like Have to make a |
Multiqueue fixes
This is the implementation of Timeline (#916):
- `core.timeline` module for efficient logging to binary ring buffers.
- `core.engine` instrumentation to keep track of engine events.
Controversy?
This PR is a relatively large change. It introduces a new mechanism for defining detailed operational events for logging purposes and this is relatively verbose. It also introduces the practice of calling logging hooks even in relatively performance-sensitive code paths which could be hazardous (although I have already taken considerable pains to make this efficient). Reasonable people may have objections.
Implementation notes
The log is a 64MB (1M x 64B) shared memory object. This can be accessed as `/var/run/snabb/$pid/engine/timeline` while the Snabb process is running and will be automatically removed on shutdown (thanks to #930).
The log compresses well and in the near future we should follow up with a way for the Hydra CI to archive timeline logs for all tests (e.g. via a hook in the Snabb shutdown/cleanup process that creates a compressed tarball somewhere).
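The sizing above (1M entries of 64 bytes each) can be checked with a little arithmetic. The Python sketch below assumes that "1M" means 2^20 entries and that the ring wraps by masking a running sequence number; both are assumptions made for illustration, and `entry_offset` is a hypothetical helper rather than anything in `core.timeline`.

```python
ENTRY_SIZE = 64            # bytes per log entry (from the PR description)
ENTRY_COUNT = 1 << 20      # assumption: "1M" entries means 2**20
LOG_BYTES = ENTRY_SIZE * ENTRY_COUNT  # total size of the shared memory object


def entry_offset(sequence_number):
    """Byte offset of the entry for a given sequence number.

    Assumption: the ring wraps by masking, which works because
    ENTRY_COUNT is a power of two.
    """
    index = sequence_number & (ENTRY_COUNT - 1)
    return index * ENTRY_SIZE
```

With these assumptions the log is exactly 64 MiB, and entry number `ENTRY_COUNT` overwrites entry 0.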
This implementation automatically randomizes the log level on each breath. Most breaths (99%) are not logged at all; some breaths (1%) have their start and end logged, and a few breaths (0.1%) are logged in more detail ("app level"), while extremely few (0.01%) are logged in great detail ("packet level"). The intention is to get a representative sample of log messages and for each sample to be a whole breath that is logged at a consistent level.
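The per-breath sampling described above could be sketched as follows. This is an illustration in Python rather than the actual engine code: the function names are hypothetical, and it assumes the percentages are nested thresholds (so 1% of breaths are logged at all, and the more detailed levels are carved out of that 1%).

```python
import random


def choose_log_level(r):
    """Map a uniform random number r in [0, 1) to a log level for one breath.

    Assumption: thresholds are nested, so the rarer levels are subsets
    of the 1% of breaths that get logged at all.
    """
    if r < 0.0001:
        return "packet"   # extremely few breaths: great detail
    if r < 0.001:
        return "app"      # a few breaths: more detail
    if r < 0.01:
        return "breath"   # some breaths: start and end only
    return None           # most breaths: not logged


def sample_levels(breaths, seed=0):
    """Count how many of `breaths` breaths land on each level."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(breaths):
        level = choose_log_level(rng.random())
        counts[level] = counts.get(level, 0) + 1
    return counts
```

Choosing the level once per breath, rather than per event, is what keeps each sampled breath logged at a consistent level.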
Example log excerpt
The complete timeline log is up to a million lines long. Here is a tiny snippet that shows ~10 breaths. One of these is logged in detail ("app level") while the others are logged with low detail (start and end only). The events each include parameters that could be used to calculate interesting metrics (cycles per packet, etc).
Note that the first breath is #1418481 and the last breath is #1419530 and so during this interval there were 1049 breaths executed but only these ones were sampled for the log. (The more breaths we include in the sample the faster the log wraps around.)
How to use it
This PR introduces a discipline of registering detailed log messages ahead of time. This is to make the storage efficient (reference predefined strings) and the logging efficient (reuse a dedicated closure for each log message).
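To illustrate the general idea of this discipline, here is a toy model in Python. It is not the `core.timeline` API: the class and its `define` and `dump` methods are hypothetical names chosen for the sketch. Each message string is stored once, and defining a message returns a dedicated closure that records only a message id and its arguments.

```python
class ToyTimeline:
    """Toy model of predefined log messages in a ring (hypothetical API)."""

    def __init__(self, capacity):
        self.messages = []                 # message definitions, stored once
        self.entries = [None] * capacity   # fixed-size ring of (id, args)
        self.next = 0                      # running sequence number

    def define(self, category, message):
        """Register a message ahead of time and return its logging closure."""
        msg_id = len(self.messages)
        self.messages.append((category, message))

        def log(*args):
            # Only the id and arguments are written at logging time.
            self.entries[self.next % len(self.entries)] = (msg_id, args)
            self.next += 1

        return log

    def dump(self):
        """Format surviving entries from oldest to newest."""
        first = max(0, self.next - len(self.entries))
        lines = []
        for seq in range(first, self.next):
            msg_id, args = self.entries[seq % len(self.entries)]
            category, message = self.messages[msg_id]
            arg_text = ", ".join(str(a) for a in args)
            lines.append("%s %s(%s)" % (category, message, arg_text))
        return lines


timeline = ToyTimeline(capacity=4)
sleep = timeline.define("engine", "sleep")
wakeup = timeline.define("engine", "wakeup")
sleep(100)
wakeup()
```

The real implementation writes fixed-size binary entries to shared memory; the sketch only shows why registering messages up front keeps the per-event work small.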
Here is an example of how log messages for sleeping and waking up can be defined:
and here is how those example messages are logged:
and then the log will contain messages like this:
which are interesting for a couple of reasons. First we can look at the `cycles` (delta) column to see how long we really sleep for and compare this with the time that we requested. Second we can look at the `core` column to see whether we have woken up on the same CPU core as we slept on.
The idea is that the log would include many such events of different kinds and a developer can sift through to answer questions. The log format is also relatively easy to process e.g. if one would want to plot the distribution of actual cycles spent sleeping vs requested sleeping time to estimate the reliability of `usleep`.
Formatting timeline logs
The log excerpts above were created with a command like:
and to remove the verbosity (complete log message text) I added `| grep '^[0-9]'`.
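Once filtered to lines that start with a digit, the text dump is easy to process further. The Python sketch below assumes each line follows the column layout shown in the selftest excerpt earlier in this conversation (`numa core -cycles category message`); the regular expression and the `parse_line` helper are hypothetical and may need adjusting to the real dump format.

```python
import re

# Assumed layout: numa, core, cycle delta, category, then the message text.
LINE = re.compile(r"^(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.*)$")


def parse_line(line):
    """Parse one dump line into a dict, or return None for other lines."""
    match = LINE.match(line)
    if match is None:
        return None  # header or verbose text, like lines dropped by grep
    numa, core, cycles, category, message = match.groups()
    return {
        "numa": int(numa),
        "core": int(core),
        "cycles": int(cycles),
        "category": category,
        "message": message,
    }


# One line taken from the selftest excerpt: a million disabled events.
entry = parse_line("0 3 5014856 selftest invoked many events(1000000)")
cycles_per_event = entry["cycles"] / 1000000
```

For that line the result is roughly five cycles per event, which matches the ballpark figure quoted with the excerpt.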
This could be greatly improved in the future e.g. with a `snabb log` program or an extension of `snabb top` to access timelines. There is also a graphical timeline log analyzer in the works.