layout | title | tags |
---|---|---|
article | make apache log in json | apache, httpd, json |
- Goal: Log directly to a structured format in apache.
- Audience: Folks looking to improve their log formats.
The default log format offered by apache is only semi-structured. To a human reader it appears to have a reasonable structure, but processing it with the logstash grok filter requires a complex and expensive regular expression.
The best case for log formats is when the application itself emits them in a structured format. This removes the need for extra parsing later!
- Configure apache to emit json to a logfile
- Configure logstash to read the file
First, we'll need to tell apache about our new log format. You'll put this in your httpd.conf:
{% include_code apache.conf %}
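For readers who cannot see the included file, here is a rough Python sketch that generates a LogFormat directive of this general shape. It is not the contents of apache.conf: the field list is an assumption reconstructed from the sample log line in this article, the `logstash_json` nickname and the `build_log_format` helper are hypothetical, and only the `%...` format strings are standard mod_log_config ones.

```python
# Hypothetical field list: (JSON key, mod_log_config format string, quote as string?)
# The keys mirror the sample log line in this article; the real apache.conf may differ.
FIELDS = [
    ("@timestamp", "%{%Y-%m-%dT%H:%M:%S%z}t", True),
    ("client", "%a", True),
    ("duration_usec", "%D", False),   # %D is always numeric (microseconds)
    ("status", "%>s", False),         # final response status
    ("request", "%U%q", True),        # URL path plus query string
    ("method", "%m", True),
    ("referrer", "%{Referer}i", True),
]


def build_log_format(fields, nickname):
    """Build an httpd LogFormat directive that emits one JSON object per request."""
    parts = []
    for key, spec, is_string in fields:
        # Inside the directive's double-quoted argument, every JSON quote
        # has to be written as \" so that httpd emits a literal quote.
        value = '\\"' + spec + '\\"' if is_string else spec
        parts.append('\\"' + key + '\\": ' + value)
    return 'LogFormat "{ ' + ", ".join(parts) + ' }" ' + nickname


print(build_log_format(FIELDS, "logstash_json"))
```

The printed directive would then be paired with a CustomLog line that refers to the same nickname.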
Keeping in mind that the goal here is to dump these logs into logstash, the json schema I provided is specific to how logstash forms its own events.
Reload the apache config, and I now see things like this in my logs:
    { "@timestamp": "2012-08-22T14:35:19-0700", "client": "127.0.0.1", "duration_usec": 532, "status": 404, "request": "/favicon.ico", "method": "GET", "referrer": "-" }
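Because each line is a self-contained JSON object, any JSON library can consume the log without a custom parser. A minimal Python sketch, using the sample line above (the variable names are only illustrative):

```python
import json

# Sample access-log line, copied from the output shown above.
line = (
    '{ "@timestamp": "2012-08-22T14:35:19-0700", "client": "127.0.0.1", '
    '"duration_usec": 532, "status": 404, "request": "/favicon.ico", '
    '"method": "GET", "referrer": "-" }'
)

event = json.loads(line)

# Numeric fields arrive as numbers rather than strings, so no casting is needed.
print(event["status"], event["request"], event["duration_usec"] / 1000.0, "ms")
```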
Apache's documentation explains how/why it escapes values:
For security reasons, starting with version 2.0.46, non-printable and other special characters in %r, %i and %o are escaped using \xhh sequences, where hh stands for the hexadecimal representation of the raw byte. Exceptions from this rule are " and \, which are escaped by prepending a backslash, and all whitespace characters, which are written in their C-style notation (\n, \t, etc). In versions prior to 2.0.46, no escaping was performed on these strings so you had to be quite careful when dealing with raw log files. (from mod_log_config's format notes)
This should be enough for our log format to produce valid JSON in practice, since apache escapes most of the things JSON requires to be escaped :)
The logstash config is now pretty simple. We tell logstash to expect 'logstash json' events from the given apache log file. No filters are required because we are already emitting proper logstash json events!
{% include_code logstash.conf %}
Running logstash with the above config:
    % java -jar logstash.jar agent -f logstash.conf
    {
      "@source" => "pork.example.com",
      "@type" => "apache",
      "@tags" => [],
      "@fields" => {
        "client" => "127.0.0.1",
        "duration_usec" => 240,
        "status" => 404,
        "request" => "/favicon.ico",
        "method" => "GET",
        "referrer" => "-"
      },
      "@timestamp" => "2012-08-22T14:53:47-0700"
    }
Voila!
We can greatly simplify our setup by using mod_macro to generate the LogFormat and CustomLog at the same time.
If our VirtualHosts' DocumentRoots are built consistently, we can build our configuration predictably as follows:
{% include_code macro.conf %}
We can now create a VirtualHost that uses this macro:

    <VirtualHost *:80>
      ServerName www.example.com
      DocumentRoot /srv/web/example.com/www/htdocs
      Use logstash_log www.example.com prod-web137.dmz01.dc03.acme.com
    </VirtualHost>
A lightweight alternative to running a logstash agent on the web server is to use lumberjack to send your Apache logs to a logstash server on another host. This section doesn't cover lumberjack installation details. See the github project's README for that.
Apache can be configured to pipe logs to an external program. The nice thing about this option (the only one we'll cover here) is that the piped program runs under the supervision of the apache master process. There is no initscript or daemon to take care of, and apache restarts the process if it crashes.
This is done with the first of the following lines:
{% include_code apache-lumberjack.conf %}
The second line ships Apache's unformatted error logs (see below). In both cases, don't leave out the trailing dash in the lumberjack command line, which stands for "read log messages from standard input".
Then define lumberjack inputs on your logstash central server:
{% include_code logstash-lumberjack.conf %}
Note that the "format" is set to "json_event" in the first case. Since Apache has no option for error log formatting, we have to set up a second instance, listening on another port, with the "format" set to "plain".
The parts related to error logs can of course be skipped if you're only interested in the web server's access logs.
One caveat with our setup when using SELinux is that lumberjack needs to change some system limits and connect to the logstash server, both of which are blocked by default inside the httpd context (see httpd_selinux(8) for details). Just run:
    sudo setsebool -P httpd_setrlimit 1
    sudo setsebool -P httpd_can_network_connect 1
in case you see these messages appear:
    # tail /var/log/audit/audit.log
    type=AVC msg=audit(1367837376.591:136095): avc: denied { setrlimit } for pid=17814 comm="lumberjack" scontext=unconfined_u:system_r:httpd_t:s0 tcontext=unconfined_u:system_r:httpd_t:s0 tclass=process
    type=AVC msg=audit(1367928599.621:829): avc: denied { name_connect } for pid=11398 comm="lumberjack" dest=6782 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=system_u:object_r:cyphesis_port_t:s0 tclass=tcp_socket
    # tail /var/log/httpd/error_log
    Assertion failed lumberjack.c:111 in set_resource_limits(), insist(rc != -1): setrlimit(RLIMIT_NOFILE, ... 103) failed: Permission denied
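To check a longer audit log for these two denials, something like the following Python sketch could help. It is only an illustration: the `suggest_booleans` helper is hypothetical, the mapping covers just the two denials quoted above, and reading the real /var/log/audit/audit.log normally requires root.

```python
import re

# Mapping taken from the two denials shown above; other denials are not covered.
BOOLEAN_FOR_DENIAL = {
    "setrlimit": "httpd_setrlimit",
    "name_connect": "httpd_can_network_connect",
}

# Captures the denied permission and the command name from an AVC record.
_DENIED = re.compile(r'avc:\s+denied\s+\{\s*(\w+)\s*\}.*?comm="([^"]+)"')


def suggest_booleans(audit_lines, comm="lumberjack"):
    """Return the sorted SELinux booleans suggested by AVC denials for `comm`."""
    needed = set()
    for line in audit_lines:
        match = _DENIED.search(line)
        if match and match.group(2) == comm and match.group(1) in BOOLEAN_FOR_DENIAL:
            needed.add(BOOLEAN_FOR_DENIAL[match.group(1)])
    return sorted(needed)


# The two audit records quoted above, used here as sample input.
sample = [
    'type=AVC msg=audit(1367837376.591:136095): avc: denied { setrlimit } for pid=17814 comm="lumberjack" scontext=unconfined_u:system_r:httpd_t:s0 tcontext=unconfined_u:system_r:httpd_t:s0 tclass=process',
    'type=AVC msg=audit(1367928599.621:829): avc: denied { name_connect } for pid=11398 comm="lumberjack" dest=6782 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=system_u:object_r:cyphesis_port_t:s0 tclass=tcp_socket',
]

for name in suggest_booleans(sample):
    print("sudo setsebool -P %s 1" % name)
```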
Is it safe to rely on apache's escaping? I tested with Apache/2.2.22 and found it appears quite safe, where safe means that apache generates valid JSON.
To test this, I made a simple apache config and two scripts; the first script spams apache with some pretty unsavory http requests, and the second script reads the apache log and verifies that all the entries parse as valid JSON.
    % sh run.sh
    Starting apache
    Spamming apache with requests
    Verifying valid JSON
    Successful: 10000
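The verification step can be approximated in a few lines of Python. This is a sketch of the idea only, not the check.rb used for the run above, and `count_valid_json` is a hypothetical helper. Python's json module is a strict parser, so its counts can differ from those of a more lenient one.

```python
import json


def count_valid_json(lines):
    """Return (valid, invalid) counts for an iterable of log lines."""
    valid = invalid = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            json.loads(line)
            valid += 1
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            invalid += 1
    return valid, invalid


# One good entry, one truncated entry, one blank line.
sample = ['{"status": 404}', '{"status": 404', ""]
print(count_valid_json(sample))  # (1, 1)
```

To check a real log, pass an open file object instead of the sample list, since iterating over a file yields its lines.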
Strictly speaking, what is verified above is that the ruby JSON parser can process the data. Since apache uses '\xNN' notation for escaping special characters, and JSON has no '\x' escape, such entries are technically invalid JSON, but I've found that many JSON parsers happily accept them.
You can see the code for this test here: apache.conf, spam.rb, check.rb.
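If your consumer uses a strict parser, one possible workaround is to rewrite apache's `\xNN` escapes into JSON's `\u00NN` form before parsing. The Python sketch below is an assumption-laden illustration, not part of the test above: `fix_x_escapes` is a hypothetical helper, and because `\xNN` stands for a raw byte, multi-byte UTF-8 sequences come out as one code point per byte rather than as the original character.

```python
import json
import re

# A run of backslashes followed by xHH. Only an odd-length run means the
# last backslash really starts an \xHH escape; an even-length run is a
# sequence of escaped backslashes followed by a literal "xHH".
_X_ESCAPE = re.compile(r"(\\+)x([0-9A-Fa-f]{2})")


def fix_x_escapes(line):
    """Rewrite apache-style \\xHH escapes as JSON \\u00HH escapes."""
    def repl(match):
        slashes, hex_pair = match.group(1), match.group(2)
        if len(slashes) % 2 == 1:
            return slashes + "u00" + hex_pair
        return match.group(0)
    return _X_ESCAPE.sub(repl, line)


raw = r'{ "request": "/\x01path" }'
try:
    json.loads(raw)
except ValueError as err:
    print("strict parser rejects the raw line:", err)

print(json.loads(fix_x_escapes(raw)))
```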