Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add a parallel JSON log by default #1892

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

favilo
Copy link
Contributor

@favilo favilo commented Nov 14, 2024

For ease of programmatic parsing, and uploading to Kibana with filebeat

@favilo favilo requested a review from a team November 14, 2024 18:46
@gbanasiak
Copy link
Contributor

Need to merge #1890 in for linter to pass.

Copy link
Contributor

@gbanasiak gbanasiak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick functional test and I'm not getting timestamp field in the logs:

% esrally list races

% cat ~/.rally/logs/rally.log 
2024-11-18 13:12:31,530 -not-actor-/PID:16574 esrally.rally INFO OS [uname_result(system='Darwin', node='pc-913.home', release='23.6.0', version='Darwin Kernel Version 23.6.0: Thu Sep 12 23:35:10 PDT 2024; root:xnu-10063.141.1.701.1~1/RELEASE_ARM64_T6030', machine='arm64')]
2024-11-18 13:12:31,530 -not-actor-/PID:16574 esrally.rally INFO Python [namespace(name='cpython', cache_tag='cpython-312', version=sys.version_info(major=3, minor=12, micro=2, releaselevel='final', serial=0), hexversion=51118832, _multiarch='darwin')]
2024-11-18 13:12:31,557 -not-actor-/PID:16574 esrally.rally INFO Rally version [2.11.1.dev0 (git revision: 066bb7587468b4bef74b3b5978042d1dfd074d97)]
2024-11-18 13:12:31,557 -not-actor-/PID:16574 esrally.utils.net INFO Connecting directly to the Internet (no proxy support) for [all_proxy].
2024-11-18 13:12:31,557 -not-actor-/PID:16574 esrally.utils.net INFO Connecting directly to the Internet (no proxy support) for [all_proxy].
2024-11-18 13:12:31,557 -not-actor-/PID:16574 esrally.rally INFO Cleaning track dependency directory [/Users/grzegorz/.rally/libs]...
2024-11-18 13:12:31,557 -not-actor-/PID:16574 esrally.metrics INFO Creating file race store

% cat ~/.rally/logs/rally.json
{"message": "OS [uname_result(system='Darwin', node='pc-913.home', release='23.6.0', version='Darwin Kernel Version 23.6.0: Thu Sep 12 23:35:10 PDT 2024; root:xnu-10063.141.1.701.1~1/RELEASE_ARM64_T6030', machine='arm64')]", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Python [namespace(name='cpython', cache_tag='cpython-312', version=sys.version_info(major=3, minor=12, micro=2, releaselevel='final', serial=0), hexversion=51118832, _multiarch='darwin')]", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Rally version [2.11.1.dev0 (git revision: 066bb7587468b4bef74b3b5978042d1dfd074d97)]", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Connecting directly to the Internet (no proxy support) for [all_proxy].", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Connecting directly to the Internet (no proxy support) for [all_proxy].", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Cleaning track dependency directory [/Users/grzegorz/.rally/libs]...", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Creating file race store", "taskName": null, "actorAddress": "-not-actor-"}

% python --version
Python 3.12.2

% pip list | grep python-json
python-json-logger            2.0.7

@gbanasiak
Copy link
Contributor

Let's also update https://github.com/elastic/rally/blob/master/docs/configuration.rst#logging, second paragraph.

@favilo
Copy link
Contributor Author

favilo commented Nov 18, 2024

I did a quick functional test and I'm not getting timestamp field in the logs:

Care to give it a try again @gbanasiak? I've added the timestamp field to the LogRecord. It will correspond to the %(asctime)s,%(msecs)d that we've been using for the other formatters. And I've added the ability to specify timezone

@gbanasiak
Copy link
Contributor

gbanasiak commented Nov 19, 2024

That's better, now I'm getting this after formatting:

{
    "timestamp": "2024-11-19 11:09:55,605074",
    "process": 12826,
    "name": "esrally.driver.driver",
    "levelname": "INFO",
    "message": "Worker[0] finished executing tasks ['sort_country_code_no_can_match_shortcut'] in 0.020128 seconds",
    "taskName": "Task-145",
    "actorAddress": "ActorAddr-(T|:51749)",
    "timezone": "UTC"
}

There's a new taskName log attribute injected by Async I/O which is not included in plain logs. Nice.

To make this easier to ingest by Filebeat / Elastic Agent we could change timestamp to @timestamp and adjust format to be ISO compatible, otherwise we would get error in ES:

failed to parse date field [2024-11-19 11:09:55,605074] with format [strict_date_optional_time||epoch_millis]

This could be addressed by proper Filebeat configuration too.

I was thinking about taking this one step further and modifying log attributes to be ECS compliant (also ECS fields in Filebeat docs). Perhaps this could be done with object translator? But I don't find obvious candidates for some of the fields:

timestamp -> @timestamp
process -> process.pid
name -> ?
levelname -> log.level
message -> message
taskName -> ?
actorAddress -> ?
timezone -> not needed if @timestamp properly formatted

docs/configuration.rst Outdated Show resolved Hide resolved
@gbanasiak
Copy link
Contributor

I was thinking about taking this one step further and modifying log attributes to be ECS compliant [..]

We have to change process field name to something different due to collision with Filebeat mappings. The fields that have no good candidates could be placed under rally top-level key.

@favilo
Copy link
Contributor Author

favilo commented Nov 20, 2024

I've used the ecs-logging python library to actually implement on-spec JSON logging. This is much better than my adhoc JSON formatter from before.

I've also subclassed the formatter in order to move the thespian log fields into their own rally.actor section.

@gbanasiak
Copy link
Contributor

Example result:

{
  "@timestamp": "2024-11-20T09:39:43.312Z",
  "log.level": "warning",
  "message": "Invoking a scroll search with the 'search' operation is deprecated and will be removed in a future release. Use 'scroll-search' instead.",
  "ecs.version": "1.6.0",
  "log": {
    "logger": "esrally.driver.runner",
    "origin": {
      "file": {
        "line": 1214,
        "name": "runner.py"
      },
      "function": "__call__"
    },
    "original": "Invoking a scroll search with the 'search' operation is deprecated and will be removed in a future release. Use 'scroll-search' instead."
  },
  "process": {
    "name": "Actor_Worker__ActorAddr-LocalAddr.1",
    "pid": 30676,
    "thread": {
      "id": 130507456198208,
      "name": "ThreadPoolExecutor-0_0"
    }
  },
  "rally": {
    "actor": {
      "address": "ActorAddr-(T|:45685)"
    }
  }
}

Copy link
Contributor

@gbanasiak gbanasiak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea to use ecs-logging. I did another round of e2e tests and found no collisions when ingesting with Filebeat.

Can we drop log.original entirely? It's useful when documents are processed with ES ingest pipeline, but I don't think we need this (at least not yet) as message field has exactly the same content.

I left some comments and questions.

esrally/log.py Outdated
Comment on lines 59 to 64
if log_record.get("actorAddress"):
actor["address"] = log_record.pop("actorAddress")
if log_record.get("taskName"):
actor["task"] = log_record.pop("taskName")
if actor:
collections.deep_update(log_record, {"rally": {"actor": actor}})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see rally.actor.task populated ever, so I think taskName is lost. Once this is explained we need to think about the right placement for this info. As per docs it should correspond to asyncio.Task name so this is not related to Thespian actors. I would put it outside of rally.actor, rally.asyncio maybe? But then maybe not rally.actor but rather rally.thespian? Ugh, naming...

What's the rationale for the new collections.deep_update() method? Couldn't we simply set rally key in log record dict? It's a new top-level element, so I think that's safe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can just drop taskName for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in version 3.12: taskName was added

Huh, turns out that's fairly new.

We can add %(taskName)s to the format string we pass to the Formatter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, turns out that's fairly new.

Ah, yes, most recently I've tested with 3.11.7 from esbench env. After switching to 3.12.2 and:

diff --git a/esrally/resources/logging.json b/esrally/resources/logging.json
index c1e4b6da..7651ebc5 100644
--- a/esrally/resources/logging.json
+++ b/esrally/resources/logging.json
@@ -2,12 +2,12 @@
   "version": 1,
   "formatters": {
     "normal": {
-      "format": "%(asctime)s,%(msecs)d %(actorAddress)s/PID:%(process)d %(name)s %(levelname)s %(message)s",
+      "format": "%(asctime)s,%(msecs)d %(actorAddress)s/PID:%(process)d %(name)s %(taskName)s %(levelname)s %(message)s",
       "datefmt": "%Y-%m-%d %H:%M:%S",
       "()": "esrally.log.configure_utc_formatter"
     },
     "json": {
-      "format": "%(message)s",
+      "format": "%(message)s %(taskName)s",
       "()": "esrally.log.configure_ecs_formatter"
     },
     "profile": {

I can see taskName populated in rally.log but not in rally.json. I think it gets swallowed in here. We could override format_to_ecs() but I'm also fine just skipping it for now. We can revisit once we find a use for this information. What we're adding in process object is already very useful.

Still interested in hearing the rationale behind deep_update() :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deep_update is so that just in case we have something already there, like someone adds a future mutator, we don't clobber any dict that is already defined. Like would happen with dict.update()

docs/configuration.rst Show resolved Hide resolved
docs/configuration.rst Show resolved Hide resolved
@favilo
Copy link
Contributor Author

favilo commented Nov 20, 2024

Can we drop log.original entirely?

Absolutely, I'm just going to drop that in logging.json, by adding an exclude_fields field. That way, of someone wants it back, they can just edit the logging.json file

For ease of programmatic parsing, and uploading to Kibana with filebeat
These take the log record, and add additional fields to it.

This is configurable via the `mutators` property in `~/.rally/logging.json`.
@favilo
Copy link
Contributor Author

favilo commented Dec 10, 2024

Okay!

It's been a while since I remembered this, but I've added the ability to change mutators and such.

The current format:

  "@timestamp":"2024-12-10T20:26:20.561Z",
  "log.level":"info",
  "message":"Hello world",
  "ecs.version":"1.6.0",
  "log":{"logger":"esrally.rally","origin":{"file":{"line":1238,"name":"rally.py"},"function":"test"}},
  "process":{"name":"MainProcess","pid":3061691,"thread":{"id":140008672693120,"name":"MainThread"}},
  "python":{"asyncio":{"task":"Task-1"},
  "thespian":{"address":"-not-actor-"}}}

Note: python.asyncio.task only shows up if it is running inside an async task. But it does actually show up now. ecs_logging was in fact eating that

Removed old stuff, and pulled out MutatorType to make the
RallyEcsFormatter definition shorter
Copy link
Contributor

@gbanasiak gbanasiak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for adding this. LGTM. See remark regarding protected access.

esrally/log.py Outdated Show resolved Hide resolved
This fixes a lint error that we got from trying to use a protected
method. Thanks @gbanasiak for the catch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants