-
Notifications
You must be signed in to change notification settings - Fork 313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add a parallel JSON log by default #1892
base: master
Are you sure you want to change the base?
Conversation
Need to merge #1890 in for linter to pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick functional test and I'm not getting timestamp field in the logs:
% esrally list races
% cat ~/.rally/logs/rally.log
2024-11-18 13:12:31,530 -not-actor-/PID:16574 esrally.rally INFO OS [uname_result(system='Darwin', node='pc-913.home', release='23.6.0', version='Darwin Kernel Version 23.6.0: Thu Sep 12 23:35:10 PDT 2024; root:xnu-10063.141.1.701.1~1/RELEASE_ARM64_T6030', machine='arm64')]
2024-11-18 13:12:31,530 -not-actor-/PID:16574 esrally.rally INFO Python [namespace(name='cpython', cache_tag='cpython-312', version=sys.version_info(major=3, minor=12, micro=2, releaselevel='final', serial=0), hexversion=51118832, _multiarch='darwin')]
2024-11-18 13:12:31,557 -not-actor-/PID:16574 esrally.rally INFO Rally version [2.11.1.dev0 (git revision: 066bb7587468b4bef74b3b5978042d1dfd074d97)]
2024-11-18 13:12:31,557 -not-actor-/PID:16574 esrally.utils.net INFO Connecting directly to the Internet (no proxy support) for [all_proxy].
2024-11-18 13:12:31,557 -not-actor-/PID:16574 esrally.utils.net INFO Connecting directly to the Internet (no proxy support) for [all_proxy].
2024-11-18 13:12:31,557 -not-actor-/PID:16574 esrally.rally INFO Cleaning track dependency directory [/Users/grzegorz/.rally/libs]...
2024-11-18 13:12:31,557 -not-actor-/PID:16574 esrally.metrics INFO Creating file race store
% cat ~/.rally/logs/rally.json
{"message": "OS [uname_result(system='Darwin', node='pc-913.home', release='23.6.0', version='Darwin Kernel Version 23.6.0: Thu Sep 12 23:35:10 PDT 2024; root:xnu-10063.141.1.701.1~1/RELEASE_ARM64_T6030', machine='arm64')]", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Python [namespace(name='cpython', cache_tag='cpython-312', version=sys.version_info(major=3, minor=12, micro=2, releaselevel='final', serial=0), hexversion=51118832, _multiarch='darwin')]", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Rally version [2.11.1.dev0 (git revision: 066bb7587468b4bef74b3b5978042d1dfd074d97)]", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Connecting directly to the Internet (no proxy support) for [all_proxy].", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Connecting directly to the Internet (no proxy support) for [all_proxy].", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Cleaning track dependency directory [/Users/grzegorz/.rally/libs]...", "taskName": null, "actorAddress": "-not-actor-"}
{"message": "Creating file race store", "taskName": null, "actorAddress": "-not-actor-"}
% python --version
Python 3.12.2
% pip list | grep python-json
python-json-logger 2.0.7
Let's also update https://github.com/elastic/rally/blob/master/docs/configuration.rst#logging, second paragraph. |
066bb75
to
378d078
Compare
Care to give it a try again @gbanasiak? I've added the |
32ebe9c
to
d70823e
Compare
That's better, now I'm getting this after formatting:
There's a new To make this easier to ingest by Filebeat / Elastic Agent we could change
This could be addressed by proper Filebeat configuration too. I was thinking about taking this one step further and modifying log attributes to be ECS compliant (also ECS fields in Filebeat docs). Perhaps this could be done with object translator? But I don't find obvious candidates for some of the fields:
|
We have to change |
I've used the I've also subclassed the formatter in order to move the thespian log fields into their own |
Example result:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea to use ecs-logging
. I did another round of e2e tests and found no collisions when ingesting with Filebeat.
Can we drop log.original
entirely? It's useful when documents are processed with ES ingest pipeline, but I don't think we need this (at least not yet) as message
field has exactly the same content.
I left some comments and questions.
esrally/log.py
Outdated
if log_record.get("actorAddress"): | ||
actor["address"] = log_record.pop("actorAddress") | ||
if log_record.get("taskName"): | ||
actor["task"] = log_record.pop("taskName") | ||
if actor: | ||
collections.deep_update(log_record, {"rally": {"actor": actor}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see rally.actor.task
populated ever, so I think taskName
is lost. Once this is explained we need to think about the right placement for this info. As per docs it should correspond to asyncio.Task
name so this is not related to Thespian actors. I would put it outside of rally.actor
, rally.asyncio
maybe? But then maybe not rally.actor
but rather rally.thespian
? Ugh, naming...
What's the rationale for the new collections.deep_update()
method? Couldn't we simply set rally
key in log record dict? It's a new top-level element, so I think that's safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we can just drop taskName
for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in version 3.12: taskName was added
Huh, turns out that's fairly new.
We can add %(taskName)s
to the format
string we pass to the Formatter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, turns out that's fairly new.
Ah, yes, most recently I've tested with 3.11.7 from esbench env. After switching to 3.12.2 and:
diff --git a/esrally/resources/logging.json b/esrally/resources/logging.json
index c1e4b6da..7651ebc5 100644
--- a/esrally/resources/logging.json
+++ b/esrally/resources/logging.json
@@ -2,12 +2,12 @@
"version": 1,
"formatters": {
"normal": {
- "format": "%(asctime)s,%(msecs)d %(actorAddress)s/PID:%(process)d %(name)s %(levelname)s %(message)s",
+ "format": "%(asctime)s,%(msecs)d %(actorAddress)s/PID:%(process)d %(name)s %(taskName)s %(levelname)s %(message)s",
"datefmt": "%Y-%m-%d %H:%M:%S",
"()": "esrally.log.configure_utc_formatter"
},
"json": {
- "format": "%(message)s",
+ "format": "%(message)s %(taskName)s",
"()": "esrally.log.configure_ecs_formatter"
},
"profile": {
I can see taskName
populated in rally.log
but not in rally.json
. I think it gets swallowed in here. We could override format_to_ecs()
but I'm also fine just skipping it for now. We can revisit once we find a use for this information. What we're adding in process
object is already very useful.
Still interested in hearing the rationale behind deep_update()
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deep_update
is so that just in case we have something already there, like someone adds a future mutator, we don't clobber any dict that is already defined. Like would happen with dict.update()
Absolutely, I'm just going to drop that in |
For ease of programmatic parsing, and uploading to Kibana with filebeat
These take the log record, and add additional fields to it. This is configurable via the `mutators` property in `~/.rally/logging.json`.
5959cfb
to
14abad5
Compare
Okay! It's been a while since I remembered this, but I've added the ability to change mutators and such. The current format:
Note: |
Removed old stuff, and pulled out MutatorType to make the RallyEcsFormatter definition shorter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for adding this. LGTM. See remark regarding protected access.
This fixes a lint error that we got from trying to use a protected method. Thanks @gbanasiak for the catch.
For ease of programmatic parsing, and uploading to Kibana with filebeat
make check-all
successfully?