-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lazy logging part 1 #16893
base: dev
Are you sure you want to change the base?
Lazy logging part 1 #16893
Conversation
@ksuderman "make format" will run the python formatting that that check is complaining about. |
Thanks. I thought the long lines would be problematic, but it is kind of funny that |
@@ -116,7 +116,7 @@ def _init_worker_threads(self): | |||
"""Start ``nworkers`` worker threads.""" | |||
self.work_queue = Queue() | |||
self.work_threads = [] | |||
log.debug(f"Starting {self.nworkers} {self.runner_name} workers") | |||
log.debug("Starting %s %s workers", self.nworkers, self.runner_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not to open a giant can of worms, and I know this is only adhering to what we've put in the style guide, but... Do we really want to make this choice for performance? Is it really significant? The former is more naturally readable and I think I'd weigh that more heavily than what I assume are barely significant performance gains in debug log emission.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance is orders of magnitude better if log.debug
messages are disabled (i.e. the log level is logging.INFO
or higher). The logging framework defers formatting log message until it can't be avoided for this reason. It is also arguably safer if the string interpolation raises an exception, although that is less an issue with f-strings.
Whether it is worth it is another issue. I would advocate strongly for f-string in all other contexts but logging, then I think performance and safety outweigh the minor readability improvements. The process is automated so the cost of doing this is minimal. Having said that, if done project wide this does have the potential to modify a hundred or so files. So we (I) need to ensure the code being generated is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had a discussion on this this summer (here's the link). I see Dannon's point; however, it's an accepted best practice, and I don't think readability is significantly affected by printf syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance is not a valid argument, logging is never a bottleneck for Galaxy. Speeding up fast code by 10X (or any factor, really) brings you nothing overall, and f-strings are certainly more readable. The addition to the styleguide was pretty contested, and the link you posted @jdavcs was the start of the discussion, see also #16312 (comment). I don't think we should touch existing code, and if we do then we should add the corresponding pylint rules and do it all in one batch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We agreed on adding this to our style guide. IMHO, there's nothing wrong with editing existing code (especially as trivial as logging messages) to adhere to our style guide. If we disagree with the style guide, I think we should change the style guide first.
If we can add pylint rules for this - great; although doing all in one batch would require significantly more upfront work, which is why I think fixing one step at a time is fine too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And @dannon I should also add that those number are for messages that are logged, nothing is going to speed those up. If the string needs to be formatted the string needs to be formatted. Any performance gains would be from messages that are not logged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't close it. I think the choice we made for the styleguide is contentious and I personally would have gone the other way, but it's really okay and I don't want to make us relitigate that here -- I just wanted to put a little context to the scale of the performance argument. You can feel safe to add as many log statements as you want and it won't make a dent, with or without fstrings :)
Again, thanks for the effort. I'll just second what Marius mentioned before -- we should also include a linting rule whether pylint, ruff, whatever, to enforce this moving forward if we're going to change it across the board, and we need to make sure this is well tested and closely reviewed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ksuderman please ping me when it's ready for review (I assume while in draft state it's not there yet).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to point out that there is a lot of work going into fstrings, in particular but not limited to 3.12.
https://realpython.com/python312-f-strings/
https://peps.python.org/pep-0701/
Migrating away from fstring feels wrong, especially mid-term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that if we do go ahead with this we need to be 100% confident the changes are correct. With this PR I am 99% confident I can prove the generated statements are correct, or if they are not correct that the original statement was not correct either. And I just had an idea on how to test each change to show that each one produces the same output as the original statement.
@jdavcs will do. This PR was just a proof of concept intended for discussion. I didn't think it would be such a hot topic.
@bgruening that should not be an issue. Any changes to f-strings will be handled by the Python parser. The latest version of my program simply rearranges nodes in the syntax tree for logging method calls. Only a handful of the logging method calls in the Galaxy code base do anything more that simple variable substitution. If the program does encounter a statement that does fancy formatting it prints a warning, skips that statement, and moves on. And this is just for logging methods, f-strings should be preferred for all other string formatting operations.
@ksuderman You can configure diff --git a/pyproject.toml b/pyproject.toml
index f0729a813c..a1035b0a14 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -181,7 +181,9 @@ types-requests = "*"
types-six = "*"
[tool.ruff]
-select = ["E", "F", "B", "UP"]
+# Enable: pycodestyle errors (E), Pyflakes (F), flake8-bugbear (B),
+# flake8-logging-format (G) and pyupgrade (UP)
+select = ["E", "F", "B", "G", "UP"]
target-version = "py37"
# Exceptions:
# B008 Do not perform function calls in argument defaults (for FastAPI Depends and Body) This will prevent regressions once this is finished. |
@@ -908,7 +908,7 @@ def stop_job(self, job_wrapper): | |||
self.__cleanup_k8s_job(k8s_job) | |||
# TODO assert whether job parallelism == 0 | |||
# assert not job_to_delete.exists(), "Could not delete job,"+job.job_runner_external_id+" it still exists" | |||
log.debug(f"({job.id}/{job.job_runner_external_id}) Terminated at user's request") | |||
log.debug("(%s/%s) Terminated at user's request", job.id, job.job_runner_external_id) | |||
|
|||
except Exception as e: | |||
log.exception( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log.exception()
call also uses str.format()
, but for some reason it hasn't been rewritten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was an oversight and is fixed in ksuderman/gxlogfix#2
I have opened #16915 to enable this and fix the errors for some hopefully uncontroversial rules. G001 and G002 are also likely uncontroversial, only G004 ("Logging statement uses f-string") is causing concerns. I was thinking it would be nice to have the fixers for the G rules in ruff (only G010 currently has one) but not sure how much of gxlogfix can be reused for that since ruff is written in Rust. |
[Update] The program used to generate this PR can be found at https://github.com/ksuderman/gxlogfix
This PR is a first pass at searching for, and rewriting, logging messages that do not use
printf
style formatting to format their output. As a pilot project I process the Python source inlib/galaxy/jobs/runners
Per the Galaxy Contributor Guidelines log messages should prefer the
printf
style of formatting for log messages. That is:Using
printf
style formatting is both faster in the case of log messages that get dropped, and safer in the case of inadvertently trying to format invalid values as the logging framework will catch and handle exceptions during string interpolation.The changes here were generated automatically by loading the Python source files into an AST (Abstract Syntax Tree), walking the tree looking for calls to the logging framework, and rewriting any that use f-strings, the mod operator (
%
) for formatting, or that call thestr.format
function.Caveats and known limitations
Classes that store a logger in the instance (e.g. self.log) are ignored. This only seems to happen in
lib/galaxy/datatypes/registry.py
The program used to rewrite the logging statements does not examine the
import
statements to see if a developer imports thelogging
module with a different name, or if they make calls to the logging framework with something other thanlog
,logger
, orlogging
.Uses of
str.format
are ignored if:Any trailing comments after the logging statement will be lost.
Logging statements that span multiple lines will be rewritten on a sinlge line.
Strings with nested quotes are not handled.
How to test the changes?
(Select all options that apply)
License