Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy logging part 1 #16893

Draft
wants to merge 7 commits into
base: dev
Choose a base branch
from
Draft

Conversation

ksuderman
Copy link
Contributor

@ksuderman ksuderman commented Oct 20, 2023

[Update] The program used to generate this PR can be found at https://github.com/ksuderman/gxlogfix

This PR is a first pass at searching for, and rewriting, logging messages that do not use printf style formatting to format their output. As a pilot project I process the Python source in lib/galaxy/jobs/runners

Stat Number
Files checked 39
Files modified 19
Lines patched 156

Per the Galaxy Contributor Guidelines log messages should prefer the printf style of formatting for log messages. That is:

# Prefer
log.info("Hello world %s", name)

# Over
log.info(f"Hello {name}")
log.info("Hello %s" % name)
log.info("Hello {}".format(name))

Using printf style formatting is both faster in the case of log messages that get dropped, and safer in the case of inadvertently trying to format invalid values as the logging framework will catch and handle exceptions during string interpolation.

x = None
log.debug(f"x is {x:20}") # This will crash your program
log.debug("x is %s", x)   # This will not

The changes here were generated automatically by loading the Python source files into an AST (Abstract Syntax Tree), walking the tree looking for calls to the logging framework, and rewriting any that use f-strings, the mod operator (%) for formatting, or that call the str.format function.

Caveats and known limitations

  1. Classes that store a logger in the instance (e.g. self.log) are ignored. This only seems to happen in lib/galaxy/datatypes/registry.py

  2. The program used to rewrite the logging statements does not examine the import statements to see if a developer imports the logging module with a different name, or if they make calls to the logging framework with something other than log, logger, or logging.

  3. Uses of str.format are ignored if:

    1. the LHS is not a literal string constant, or
    2. keyword arguments are used in the substitution
  4. Any trailing comments after the logging statement will be lost.

  5. Logging statements that span multiple lines will be rewritten on a sinlge line.

  6. Strings with nested quotes are not handled.

    "This \"will\" break!"
    

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@dannon
Copy link
Member

dannon commented Oct 20, 2023

@ksuderman "make format" will run the python formatting that that check is complaining about.

@ksuderman
Copy link
Contributor Author

@ksuderman "make format" will run the python formatting that that check is complaining about.

Thanks. I thought the long lines would be problematic, but it is kind of funny that black complains about code generated by Python's own ast module. 🤣

@@ -116,7 +116,7 @@ def _init_worker_threads(self):
"""Start ``nworkers`` worker threads."""
self.work_queue = Queue()
self.work_threads = []
log.debug(f"Starting {self.nworkers} {self.runner_name} workers")
log.debug("Starting %s %s workers", self.nworkers, self.runner_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to open a giant can of worms, and I know this is only adhering to what we've put in the style guide, but... Do we really want to make this choice for performance? Is it really significant? The former is more naturally readable and I think I'd weigh that more heavily than what I assume are barely significant performance gains in debug log emission.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance is orders of magnitude better if log.debug messages are disabled (i.e. the log level is logging.INFO or higher). The logging framework defers formatting log message until it can't be avoided for this reason. It is also arguably safer if the string interpolation raises an exception, although that is less an issue with f-strings.

Whether it is worth it is another issue. I would advocate strongly for f-string in all other contexts but logging, then I think performance and safety outweigh the minor readability improvements. The process is automated so the cost of doing this is minimal. Having said that, if done project wide this does have the potential to modify a hundred or so files. So we (I) need to ensure the code being generated is correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a discussion on this this summer (here's the link). I see Dannon's point; however, it's an accepted best practice, and I don't think readability is significantly affected by printf syntax.

Copy link
Member

@mvdbeek mvdbeek Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance is not a valid argument, logging is never a bottleneck for Galaxy. Speeding up fast code by 10X (or any factor, really) brings you nothing overall, and f-strings are certainly more readable. The addition to the styleguide was pretty contested, and the link you posted @jdavcs was the start of the discussion, see also #16312 (comment). I don't think we should touch existing code, and if we do then we should add the corresponding pylint rules and do it all in one batch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed on adding this to our style guide. IMHO, there's nothing wrong with editing existing code (especially as trivial as logging messages) to adhere to our style guide. If we disagree with the style guide, I think we should change the style guide first.
If we can add pylint rules for this - great; although doing all in one batch would require significantly more upfront work, which is why I think fixing one step at a time is fine too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And @dannon I should also add that those number are for messages that are logged, nothing is going to speed those up. If the string needs to be formatted the string needs to be formatted. Any performance gains would be from messages that are not logged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't close it. I think the choice we made for the styleguide is contentious and I personally would have gone the other way, but it's really okay and I don't want to make us relitigate that here -- I just wanted to put a little context to the scale of the performance argument. You can feel safe to add as many log statements as you want and it won't make a dent, with or without fstrings :)

Again, thanks for the effort. I'll just second what Marius mentioned before -- we should also include a linting rule whether pylint, ruff, whatever, to enforce this moving forward if we're going to change it across the board, and we need to make sure this is well tested and closely reviewed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ksuderman please ping me when it's ready for review (I assume while in draft state it's not there yet).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to point out that there is a lot of work going into fstrings, in particular but not limited to 3.12.

https://realpython.com/python312-f-strings/
https://peps.python.org/pep-0701/

Migrating away from fstring feels wrong, especially mid-term.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that if we do go ahead with this we need to be 100% confident the changes are correct. With this PR I am 99% confident I can prove the generated statements are correct, or if they are not correct that the original statement was not correct either. And I just had an idea on how to test each change to show that each one produces the same output as the original statement.

@jdavcs will do. This PR was just a proof of concept intended for discussion. I didn't think it would be such a hot topic.

@bgruening that should not be an issue. Any changes to f-strings will be handled by the Python parser. The latest version of my program simply rearranges nodes in the syntax tree for logging method calls. Only a handful of the logging method calls in the Galaxy code base do anything more that simple variable substitution. If the program does encounter a statement that does fancy formatting it prints a warning, skips that statement, and moves on. And this is just for logging methods, f-strings should be preferred for all other string formatting operations.

@nsoranzo
Copy link
Member

nsoranzo commented Oct 24, 2023

@ksuderman You can configure ruff (one of the tools we used for linting the Python code) to check the logging format, just change:

diff --git a/pyproject.toml b/pyproject.toml
index f0729a813c..a1035b0a14 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -181,7 +181,9 @@ types-requests = "*"
 types-six = "*"
 
 [tool.ruff]
-select = ["E", "F", "B", "UP"]
+# Enable: pycodestyle errors (E), Pyflakes (F), flake8-bugbear (B),
+# flake8-logging-format (G) and pyupgrade (UP)
+select = ["E", "F", "B", "G", "UP"]
 target-version = "py37"
 # Exceptions:
 # B008 Do not perform function calls in argument defaults (for FastAPI Depends and Body)

This will prevent regressions once this is finished.

@@ -908,7 +908,7 @@ def stop_job(self, job_wrapper):
self.__cleanup_k8s_job(k8s_job)
# TODO assert whether job parallelism == 0
# assert not job_to_delete.exists(), "Could not delete job,"+job.job_runner_external_id+" it still exists"
log.debug(f"({job.id}/{job.job_runner_external_id}) Terminated at user's request")
log.debug("(%s/%s) Terminated at user's request", job.id, job.job_runner_external_id)

except Exception as e:
log.exception(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log.exception() call also uses str.format(), but for some reason it hasn't been rewritten.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was an oversight and is fixed in ksuderman/gxlogfix#2

@nsoranzo
Copy link
Member

@ksuderman You can configure ruff (one of the tools we used for linting the Python code) to check the logging format, just change:
...
This will prevent regressions once this is finished.

I have opened #16915 to enable this and fix the errors for some hopefully uncontroversial rules. G001 and G002 are also likely uncontroversial, only G004 ("Logging statement uses f-string") is causing concerns.

I was thinking it would be nice to have the fixers for the G rules in ruff (only G010 currently has one) but not sure how much of gxlogfix can be reused for that since ruff is written in Rust.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants