Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Git.refresh warns to stdout and does not log #1808

Closed
EliahKagan opened this issue Jan 24, 2024 · 11 comments · Fixed by #1815
Closed

Git.refresh warns to stdout and does not log #1808

EliahKagan opened this issue Jan 24, 2024 · 11 comments · Fixed by #1815

Comments

@EliahKagan
Copy link
Contributor

EliahKagan commented Jan 24, 2024

Warning messages are most often written to standard error. Furthermore, warnings from GitPython, when they are displayed, are in most cases displayed that way unless the caller arranges otherwise, because they are displayed by calling loggers' warning methods or, for DeprecationWarnings, by calling warnings.warn.

However, the warning Git.refresh displays when running a test command like git --version (with the command being tested used for git) fails, and GIT_PYTHON_REFRESH is set to 1, w, warn, or warning, is different. It is printed to standard output. It also does not use any logging facility, so attempts to capture it by configuring Python logging will not succeed.

GitPython/git/cmd.py

Lines 453 to 454 in d28c20b

if mode in warn:
print("WARNING: %s" % err)

If there is a reason for this behavior, then I think it should be commented or otherwise stated. Otherwise, I am not sure what to do. It may still not be feasible to change, because programs that use GitPython may be relying on the warning being printed to standard output, or on it not being printed to standard error. But if he behavior is going to change in the next major version, then that could be commented and/or this issue could be kept open until then. If the reason is unknown and it is undecided whether the behavior should be changed in the next major release, then a comment could be added just saying that it prints to standard output for backward compatibility.

@Byron
Copy link
Member

Byron commented Jan 24, 2024

Thanks for bringing this up!

To my mind, libraries have no business with the stdout/stderr of a program, which makes this usage (and those like it) as bug. If other programs rely on the output of a library, they rely on an unstable API without any guarantees.

This is an opening for using the logging facility here instead.

Maybe in relation to #1806, if the logging would be configured to show warnings by default as it is suggested to well-behaved users of the logging module, then this important message should still show up even if GitPython is only used in an ad-hoc fashion. I am sure this was also the reason it prints to stdout, it was such a common issue that it should be assured the users can see it even without any prior configuration to the logging module.

@EliahKagan
Copy link
Contributor Author

EliahKagan commented Jan 24, 2024

This is an opening for using the logging facility here instead.

I agree.

Maybe in relation to #1806, if the logging would be configured to show warnings by default as it is suggested [...]

I think the change for that should just be to no longer prevent that from happening, by no longer registering any NullHandler, since doing so is no longer needed. Whether it happens or not would then depend on whether, and how, logging has already been configured at the time the git module is imported. If messages of level WARNING are for some reason being ignored, such as due to configuration of the root logger (higher than any GitPython loggers), then it would not be seen. I don't think such suppression is common, but logging might be configured in a way that doesn't show messages in a terminal. So long as that's okay, I think this is a good change to make.

There is also the question of what logging level should be used for this message. It's described as a warning, currently, but if it really represents an error then it can be logged with the ERROR level. The description in the logging levels table for ERROR is, "Due to a more serious problem, the software has not been able to perform some function." Intuitively, that seems right. But I'm not sure it really is, because one way to characterize the situation at that point would be to say that the software hasn't tried and failed to do anything it needs to do; instead, it's tried something it doesn't need to do yet, and the failure is only a prediction of something that might be an error later. Then again, this is a condition that is treated as an error by default and an exception raised, so maybe the ERROR level is the right choice. Then again, the warning is when an environment variable has been set to not treat it as an error...

I am sure this was also the reason it prints to stdout, it was such a common issue that it should be assured the users can see it even without any prior configuration to the logging module.

I don't see why that would be, since writing to stderr has never required that logging be configured. It's true that doing so with print varied between Python 2 and Python 3, both of which GitPython supported at the time this code was added in a962464 (#640), if from __future__ import print_function was not used:

print >> sys.stderr, "WARNING: %s" % err     # Old way (Python 2).
print("WARNING: %s" % err, file=sys.stderr)  # New way (Python 3, or either with a __future__ import).

I don't know if that's sufficient to explain why it wasn't done, though.

@Byron
Copy link
Member

Byron commented Jan 25, 2024

I think the change for that should just be to no longer prevent that from happening, by no longer registering any NullHandler, since doing so is no longer needed. Whether it happens or not would then depend on whether, and how, logging has already been configured at the time the git module is imported. If messages of level WARNING are for some reason being ignored, such as due to configuration of the root logger (higher than any GitPython loggers), then it would not be seen. I don't think such suppression is common, but logging might be configured in a way that doesn't show messages in a terminal. So long as that's okay, I think this is a good change to make.

Yes, that's absolutely OK and I would appreciate that. If folks override the logger in their application, they know what they are doing. Those who use python -m git would still see these warnings by default, which is what I would hope for.

There is also the question of what logging level should be used for this message. It's described as a warning, currently, but if it really represents an error then it can be logged with the ERROR level.

I agree, and this would make it even more likely to actually show up even if the logging is configured at application level. Since this is configurable, maybe that means the message level should be adjusted depending on the GitPython configuration.

I don't see why that would be, since writing to stderr has never required that logging be configured.

Apologies for the confusion. What I tried to say is that in order to avoid that the message might not show up as logging is disabled, it was emitted to stdout or stderr directly. Maybe it's just an oversight, the true reason might be lost in the tides of time.

@EliahKagan
Copy link
Contributor Author

EliahKagan commented Jan 26, 2024

Yes, that's absolutely OK and I would appreciate that. If folks override the logger in their application, they know what they are doing.

I agree. Common approaches when not doing anything special with logging are to not configure logging or to configure it with logging.basicConfig. Either way, if the level is not set explicitly anywhere and nothing is done to suppress messages, WARNING and higher are shown.

(The messages are formatted more nicely when logging is configured, including with basicConfig, than when logging is not configured and the handler of last resort ends up displaying messages. But either way, these days, when nothing is done to suppress messages or control what severities are seen, it is WARNING and higher.)

Those who use python -m git would still see these warnings by default, which is what I would hope for.

Do you just mean importing and using the git module? git is a package (it's a directory) and it does not contain __main__.py, so running it as a script, as python -m does, gives an error that it can't be directly executed.

I ask because one nit I plan to fix at the same time as some of this other logging-related stuff is to make logging.getLogger calls consistent across modules in GitPython where inconsistency appears unintentional. Currently, most pass their fully qualified module name by writing __name__, but a few write it out, e.g., git/objects/commit.py contains logging.getLogger("git.objects.commit") instead of logging.getLogger(__name__). These work the same, with __name__ being preferred, so long as the modules are imported rather than being run as scripts. This is not directly affected by whether the top-level git module is runnable as a script, but if any modules inside the git/ directory are intended to be runnable as scripts, then I should be aware of that while making that change to use __name__.

[...] but if it really represents an error then it can be logged with the ERROR level.

I agree, and this would make it even more likely to actually show up even if the logging is configured at application level.

Some further considerations:

  • If a user or application developer adds a NullHandler to the git logger to restore the old behavior that will be changed for NullHandler added to all but git.repo.base logger #1806, this will also prevent messages (of any severity) logged to the git or git.cmd logger. So a message from Git.refresh, of any severity, would also be suppressed. Thus, by trying to restore one old behavior, a user/developer could end up making another behavior even more different. Then again, suppressing logging probably should suppress such messages. (It would not be suppressed if logged to the root logger instead, but a library should not log to the root logger.)

  • GitPython currently logs some messages of severity ERROR. Even a user or application developer takes the approach of configuring logging but setting the logging level (i.e., minimum severity) higher for GitPython loggers, there would still be no easy way to suppress all messages except the one from Git.refresh. If the message from Git.refresh were changed to CRITICAL then that would provide an easy way to see only it, since GitPython does not currently use that level for any messages. But I think that may not be justified, especially considering that the git module may end up being imported by an application or other library that rarely uses its functionality or that uses it in a fairly unimportant way.

  • Currently the recognized values of GIT_PYTHON_REFRESH, which are included in messages, characterize the message as a warning:

    warn = ["warn", "w", "warning", "1"]

    I think it is fine to add new values, including synonyms of existing values, as well as to change the wording of the message, including how it describes the options. But I think all those values should remain supported. It seems to me that the way they are presented effectively documents them and it would be a breaking change to stop recognizing them. I also think it is valuable to be able to set environment variables in a way that causes any version of GitPython (excluding very old versions) to be configured in the same way. Because values like warn should continue to be recognized, this means that users may be surprised if the level is higher. I'm not sure how much of a problem that really is.

  • Should the level of the message itself be configurable with an additional, newly recognized GIT_PYTHON_* environment variable? This could mitigate some of the above issues somewhat. My feeling is that the benefit would not be worth the added complexity, and that discoverability would be low unless the message mentions that environment variable too, which it shouldn't since then the message would be overcomplicated and confusing. But I am not sure.

Apologies for the confusion. What I tried to say is that in order to avoid that the message might not show up as logging is disabled, it was emitted to stdout or stderr directly. Maybe it's just an oversight, the true reason might be lost in the tides of time.

Ah, I see. The two of us were focusing on different things there. I was specifically worried about why it was going to standard output rather than standard error, in case that behavior had at some point been needed.

Edit: Fixed a misspelling of GIT_PYTHON_REFRESH, which I had written in one place as GIT_REFRESH.

@EliahKagan
Copy link
Contributor Author

EliahKagan commented Jan 26, 2024

Since this is configurable, maybe that means the message level should be adjusted depending on the GitPython configuration.

Can you elaborate on this?

(I've posted this as its own comment since it seems like the issue may be especially important, and I don't want you to feel pressured to respond to my various thoughts above, or to do so at the same time, if this is the priority.)

@Byron
Copy link
Member

Byron commented Jan 26, 2024

Those who use python -m git would still see these warnings by default, which is what I would hope for.

Do you just mean importing and using the git module? git is a package (it's a directory) and it does not contain __main__.py, so running it as a script, as python -m does, gives an error that it can't be directly executed.

I ask because one nit I plan to fix at the same time as some of this other logging-related stuff is to make logging.getLogger calls consistent across modules in GitPython where inconsistency appears unintentional. Currently, most pass their fully qualified module name by writing __name__, but a few write it out, e.g., git/objects/commit.py contains logging.getLogger("git.objects.commit") instead of logging.getLogger(__name__). These work the same, with __name__ being preferred, so long as the modules are imported rather than being run as scripts. This is not directly affected by whether the top-level git module is runnable as a script, but if any modules inside the git/ directory are intended to be runnable as scripts, then I should be aware of that while making that change to use __name__.

I used this example as I know this imports git (GitPython) and runs its initialization, which should include checking for a git binary, but maybe I am mistaken. It was my hope that doing so would print a message, like it might have done before.

  • GitPython currently logs some messages of severity ERROR. Even a user or application developer takes the approach of configuring logging but setting the logging level (i.e., minimum severity) higher for GitPython loggers, there would still be no easy way to suppress all messages except the one from Git.refresh. If the message from Git.refresh were changed to CRITICAL then that would provide an easy way to see only it, since GitPython does not currently use that level for any messages. But I think that may not be justified, especially considering that the git module may end up being imported by an application or other library that rarely uses its functionality or that uses it in a fairly unimportant way.

If anything in the git module is usable without a known to be working Git executable, then leaving the severity with ERROR might be OK. If meaningful operation isn't required, it could probably even be CRITICAL. Maybe this can be going along with the severity configuration that already is supported, by adding another level to make that configurable.

  • I think it is fine to add new values, including synonyms of existing values, as well as to change the wording of the message, including how it describes the options. But I think all those values should remain supported. It seems to me that the way they are presented effectively documents them and it would be a breaking change to stop recognizing them. I also think it is valuable to be able to set environment variables in a way that causes any version of GitPython (excluding very old versions) to be configured in the same way. Because values like warn should continue to be recognized, this means that users may be surprised if the level is higher. I'm not sure how much of a problem that really is.

I agree that this configuration should keep working, and maybe I am wrong to assume that this can integrate nicely with logging severities.

  • Should the level of the message itself be configurable with an additional, newly recognized GIT_PYTHON_* environment variable? This could mitigate some of the above issues somewhat. My feeling is that the benefit would not be worth the added complexity, and that discoverability would be low unless the message mentions that environment variable too, which it shouldn't since then the message would be overcomplicated and confusing. But I am not sure.

I see, each message comes with a severity, and we would be talking about increasing the severity of this message in particular, which is hard-coded. Indeed, I wouldn't want this to get anymore complicated and don't think users should get one more way to configure this.

Can you elaborate on this?

I hope this is explained above, and also think that once the messaging of git.refresh and friends is using logging after #1813 is merged, it will be alright (or can be made to be alright).

@EliahKagan
Copy link
Contributor Author

I used this example as I know this imports git (GitPython) and runs its initialization, which should include checking for a git binary, but maybe I am mistaken. It was my hope that doing so would print a message, like it might have done before.

You are correct and I was simply mistaken here. My mind was already so much on the issue of what modules if any in GitPython might be runnable as a script (relevant to 987dbf4), I didn't think to check my belief that the check for __main__.py precedes running the top-level module code of __init__.py. That belief of mine turns out to be false (and also didn't really make sense, because almost everything in Python is dynamic, and if __init__.py didn't run first then it wouldn't be able to patch in __main__ or customize the import system before __main__ is searched, in the rare case that one would want to do this). Running python -m git is indeed an effective way to exercise the first-refresh behavior of GitPython; it and python -c 'import git' are equally good. Sorry about the confusion!

I'll try to reply to the rest soon, but I wanted to get this out since it addresses an inaccuracy in what I had said above.

@EliahKagan
Copy link
Contributor Author

If anything in the git module is usable without a known to be working Git executable, then leaving the severity with ERROR might be OK. If meaningful operation isn't required, it could probably even be CRITICAL.

If no new values of GIT_PYTHON_REFRESH are added and no additional environment variable is consulted, then I think the question is whether the level should reflect the fact that GIT_PYTHON_REFRESH was set to not raise an exception and take that to mean the situation is less severe. The git.refresh function itself is usable when git is not found, after all.

I think on reflection that it's at least an ERROR because, unlike with a WARNING, GitPython may not be expected to continue working. I'm not sure about CRITICAL. But if GIT_PYTHON_REFRESH is to be understood as indicating only how the problem should be reported, rather than as indicating how severe the problem is, then CRITICAL may make sense.

Maybe this can be going along with the severity configuration that already is supported, by adding another level to make that configurable.

The reason I'm not sure how well they would work is that the current recognized options that can be set via GIT_PYTHON_REFRESH have both symbolic and numerical values, and logging levels have numerical as well as symbolic values, and the numerical values are different; they are small numbers for GIT_PYTHON_REFRESH currently (which should continue to be supported) and numbers like 10, 20, and so forth, for logging levels.

One way to get around that is to add another environment variable. But then we have another environment variable, have to decide what to do when it is unset as well as whether or not and how to report when its values makes no sense, and complexity for the user may be increased if this information is all presented in a message.

I see, each message comes with a severity, and we would be talking about increasing the severity of this message in particular, which is hard-coded.

Other than the above issue with how it would be specified, I don't think the level/severity needs to be hard-coded. The log method can be used with a dynamically determined level; I don't think there's anything inherently wrong with doing that.

@Byron
Copy link
Member

Byron commented Jan 28, 2024

Maybe you have an idea or preference on how this issue can be resolved by using a logger to report the matter in the most suitable way. I'd love to go with that :).

@EliahKagan
Copy link
Contributor Author

When working on #1813, I wrote dc62096, which I didn't include there because, as noted there, I think it should have tests, which should build on the test changes in #1812. Both #1812 and #1813 are merged (and they will be able to run on Cygwin, since #1814), so those tests can be written and proposed.

So I am thinking I'll go with an approach like in dc62096 initially, but with tests, and possibly with ERROR changed to CRITICAL depending on the outcome of a quick-ish search for how projects tend to decide between the two levels in ambiguous circumstances.

Then if a release ends up happening between that and future changes to this feature, future changes might be constrained by the need to avoid breakage or confusion. But even then, it could still probably be built on further.

Delays in implementing this have not been due to design questions here, but instead the CI problems discovered in #1813 and worked around in #1814. I may still not get to this immediately, mostly for reasons not related (not even indirectly) to GitPython, but in part because I agree with the idea that the cause and exact conditions of the problem worked around in #1814 should be further investigated.


With all that said, I do have another idea for how the logging level could be specified. I don't know if it's really a good idea, and it's not part of my initial plan as described above.

                   1, w, warn, warning, or log, before a colon, to log a message
                   |
GIT_PYTHON_REFRESH=1:40
                     |
                     logging level, which can be numeric or symbolic, after the colon

The : followed by a logging level would always be optional. It would be ignored unless the part before the : is 1, w, etc.

The case that could be confusing is:

                   we log
                   |
GIT_PYTHON_REFRESH=1:error
                     |
                     at level 40 (ERROR)
                   we raise an exception
                   |
GIT_PYTHON_REFRESH=error:1
                         |
                         ignored (but if we were logging, would be very-low user-defined level 1)

@Byron
Copy link
Member

Byron commented Jan 29, 2024

Thanks for sharing - I am already looking forward to what might cause these hangs deeply under the hood of C-Python apparently.


Event though I think this could work while being backward compatible, some UX issues notwithstanding, my question is if there could ever be user-side demand for this. Maybe the demand appears with awareness though, which can be provided of course as part of the printed messages. In any case, I am open to that for when the time comes.

EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Feb 2, 2024
This adds tests of the initial refresh that is attempted
automatically on import. All the refresh tests prior to this point
test subsequent refreshes. Those tests are kept, and new ones are
added that simulate the condition of not having yet done the
initial refresh by setting Git.GIT_PYTHON_GIT_EXECUTABLE to None.

Some current behavior these tests assert may change for gitpython-developers#1808.
EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Feb 2, 2024
This is instead of the current behavior writing the message to
stdout.

This commit does not change the behavior of the code under test,
but it changes tests to assert the following:

- "Bad git executable" messages are logged, at level CRITICAL.
- "log" (and "l") is recognized as another synonym of "warn".
- "silent" is recognized as a synonym of "quiet" (as "silence" is).

Although it is ambiguous whether this should be logged at level
ERROR or CRITICAL, because git.refresh is still expected to be
usable and can be called manually, not having a working git is a
condition in which GitPython, and any program that really relies on
its functionality, should be expected not work. That is the general
rationale for using CRIICAL here. There are also two specific
reasons:

- Existing messages GitPython logs as ERROR typically represent
  errors in individual operations on repositories, which could fail
  without indicating that GitPython's overall functionality is in
  an unusable state. Using the higher CRITICAL level for this
  situation makes sense for contrast.

- Prior to gitpython-developers#1813, logging messsges emitted from GitPython modules,
  no matter the level, were suppressed when logging was not
  configured, but because this message was printed instead of
  being logged, it was still shown. Now that it is to be logged,
  there may be a benefit to have an easy way for someone to bring
  back a close approximation of the old behavior. Having this
  message be at a higher logging level makes that easier to do.
  (This is a less important reason, as that should rarely be done.)

test_initial_refresh_from_bad_git_path_env_warn is the main changed
test. All tests should pass again once code is changed for gitpython-developers#1808.
EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Feb 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

2 participants