Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove hardcoded logging config for pdgstaging #47

Open
3 tasks
mbjones opened this issue Mar 26, 2024 · 5 comments
Open
3 tasks

remove hardcoded logging config for pdgstaging #47

mbjones opened this issue Mar 26, 2024 · 5 comments
Labels
bug Something isn't working pdg

Comments

@mbjones
Copy link
Member

mbjones commented Mar 26, 2024

The current logging configuration in logging_config.py hardcodes some logging configuration, which causes difficult errors in some deployment scenarios. For example, because a logging FileHandler is hardcoded to use the file /tmp/log.log, if multiple users try to use the library on the same machine, they get permissions errors because they can't create the needed logging file on import.

In general, packages and library modules should not configure logging, rather deferring to the main module to configure logging in a way that works for its deployment scenario. This is described in the logging tutorial here: https://docs.python.org/2/howto/logging.html#logging-from-multiple-modules

The only thing that a typical package or module should configure for logging is to set a name for its logger, which by convention is set to the package or module __name__, which allows the logs from each module to be distinguished and controlled independently. This is typically done for the package in the __init__.py file with a statement like:

logger = logging.getLogger(__name__)

Module level loggers should use the same code, but are typicially initialized at the top of the module. See the tutorial for details: https://docs.python.org/2/howto/logging.html#advanced-logging-tutorial

In our case, I think a simple logging scenario would be to:

  • delete pdgstaging/logging_config.py
  • add logger = logging.getLogger(__name__) to the package __init__.py
  • Leave the rest of the logging configuration that is currently in logging_config.py up to the calling application (e.g., viz-workflow)

I will file an identical issue for the viz-raster package, which has the same issues.

For an overview of one approach, see for example: https://stackoverflow.com/a/50751987

@mbjones mbjones added bug Something isn't working pdg labels Mar 26, 2024
@mbjones mbjones moved this to Backlog in Visualization Workflow Mar 26, 2024
@julietcohen
Copy link
Collaborator

julietcohen commented May 21, 2024

@mbjones Your third suggested step is "Leave the rest of the logging configuration that is currently in logging_config.py up to the calling application (e.g., viz-workflow)".

The way the visualization workflow currently executes (regardless of whether you are using parsl or ray) is by running a workflow script with a command like python parsl_workflow.py. That workflow script pulls in a viz config and a parsl config if you're using parsl. For example, see the parsl workflow script within the kubernetes & parsl branch of viz-workflow here.

Where do you suggest that the logging configuration that is currently in logging_config.py be specified within viz-workflow so that the logging configuration is passed onto the workflow script? In the __init__.py? Then in the parsl or ray workflow script, pdg_workflow would need to be imported as well?

@mbjones
Copy link
Member Author

mbjones commented May 22, 2024

To the extent that viz-workflow is a library package, it should configure logging the same way as I described for viz-staging and viz-raster. That said, viz-workflow also contains a python script that runs the __main__ code block for the workflow. It is in __main__ that logging should be configured. I think the best way to do that is to externalize the logging configuration to a config file that gets passed in via a commandline parameter to the __main__ method (typically with some overridable defaults). The logging documentation shows how to use fileConfig() to load a logging configuration file as part of your main() startup method. Overview of the config file is here: https://docs.python.org/3/library/logging.config.html#configuration-file-format

One of the issues we should tackle is separating out the viz-workflowlibrary package from the specific invocation script you are using to launch a specific dataset run (unless the script can be generic enough to support all such runs). We'll really need this separation of concerns when we build out a service backed by the package.

@julietcohen
Copy link
Collaborator

The modules in pdgstaging and pdgraster still need to define logger in order to execute all the logger.info() lines throughout the code. At the top of the modules, I had included the following code for the established logging approach:

import logging
from . import logging_config
logger = logging_config.logger

So I will have to define the logger differently since logging_config.py is to be removed from pdgstaging and pdgraster

@julietcohen
Copy link
Collaborator

Branches for these changes, named for the issue number in each repo:
viz-workflow: bug-38-log
viz-staging: bug-47-log
viz-raster: bug-27-log

@mbjones
Copy link
Member Author

mbjones commented May 22, 2024

At the top of the modules, I had included the following code for the established logging approach:

I was suggesting that a common approach to this is to move this log initialization to __init__.py, so that is run during package and module load and doesn't have to be run in each file. Here's what I think probably goes there in __init__.py:

import logging
logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())

The only downside to that is that all of the logs are addressed by the package name, so if you want to control logs at the module level, then you would want to initialize the logger (in the same way as described) at the top of each module (still using __name__), which will differ for the module compared to the package. That would allow module-specific log configuration, which can be useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working pdg
Projects
Status: Backlog
Development

No branches or pull requests

2 participants