[SPARK-50427][CONNECT][PYTHON] Expose configure_logging as a public API
### What changes were proposed in this pull request?

Expose `configure_logging` as a public API that can be used
to configure the log level for the PySpark Connect component.

### Why are the changes needed?

We currently offer a mechanism to configure the Connect-specific logger
based on the environment variable `SPARK_CONNECT_LOG_LEVEL`.

The logger is configured once, at "module load" time. In some cases,
Python frameworks (e.g. IPythonKernel) can modify the Python log level after the
fact, leading to unintended log output.

There is currently no good way to restore the logger's previous behavior
so that it honors the configured environment variable again.
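
To illustrate the problem, here is a minimal sketch using only the standard `logging` module. The logger name `demo.connect` and the helper `configure` are hypothetical stand-ins, not PySpark code, and the "framework" step is simulated by a direct `setLevel` call; how IPythonKernel actually changes the level may differ.

```python
import logging

# Hypothetical logger name, standing in for the Spark Connect logger.
log = logging.getLogger("demo.connect")


def configure(level: str) -> None:
    """Apply the intended level (stand-in for the module-load configuration)."""
    log.setLevel(level.upper())


# Step 1: the level is applied once, at "module load".
configure("debug")
level_at_load = log.getEffectiveLevel()

# Step 2: simulated framework interference changes the level after the fact.
log.setLevel(logging.CRITICAL)
level_after_framework = log.getEffectiveLevel()

# Step 3: with a callable configuration entry point, the application
# can simply re-apply the intended level during startup.
configure("debug")
level_restored = log.getEffectiveLevel()
```

Without a public entry point, step 3 has nothing to call, which is the gap this change is meant to close.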

### Does this PR introduce _any_ user-facing change?

Yes.

Provide a new API `configure_logging` in the module
`pyspark.sql.connect.logging`.

### How was this patch tested?

Local testing by calling `configure_logging` with different log levels.

Further tested with an IPythonKernel instance, which changes the log level,
and confirmed that calling this API during app startup restores the
correct log level.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48802 from nija-at/expose-log-method.

Authored-by: Niranjan Jayakar <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
nija-at authored and HyukjinKwon committed Nov 26, 2024
1 parent 02bfce6 commit 4ec9ebf
Showing 1 changed file with 14 additions and 8 deletions.
22 changes: 14 additions & 8 deletions python/pyspark/sql/connect/logging.py
@@ -21,30 +21,36 @@
 import os
 from typing import Optional

-__all__ = [
-    "getLogLevel",
-]
+__all__ = ["configureLogging", "getLogLevel"]


-def _configure_logging() -> logging.Logger:
-    """Configure logging for the Spark Connect clients."""
+def configureLogging(level: Optional[str] = None) -> logging.Logger:
+    """
+    Configure log level for Spark Connect components.
+    When not specified as a parameter, log level will be configured based on
+    the SPARK_CONNECT_LOG_LEVEL environment variable.
+    When both are absent, logging is disabled.
+    .. versionadded:: 4.0.0
+    """
     logger = PySparkLogger.getLogger(__name__)
     handler = logging.StreamHandler()
     handler.setFormatter(
         logging.Formatter(fmt="%(asctime)s %(process)d %(levelname)s %(funcName)s %(message)s")
     )
     logger.addHandler(handler)

     # Check the environment variables for log levels:
-    if "SPARK_CONNECT_LOG_LEVEL" in os.environ:
+    if level is not None:
+        logger.setLevel(level.upper())
+    elif "SPARK_CONNECT_LOG_LEVEL" in os.environ:
         logger.setLevel(os.environ["SPARK_CONNECT_LOG_LEVEL"].upper())
     else:
         logger.disabled = True
     return logger


 # Instantiate the logger based on the environment configuration.
-logger = _configure_logging()
+logger = configureLogging()


 def getLogLevel() -> Optional[int]:
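
The precedence implemented in the diff (explicit argument first, then `SPARK_CONNECT_LOG_LEVEL`, otherwise disabled) can be sketched in isolation with the standard library. The helpers `resolve_level` and `apply_level` and the logger name `demo.connect.precedence` are hypothetical names for illustration only; they are not part of PySpark, and the sketch uses a plain `logging.Logger` rather than `PySparkLogger`.

```python
import logging
import os
from typing import Optional

ENV_VAR = "SPARK_CONNECT_LOG_LEVEL"  # environment variable named in the diff


def resolve_level(level: Optional[str] = None) -> Optional[str]:
    """Return the level name to apply, or None when logging should be disabled."""
    if level is not None:
        # An explicit argument takes priority over the environment.
        return level.upper()
    if ENV_VAR in os.environ:
        return os.environ[ENV_VAR].upper()
    return None


def apply_level(logger: logging.Logger, level: Optional[str] = None) -> logging.Logger:
    """Mirror the branch structure shown in the diff on an arbitrary logger."""
    resolved = resolve_level(level)
    if resolved is not None:
        # Logger.setLevel accepts a level name; unknown names raise ValueError.
        logger.setLevel(resolved)
    else:
        logger.disabled = True
    return logger


# Usage: an explicit argument wins regardless of the environment variable.
demo_logger = apply_level(logging.getLogger("demo.connect.precedence"), "info")
```

Note that the function in the diff also attaches a new `StreamHandler` on every call; the sketch leaves handlers out to keep the focus on level resolution.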
