MLPerf common can be installed with pip by adding the following line to your requirements.txt file:
git+https://github.com/NVIDIA/mlperf-common.git
For PyTorch, define the following in the mlperf_logger.py module:
from mlperf_common.logging import MLLoggerWrapper
from mlperf_common.frameworks.pyt import PyTCommunicationHandler
mllogger = MLLoggerWrapper(PyTCommunicationHandler(), value=None)
Then use mllogger in other modules by importing it:
from mlperf_logger import mllogger
For Horovod/TF/MXNet, define the following in the global mlperf_logger.py module:
from mlperf_common.logging import MLLoggerWrapper
from mlperf_common.frameworks.mxnet import MPICommunicationHandler
mllogger = MLLoggerWrapper(MPICommunicationHandler(), value=None)
Then use mllogger in other modules by importing it:
from mlperf_logger import mllogger
Optionally, you can pass an MPI communicator when initializing MPICommunicationHandler():
from mpi4py import MPI
comm = MPI.COMM_WORLD
mllogger = MLLoggerWrapper(MPICommunicationHandler(comm), value=None)
By default, MPICommunicationHandler() creates a global communicator.
The MLPerf logger can also be used to track additional, non-required metrics, for example throughput. The recommended way is to add a line such as:
mllogger.event(key='tracked_stats', metadata={'step': epoch}, value={'throughput': throughput, 'metric_a': metric_a, 'metric_b': metric_b})
where throughput is recommended to be in samples per second, logged every epoch or as often as is reasonable for a given benchmark. The additional metrics, metric_a and metric_b, can represent any numerical value that requires logging. The key tracked_stats and an increasing value for step are required.
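As an illustration of the tracked_stats convention, here is a minimal sketch of computing throughput in samples per second and emitting one event per epoch. RecordingLogger is a hypothetical stand-in that only mimics the event(key=..., metadata=..., value=...) call shown above, so the example runs without mlperf-common installed; samples_per_second and log_epoch_stats are illustrative helper names, not part of the library.

```python
import time


class RecordingLogger:
    """Hypothetical stand-in for mllogger; it only records event() calls."""

    def __init__(self):
        self.events = []

    def event(self, key, metadata=None, value=None):
        self.events.append({"key": key, "metadata": metadata, "value": value})


def samples_per_second(num_samples, elapsed_seconds):
    """Return throughput in samples per second (0.0 for a non-positive duration)."""
    if elapsed_seconds <= 0:
        return 0.0
    return num_samples / elapsed_seconds


def log_epoch_stats(logger, epoch, num_samples, elapsed_seconds, **extra_metrics):
    """Emit one tracked_stats event, using the epoch as the increasing step."""
    stats = {"throughput": samples_per_second(num_samples, elapsed_seconds)}
    stats.update(extra_metrics)  # e.g. metric_a=..., metric_b=...
    logger.event(key="tracked_stats", metadata={"step": epoch}, value=stats)


if __name__ == "__main__":
    # In a real benchmark, use: from mlperf_logger import mllogger
    mllogger = RecordingLogger()
    for epoch in range(3):
        start = time.perf_counter()
        time.sleep(0.01)  # placeholder for the real training epoch
        elapsed = time.perf_counter() - start
        log_epoch_stats(mllogger, epoch, num_samples=1024,
                        elapsed_seconds=elapsed, metric_a=0.5)
    for logged in mllogger.events:
        print(logged)
```

Passing the epoch index as step keeps the value increasing across events, which is the one ordering requirement stated above.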
Instead of the previous sbridge = init_bridge(rank), initialize sbridge as follows:
from mlperf_common.frameworks.pyt import PyTNVTXHandler, PyTCommunicationHandler
sbridge = init_bridge(PyTNVTXHandler(), PyTCommunicationHandler(), mllogger)
or, for Horovod/TF/MXNet:
from mlperf_common.frameworks.mxnet import MXNetNVTXHandler, MPICommunicationHandler
sbridge = init_bridge(MXNetNVTXHandler(), MPICommunicationHandler(), mllogger)
Then start and stop profiling as usual:
sbridge.start_prof()
sbridge.stop_prof()
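If the profiled region can raise, it may be convenient to pair the two calls in a context manager so that stop_prof() always runs. The sketch below is one possible pattern, not something mlperf-common provides: profiled is a hypothetical helper, and FakeBridge is a stand-in that exposes only the start_prof() and stop_prof() methods used above.

```python
from contextlib import contextmanager


class FakeBridge:
    """Hypothetical stand-in with the same start_prof()/stop_prof() methods."""

    def __init__(self):
        self.calls = []

    def start_prof(self):
        self.calls.append("start_prof")

    def stop_prof(self):
        self.calls.append("stop_prof")


@contextmanager
def profiled(bridge):
    """Start profiling on entry and stop it on exit, even if the body raises."""
    bridge.start_prof()
    try:
        yield bridge
    finally:
        bridge.stop_prof()


if __name__ == "__main__":
    sbridge = FakeBridge()  # in real code, the object returned by init_bridge(...)
    with profiled(sbridge):
        sbridge.calls.append("training step")  # placeholder for real work
    print(sbridge.calls)
```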
The current ScaleoutBridgeBase class replaces the previous EmptyObject class, so just replace EmptyObject() with ScaleoutBridgeBase().