- 1. Preamble
- 2. Overview
- 3. Power Measurement BKM
- 4. Power Measurement Flow
- 5. Rules
- 5.1. System should be consistent with platform descriptions
- 5.2. Power Delivery for Measurements
- 5.3. Platform Description for Power submission
- 5.4. SUT and Power Analyzer to remain unchanged from submission to results
- 5.5. Changes to Loadgen
- 5.6. System and framework must be available
- 5.7. Power is measured in performance phase of the Benchmark/Scenario
- 5.8. Audit process and Benchmark Logging
- 5.9. Replicability is mandatory
- 5.10. Power and Performance Reporting and Measurement
- 6. Power Analyzer Support
- 7. Power Analyzer settings
- 8. Power Management
- 9. Power Logs
- 10. Loadgen
- 11. FAQs
Version 1.1, updated Aug 06, 2021.
Points of contact: Sachin Idgunji ([email protected]), Arun Tejusve Raghunath Rajan ([email protected]).
The MLPerf Power rules are an extension of the MLPerf Inference rules. This document outlines only the deltas: the additional steps required for a power submission on top of the original performance submission.
This document describes how to measure power for the benchmarks in the MLPerf Inference Suite.
There are separate rules for the submission, review, and publication process for all MLPerf benchmarks in the MLPerf Inference rules referenced above.
The MLPerf name and logo are trademarks. In order to refer to a result using the MLPerf name, the result must conform to the letter and spirit of the rules specified in this document. The MLPerf organization reserves the right to solely determine if a use of its name or logo is acceptable.
A system under test (SUT) consists of a defined set of hardware and software resources that will be measured for performance. The hardware resources may include processors, accelerators, memories, disks, and interconnect. The software resources may include an operating system, compilers, libraries, and drivers that significantly influence the running time of a benchmark.
A Director/Server system is any system that can act as the host for the power measurements, e.g., a standard PC with an interface that connects to the power analyzer.
A power analyzer (aka power meter) is a device used for measuring the wall power of a system. The SUT is powered through the power analyzer, and the power, voltage, and current drawn by the SUT are measured by the analyzer. The other end of the power analyzer is connected to the Director system, which runs software that logs the collected data.
NTP stands for Network Time Protocol; it ensures time synchronization between the different systems used for power collection. The SUT and the Director are required to be synchronized via NTP.
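As a quick sanity check, the clock offset of each machine can be compared against an NTP server before a run. The following is a minimal sketch, assuming the third-party Python package ntplib; the server name and tolerance are illustrative and not part of the MLPerf tooling.

```python
# Minimal NTP sanity check: compare this host's clock against an NTP server.
# Assumes the third-party "ntplib" package; run on both the SUT and the
# Director. The server and tolerance below are illustrative choices.
import ntplib

MAX_OFFSET_S = 0.5  # illustrative tolerance

response = ntplib.NTPClient().request("pool.ntp.org", version=3)
print(f"clock offset vs NTP server: {response.offset:.3f} s")
if abs(response.offset) > MAX_OFFSET_S:
    raise SystemExit("clock out of sync; fix NTP before measuring power")
```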
PTD (SPEC PTDaemon, the Power/Temperature Daemon) is a utility developed by SPEC® for the purposes of power measurement and logging. MLCommons has licensed the use of this utility from SPEC. Please adhere to all terms of the MLCommons and SPEC NDA; PTD by itself cannot be redistributed.
The power measurement BKM (Best Known Method) is documented here. That document also contains the block diagram for the power measurement process. This setup must be followed for all power submissions.
The power measurement flow is detailed here. It is part of the power automation tool found here, and only these tools may be used when submitting for power; any other means of power measurement will not be accepted. This is because the power measurement flow incorporates a "Dynamic Ranging" method that ensures the power measurements are made with high precision and accuracy. The flow also ensures that only the performance section of the benchmark is measured. In this version of the submission, this flow does not apply to disaggregated systems.
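The rules do not prescribe the internals of Dynamic Ranging beyond the description above; the sketch below only illustrates the two-pass idea under stated assumptions: a ranging pass observes the peak current in auto-range mode, then the measured pass runs with a fixed range chosen just above that peak. The range table, sample data, and helper function are hypothetical, not the real analyzer API.

```python
# Hypothetical two-pass "dynamic ranging" sketch (not the real tooling):
# pass 1 observes the peak current in auto-range mode, pass 2 fixes the
# analyzer's range just above that peak so the measured run uses the most
# precise range that will not clip.
import random

CURRENT_RANGES_A = [0.5, 1, 2, 5, 10, 20]  # illustrative range steps


def pick_range(peak: float, ranges: list) -> float:
    """Smallest configured range that still covers the observed peak."""
    return next(r for r in ranges if r >= peak)


# Pass 1: ranging run (stand-in samples instead of a real benchmark run).
ranging_samples = [random.uniform(2.0, 4.5) for _ in range(1000)]
peak_a = max(ranging_samples)

# Pass 2: the measured run would now execute with this fixed range.
fixed_range = pick_range(peak_a, CURRENT_RANGES_A)
print(f"peak current {peak_a:.2f} A -> fixed current range {fixed_range} A")
```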
All rules for performance are applicable for power. In addition:
The system used for power measurements must match the one described in the platform description that is part of the submission. Power must be measured for the Inference submission system(s), including all components exercised by loadgen while it executes, as well as the processor on which loadgen runs.
MLPerf Power for Inference measures wall AC power consumed. The power delivery to the unit of computation (node) must not include any battery or alternate power storage in conjunction with the AC power before the power is provided to the PSUs of the system.
For a valid submission, the submitter must fill in all fields of the platform description that are deemed mandatory. Power is measured at the system level, so a detailed description of the system/platform must be provided, as agreed.
For each submission, the power management settings used by the system to achieve the measured performance and power must be documented. This is needed for reproducibility and verifiability.
An example file is here: https://docs.google.com/document/d/1-j7TWCFBlTGi_EIpS2mQpgQLobtV0ITn2BJCakOgAZ4
The SUT and the power analyzer must remain unchanged from the time of submission until the results are published. This ensures that, in case of an audit, the results can be verified and reproduced from the existing setup.
A system timestamp indicator was added at the start of the performance phase and another where the performance phase ends. This was done in the v0.5 Inference submissions and remains a valid change for the current submission.
If you are measuring the performance of a publicly available and widely-used system or framework, you must use publicly available and widely-used versions of the system or framework.
If you are measuring the performance of an experimental framework or system, you must make the system and framework you use available upon demand for replication.
A benchmark has multiple phases, as listed in the MLPerf Inference Rules document. Power is evaluated only during the performance phase of the benchmark and not in any other phase. To delimit this exact section, Loadgen has been instrumented to indicate the start and stop of the performance phase, and all power measurements are evaluated within this window using the power logs produced as part of the benchmark run.
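To make the windowing concrete, here is an illustrative sketch that averages only those power samples falling between the loadgen start and stop markers. The timestamps and sample format are made up; real submissions use the logs produced by the MLPerf Power tooling.

```python
# Illustrative sketch: evaluate power only inside the loadgen-marked
# performance window. Timestamps and sample layout are invented here;
# real runs use the MLPerf Power logs.
from datetime import datetime, timedelta

t_begin = datetime(2021, 8, 6, 12, 0, 0)    # loadgen "performance begin"
t_end = t_begin + timedelta(seconds=60)     # loadgen "performance end"

# (timestamp, watts) samples as a power analyzer might log them,
# deliberately extending beyond the window on both sides.
samples = [(t_begin + timedelta(seconds=s), 250.0 + (s % 5))
           for s in range(-10, 75)]

in_window = [w for ts, w in samples if t_begin <= ts <= t_end]
avg_power_w = sum(in_window) / len(in_window)
print(f"{len(in_window)} samples in window, average power {avg_power_w:.1f} W")
```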
Submissions must use the software flow and scripts developed for MLPerf benchmark power measurement. This infrastructure has been developed by the MLPerf Power working group.
All logs generated by the MLPerf Power software infrastructure must be included in the submission. These include the power meter ranging logs and the power measurement logs generated during the performance runs.
Power and performance measurements must come from the same run for a given benchmark and scenario. The current script enforces this by default, and this behavior cannot and must not be changed. Example: we cannot run the same benchmark and scenario 3 times and report the highest performance and the lowest power among the 3 runs.
For versions 1.0 and 1.1, only Yokogawa power analyzers (aka meters) are supported.
The power analyzer settings must not be set manually; they are set by the software that is part of the MLPerf Power measurement infrastructure.
For v1.0 and v1.1, the software supports a single meter connected to a node through one or more channels, or configured in 3-phase mode. Connecting multiple meters to a single node (SUT) is not supported in this version.
The goal of the testing is to mimic real-world usage scenarios as much as possible and to show the benefits of realistic power management. We therefore require:
- Any power management system must be qualified for use appropriate for the submission type (e.g., a generally available system must use software/firmware qualified for general availability and shipping with the platform).
- No benchmark- or benchmarking-specific hacks.
- Any changes in power management behavior must not involve manual intervention or awareness of the benchmark.
Power logs must be submitted. All logs created as part of the power measurement must be included, covering both the power analyzer ranging run and the performance measurement run.
Power Logs are generated by the software running on the Director.
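A submitter might sanity-check that the expected logs are present before packaging a submission. The sketch below assumes hypothetical file names; the actual names and directory layout are those produced by the MLPerf Power automation tools.

```python
# Hypothetical pre-submission check that the power logs exist. The file
# names and layout here are illustrative placeholders, not the actual
# output of the MLPerf Power automation tools.
from pathlib import Path

REQUIRED_LOGS = ["ranging/spl.txt", "testing/spl.txt", "ptd_logs.txt"]

run_dir = Path("power_run")  # illustrative output directory
missing = [f for f in REQUIRED_LOGS if not (run_dir / f).exists()]
if missing:
    raise SystemExit(f"missing power logs: {missing}")
print("all required power logs present")
```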
The power flow uses the same Loadgen as the performance runs; no additions are made. It uses the start and stop timestamps emitted by loadgen to synchronize with the performance section of the benchmark, and these markers anchor the window in which power is measured.
Q: Is MLPerf Power measurement accessible to anyone, or is it for member organizations only?
A: The MLPerf Power measurement tools include some proprietary software that is only available to members. Therefore, your organization must be a member of MLCommons, and additionally your organization must sign a EULA.
Q: Am I required to use the MLPerf Power automation tools?
A: Yes, you must use the automation tools for any results submitted to MLPerf. The MLPerf Power automation flow incorporates a number of checks and balances that ensure the highest quality power measurements possible.
Q: How can I obtain the MLPerf Power automation tools?
A: To access the MLPerf Power automation tools, your company’s representative must sign the MLPerf Power EULA (https://drive.google.com/file/d/1u9MdO4v5-uvbaJoElQoAwGb5_suMTZyH/view) and send it to [email protected]. The MLCommons staff will give you access to a GitHub repo containing the automation tools.