Merge pull request bytedance#57 from bytedance/HanTengfei99-patch-1

(docs) refine micro perf readme
hliuca · Mar 26, 2024 · 6adaf10
2 parents 66bf37e + 0517056
Showing 2 changed files with 13 additions and 23 deletions.
diff --git a/byte_micro_perf/README.md b/byte_micro_perf/README.md
@@ -22,18 +22,25 @@ Please follow the given style at `ByteMLPerf/vendor_zoo` directory to create a n
 ### An example
 
 ```
-python3 launch.py --task softmax --hardware_type GPU
+python3 launch.py --task exp --hardware_type GPU
 ```
 #### Usage
 ```
 --task: operator name                              please create a workload file for new operators by following the existing style in byte_micro_perf/workloads.
 
 --hardware_type: hardware category name            please derive a Backend class for your heterogeneous hardware in byte_micro_perf/backends.
-
---vendor_path: hardware config path(optional)      it conrresponding to hardware configuration file in ByteMLPerf/vendor_zoo if provided.
 ```
 
 ### Expected Output
+For different types of operators (compute-bound / memory-bound), we use different metrics to evaluate operator performance. The metrics are defined as follows:
+| Metric | Description |
+| -------- | ------- |
+| Memory Size(MB) | the approximate total of bytes read and written |
+| Kernel bandwidth(GB/s) | the bandwidth this kernel achieves for the given input size |
+| Bandwidth Utilization(%) | the ratio of achieved bandwidth to theoretical peak bandwidth |
+| Avg latency(us) | the average kernel latency |
+
+Example:
 ```
 {
     "Operator": "EXP",
@@ -44,30 +51,13 @@ python3 launch.py --task softmax --hardware_type GPU
         {
             "Dtype": "float32",
             "Memory Size(MB)": 4.0,
-            "Algo bandwidth(GB/s)": 271.83,
+            "Kernel bandwidth(GB/s)": 271.83,
             "Bandwidth Utilization(%)": 0.17,
             "Avg latency(us)": 15.43
         }
     ]
 }
 
-{
-    "Operator": "ALLTOALL",
-    "Backend": "GPU",
-    "Host Info": "Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz",
-    "Device Info": "A100-PCIE-40GB",
-    "Performance": [
-        {
-            "Dtype": "float32",
-            "Memory Size(MB)": 0.06,
-            "Group": 4,
-            "Algo bandwidth(GB/s)": 1.54,
-            "Bus bandwidth(GB/s)": 1.15,
-            "Bandwidth Utilization(%)": 0.0,
-            "Avg latency(us)": 42.58
-        }
-    ]
-}
 ```
 
 ## Trouble Shooting

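The metric definitions in the README table can be made concrete with a short calculation. The sketch below is hypothetical: `memory_bound_metrics` is not part of ByteMLPerf, and it assumes MB = 2^20 bytes and GB/s = 10^9 bytes per second. Under those assumptions, 4 MiB moved in 15.43 us reproduces the EXP example's 271.83 GB/s. The peak bandwidth used for utilization is a caller-supplied placeholder, not a real device figure.

```python
# Hypothetical helper, not part of ByteMLPerf: derives the README's
# memory-bound metrics from raw measurements.
# Assumed units: MB = 2**20 bytes, GB/s = 1e9 bytes per second.
from typing import Dict, Sequence


def memory_bound_metrics(read_bytes: int, write_bytes: int,
                         latencies_us: Sequence[float],
                         peak_bw_gbps: float) -> Dict[str, float]:
    """Build a report dict keyed like the README's "Performance" entries."""
    # "Memory Size": rough sum of bytes read and written by the kernel.
    total_bytes = read_bytes + write_bytes
    # "Avg latency": mean over the measured kernel runs, in microseconds.
    avg_latency_us = sum(latencies_us) / len(latencies_us)
    # "Kernel bandwidth": bytes moved per second, reported in GB/s (1e9 B/s).
    kernel_bw_gbps = total_bytes / (avg_latency_us * 1e-6) / 1e9
    # "Bandwidth Utilization": achieved bandwidth as a percentage of peak.
    utilization_pct = kernel_bw_gbps / peak_bw_gbps * 100
    return {
        "Memory Size(MB)": round(total_bytes / 2**20, 2),
        "Kernel bandwidth(GB/s)": round(kernel_bw_gbps, 2),
        "Bandwidth Utilization(%)": round(utilization_pct, 2),
        "Avg latency(us)": round(avg_latency_us, 2),
    }


# Example: float32 EXP over 2**19 elements reads 2 MiB and writes 2 MiB.
# The 1000 GB/s peak is an arbitrary placeholder, not a real device spec.
report = memory_bound_metrics(2**21, 2**21, [15.43], peak_bw_gbps=1000.0)
print(report)
```

With these inputs the sketch yields 4.0 MB and 271.83 GB/s, matching the README sample; the utilization value depends entirely on the peak bandwidth passed in.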
diff --git a/byte_micro_perf/backends/utils.py b/byte_micro_perf/backends/utils.py
@@ -47,7 +47,7 @@ def dump_communication_ops_report(
         "Dtype": dtype,
         "Memory Size(MB)": round(mb, 2),
         "Group": group_size,
-        "Algo bandwidth(GB/s)": round(algo_bw, 2),
+        "Kernel bandwidth(GB/s)": round(algo_bw, 2),
         "Bus bandwidth(GB/s)": round(bus_bw, 2),
         "Bandwidth Utilization(%)": bandwidth_utils,
         "Avg latency(us)": round(latency, 2),
@@ -97,7 +97,7 @@ def dump_computation_ops_report(
     report = {
         "Dtype": dtype,
         "Memory Size(MB)": round(mb, 2),
-        "Algo bandwidth(GB/s)": round(algo_bw, 2),
+        "Kernel bandwidth(GB/s)": round(algo_bw, 2),
         "Bandwidth Utilization(%)": bandwidth_utils,
         "Avg latency(us)": round(latency, 2),
     }
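The `utils.py` change renames only the algorithm-level figure; communication reports still carry a separate "Bus bandwidth(GB/s)". The sketch below assumes the convention used by NVIDIA's nccl-tests, where bus bandwidth is kernel (algorithm) bandwidth scaled by a per-collective factor; this diff does not show whether `dump_communication_ops_report` derives `bus_bw` that way. The removed ALLTOALL sample (Group 4, 1.54 GB/s kernel vs. 1.15 GB/s bus) is at least consistent with the (n-1)/n factor. `BUS_FACTORS` and `bus_bandwidth` are hypothetical names.

```python
# Sketch of the nccl-tests bus-bandwidth convention (an assumption here, not
# confirmed as ByteMLPerf's formula). Names below are hypothetical.
from typing import Callable, Dict

# Factor that scales algorithm ("kernel") bandwidth to bus bandwidth for a
# collective running on n ranks.
BUS_FACTORS: Dict[str, Callable[[int], float]] = {
    "allreduce": lambda n: 2 * (n - 1) / n,
    "allgather": lambda n: (n - 1) / n,
    "reducescatter": lambda n: (n - 1) / n,
    "alltoall": lambda n: (n - 1) / n,
    "broadcast": lambda n: 1.0,
}


def bus_bandwidth(op: str, kernel_bw_gbps: float, group_size: int) -> float:
    """Convert kernel bandwidth (GB/s) to bus bandwidth (GB/s)."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return kernel_bw_gbps * BUS_FACTORS[op](group_size)


# The removed ALLTOALL sample used Group 4 and 1.54 GB/s kernel bandwidth.
print(bus_bandwidth("alltoall", 1.54, 4))
```

Keeping both numbers lets a reader compare a collective against the link's peak rate regardless of how many ranks participate, which is why the bus figure survives the rename.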