Merge pull request bytedance#57 from bytedance/HanTengfei99-patch-1
(docs) refine micro perf readme
YJessicaGao authored Mar 26, 2024
2 parents 66bf37e + 0517056 commit 6adaf10
Showing 2 changed files with 13 additions and 23 deletions.
32 changes: 11 additions & 21 deletions byte_micro_perf/README.md
@@ -22,18 +22,25 @@ Please follow the given style at `ByteMLPerf/vendor_zoo` directory to create a n
### An example

```
-python3 launch.py --task softmax --hardware_type GPU
+python3 launch.py --task exp --hardware_type GPU
```
#### Usage
```
--task: operator name; to add a new operator, create a workload file following the existing style in byte_micro_perf/workloads.
--hardware_type: hardware category name; derive a Backend class for your heterogeneous hardware in byte_micro_perf/backends.
--vendor_path: hardware config path (optional); if provided, it corresponds to a hardware configuration file in ByteMLPerf/vendor_zoo.
```
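
The three flags above could be declared roughly as follows. This is a minimal sketch using Python's standard `argparse`, not the actual parser in `launch.py`, which this diff does not show; the function name `build_parser` and the choice to make `--task` and `--hardware_type` required are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the three documented flags."""
    parser = argparse.ArgumentParser(
        description="Illustrative sketch of the byte_micro_perf command line"
    )
    # Assumption: the real launcher may supply defaults instead of requiring these.
    parser.add_argument(
        "--task", required=True,
        help="operator name; expects a workload file in byte_micro_perf/workloads",
    )
    parser.add_argument(
        "--hardware_type", required=True,
        help="hardware category name; expects a Backend class in byte_micro_perf/backends",
    )
    # Optional: left as None when no vendor configuration is given.
    parser.add_argument(
        "--vendor_path", default=None,
        help="optional path to a hardware configuration file in ByteMLPerf/vendor_zoo",
    )
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args(["--task", "exp", "--hardware_type", "GPU"])
print(args.task, args.hardware_type, args.vendor_path)
```

Passing the argument list explicitly keeps the sketch runnable without touching `sys.argv`.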

### Expected Output
Different types of operators (compute-bound or memory-bound) are evaluated with different metrics. The metrics are explained below:
| Metric | Description |
| -------- | ------- |
| Memory Size(MB) | the approximate total of bytes read and written |
| Kernel bandwidth(GB/s) | the bandwidth this kernel achieves for the given input size |
| Bandwidth Utilization(%) | the ratio of achieved bandwidth to theoretical bandwidth |
| Avg latency(us) | the average of kernel latencies |

Example:
```
{
"Operator": "EXP",
@@ -44,30 +51,13 @@ python3 launch.py --task softmax --hardware_type GPU
{
"Dtype": "float32",
"Memory Size(MB)": 4.0,
-"Algo bandwidth(GB/s)": 271.83,
+"Kernel bandwidth(GB/s)": 271.83,
"Bandwidth Utilization(%)": 0.17,
"Avg latency(us)": 15.43
}
]
}
-{
-"Operator": "ALLTOALL",
-"Backend": "GPU",
-"Host Info": "Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz",
-"Device Info": "A100-PCIE-40GB",
-"Performance": [
-{
-"Dtype": "float32",
-"Memory Size(MB)": 0.06,
-"Group": 4,
-"Algo bandwidth(GB/s)": 1.54,
-"Bus bandwidth(GB/s)": 1.15,
-"Bandwidth Utilization(%)": 0.0,
-"Avg latency(us)": 42.58
-}
-]
-}
```
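
The metrics in the EXP entry above are related by simple arithmetic, sketched below. The unit conventions (1 MB = 1024² bytes, 1 GB = 10⁹ bytes) are inferred from the example numbers rather than taken from the benchmark source, and the helper names and the 1555 GB/s theoretical peak are assumptions chosen for illustration.

```python
MIB = 1024 ** 2  # assumption: "MB" in the report means 1024 * 1024 bytes


def kernel_bandwidth_gbps(memory_size_mb: float, avg_latency_us: float) -> float:
    """Achieved bandwidth in GB/s (1 GB = 1e9 bytes, an inferred convention)."""
    bytes_moved = memory_size_mb * MIB
    seconds = avg_latency_us * 1e-6
    return bytes_moved / seconds / 1e9


def bandwidth_utilization(achieved_gbps: float, theoretical_gbps: float) -> float:
    """Ratio of achieved bandwidth to theoretical bandwidth."""
    return achieved_gbps / theoretical_gbps


# Values taken from the EXP example: 4.0 MB moved in 15.43 us on average.
bw = kernel_bandwidth_gbps(4.0, 15.43)
print(round(bw, 2))  # 271.83

# Assumption: a nominal peak of 1555 GB/s for the device in the example.
util = bandwidth_utilization(bw, 1555.0)
print(round(util, 2))  # 0.17
```

Under these assumptions the sketch reproduces both the 271.83 GB/s and the 0.17 in the example, which would suggest the reported utilization is a fraction rather than a percentage, though the diff does not confirm this.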

## Trouble Shooting
4 changes: 2 additions & 2 deletions byte_micro_perf/backends/utils.py
@@ -47,7 +47,7 @@ def dump_communication_ops_report(
"Dtype": dtype,
"Memory Size(MB)": round(mb, 2),
"Group": group_size,
-"Algo bandwidth(GB/s)": round(algo_bw, 2),
+"Kernel bandwidth(GB/s)": round(algo_bw, 2),
"Bus bandwidth(GB/s)": round(bus_bw, 2),
"Bandwidth Utilization(%)": bandwidth_utils,
"Avg latency(us)": round(latency, 2),
@@ -97,7 +97,7 @@ def dump_computation_ops_report(
report = {
"Dtype": dtype,
"Memory Size(MB)": round(mb, 2),
-"Algo bandwidth(GB/s)": round(algo_bw, 2),
+"Kernel bandwidth(GB/s)": round(algo_bw, 2),
"Bandwidth Utilization(%)": bandwidth_utils,
"Avg latency(us)": round(latency, 2),
}
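
Because this commit renames the report key from `Algo bandwidth(GB/s)` to `Kernel bandwidth(GB/s)`, scripts that read reports produced before and after the change may need to accept both. The following is a minimal sketch of such a reader; `read_bandwidth` and `BANDWIDTH_KEYS` are hypothetical names and are not part of `byte_micro_perf`.

```python
# Newer key first, so reports written after the rename take precedence.
BANDWIDTH_KEYS = ("Kernel bandwidth(GB/s)", "Algo bandwidth(GB/s)")


def read_bandwidth(entry: dict) -> float:
    """Return the bandwidth from one "Performance" entry, whichever key it uses."""
    for key in BANDWIDTH_KEYS:
        if key in entry:
            return float(entry[key])
    raise KeyError(f"none of {BANDWIDTH_KEYS} found in report entry")


# Entries shaped like the README example, before and after the rename.
old_entry = {"Dtype": "float32", "Algo bandwidth(GB/s)": 271.83}
new_entry = {"Dtype": "float32", "Kernel bandwidth(GB/s)": 271.83}

print(read_bandwidth(old_entry), read_bandwidth(new_entry))
```

Note that the variable feeding the value is still named `algo_bw` in the diff; only the key written to the report changes.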