Documentation enhancements and removing watermark
Signed-off-by: colramos-amd <[email protected]>
coleramos425 committed Mar 27, 2024
1 parent 8b26745 commit 42f5fa7
Showing 7 changed files with 165 additions and 179 deletions.
189 changes: 99 additions & 90 deletions src/docs-2.x/analysis.md
@@ -11,8 +11,11 @@ While analyzing with the CLI offers quick and straightforward access to Omniperf

See sections below for more information on each.

```{note}
Profiling results from the [aforementioned vcopy workload](https://rocm.github.io/omniperf/profiling.html#workload-compilation) will be used in the following sections to demonstrate the use of Omniperf in MI GPU performance analysis. Unless otherwise noted, the performance analysis is done on the MI200 platform.
```
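For readers who want to reproduce the dataset used below, the following is a minimal sketch (it assumes the vcopy sample has already been built as described in the profiling documentation; the workload name `vcopy` is an assumption chosen so that results land under the `workloads/vcopy/MI200/` path referenced throughout this page):

```shell
# Profile the vcopy sample (workload name "vcopy" is assumed to match the paths below)
$ omniperf profile -n vcopy -- ./vcopy -n 1048576 -b 256
# Open the resulting workload directory in CLI analysis mode
$ omniperf analyze -p workloads/vcopy/MI200/
```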

## CLI Analysis
> Profiling results from the [aforementioned vcopy workload](https://rocm.github.io/omniperf/profiling.html#workload-compilation) will be used in the following sections to demonstrate the use of Omniperf in MI GPU performance analysis. Unless otherwise noted, the performance analysis is done on the MI200 platform.

### Features

@@ -25,94 +28,6 @@ Run `omniperf analyze -h` for more details.

### Demo

- Single run
```shell
$ omniperf analyze -p workloads/vcopy/MI200/
```

- List top kernels and dispatches
```shell
$ omniperf analyze -p workloads/vcopy/MI200/ --list-stats
```

- List metrics

```shell
$ omniperf analyze -p workloads/vcopy/MI200/ --list-metrics gfx90a
```

- Customized profiling "System Speed-of-Light" and "CS_Busy" only

```shell
$ omniperf analyze -p workloads/vcopy/MI200/ -b 2 5.1.0
```

> Note: Users can filter a single metric or an entire hardware block by its ID. In this case, 2 is the ID for the System Speed-of-Light block and 5.1.0 is the ID for the GPU Busy Cycles metric.

- Filter kernels

First, list the top kernels in your application using `--list-stats`.
```shell-session
$ omniperf analyze -p workloads/vcopy/MI200/ --list-stats
Analysis mode = cli
[analysis] deriving Omniperf metrics...
--------------------------------------------------------------------------------
Detected Kernels (sorted descending by duration)
╒════╤══════════════════════════════════════════════╕
│ │ Kernel_Name │
╞════╪══════════════════════════════════════════════╡
│ 0 │ vecCopy(double*, double*, double*, int, int) │
╘════╧══════════════════════════════════════════════╛
--------------------------------------------------------------------------------
Dispatch list
╒════╤═══════════════╤══════════════════════════════════════════════╤══════════╕
│ │ Dispatch_ID │ Kernel_Name │ GPU_ID │
╞════╪═══════════════╪══════════════════════════════════════════════╪══════════╡
│ 0 │ 0 │ vecCopy(double*, double*, double*, int, int) │ 0 │
╘════╧═══════════════╧══════════════════════════════════════════════╧══════════╛
```

Second, select the index of the kernel you would like to filter (here, __vecCopy(double*, double*, double*, int, int)__ at index __0__). Then pass this index to `-k`/`--kernels` to apply the filter.

```shell-session
$ omniperf analyze -p workloads/vcopy/MI200/ -k 0
Analysis mode = cli
[analysis] deriving Omniperf metrics...
--------------------------------------------------------------------------------
0. Top Stats
0.1 Top Kernels
╒════╤══════════════════════════════════════════╤═════════╤═══════════╤════════════╤══════════════╤════════╤═════╕
│ │ Kernel_Name │ Count │ Sum(ns) │ Mean(ns) │ Median(ns) │ Pct │ S │
╞════╪══════════════════════════════════════════╪═════════╪═══════════╪════════════╪══════════════╪════════╪═════╡
│ 0 │ vecCopy(double*, double*, double*, int, │ 1.00 │ 18560.00 │ 18560.00 │ 18560.00 │ 100.00 │ *
│ │ int) │ │ │ │ │ │ │
╘════╧══════════════════════════════════════════╧═════════╧═══════════╧════════════╧══════════════╧════════╧═════╛
... ...
```

> Note: You will see your filtered kernel(s) indicated by an asterisk in the Top Stats table


- Baseline comparison

```shell
omniperf analyze -p workload1/path/ -p workload2/path/
```
> Note: You can also apply different filters to each workload.

OR
```shell
omniperf analyze -p workload1/path/ -k 0 -p workload2/path/ -k 1
```

### Recommended workflow

1) To begin, generate a high-level analysis report using Omniperf's `-b` (`--block`) flag.
```shell-session
$ omniperf analyze -p workloads/vcopy/MI200/ -b 2
@@ -347,11 +262,105 @@ Analyze
│ 2.1.28 │ Instr Fetch Latency │ 21.729248046875 │ Cycles │ │ │
╘═════════╧═══════════════════════════╧═══════════════════════╧══════════════════╧════════════════════╧════════════════════════╛
```
> **Note:** Some cells may be blank, indicating a missing or unavailable hardware counter or a NULL value.

```{note}
Some cells may be blank, indicating a missing or unavailable hardware counter or a NULL value.
```

3. Optimize the application, iterate, and re-profile to inspect performance changes (a sketch of this loop follows below).
4. Redo a comprehensive analysis with the Omniperf CLI at any milestone, or at the end of the optimization effort.
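
A minimal sketch of this iterate-and-compare loop, using the baseline comparison shown under "More options" below (the `vcopy_opt` workload name is hypothetical):

```shell
# Re-profile the modified application into its own workload directory
$ omniperf profile -n vcopy_opt -- ./vcopy -n 1048576 -b 256
# Compare the new results against the original run
$ omniperf analyze -p workloads/vcopy/MI200/ -p workloads/vcopy_opt/MI200/
```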

### More options

- __Single run__
```shell
$ omniperf analyze -p workloads/vcopy/MI200/
```

- __List top kernels and dispatches__
```shell
$ omniperf analyze -p workloads/vcopy/MI200/ --list-stats
```

- __List metrics__

```shell
$ omniperf analyze -p workloads/vcopy/MI200/ --list-metrics gfx90a
```

- __Show "System Speed-of-Light" and "CS_Busy" blocks only__

```shell
$ omniperf analyze -p workloads/vcopy/MI200/ -b 2 5.1.0
```

```{note}
Users can filter a single metric or an entire hardware block by its ID. In this case, 2 is the ID for the System Speed-of-Light block and 5.1.0 is the ID for the GPU Busy Cycles metric. A combined example using these filters is shown after this list.
```

- __Filter kernels__

First, list the top kernels in your application using `--list-stats`.
```shell-session
$ omniperf analyze -p workloads/vcopy/MI200/ --list-stats
Analysis mode = cli
[analysis] deriving Omniperf metrics...
--------------------------------------------------------------------------------
Detected Kernels (sorted descending by duration)
╒════╤══════════════════════════════════════════════╕
│ │ Kernel_Name │
╞════╪══════════════════════════════════════════════╡
│ 0 │ vecCopy(double*, double*, double*, int, int) │
╘════╧══════════════════════════════════════════════╛
--------------------------------------------------------------------------------
Dispatch list
╒════╤═══════════════╤══════════════════════════════════════════════╤══════════╕
│ │ Dispatch_ID │ Kernel_Name │ GPU_ID │
╞════╪═══════════════╪══════════════════════════════════════════════╪══════════╡
│ 0 │ 0 │ vecCopy(double*, double*, double*, int, int) │ 0 │
╘════╧═══════════════╧══════════════════════════════════════════════╧══════════╛
```

Second, select the index of the kernel you would like to filter (here, __vecCopy(double*, double*, double*, int, int)__ at index __0__). Then pass this index to `-k`/`--kernels` to apply the filter.

```shell-session
$ omniperf analyze -p workloads/vcopy/MI200/ -k 0
Analysis mode = cli
[analysis] deriving Omniperf metrics...
--------------------------------------------------------------------------------
0. Top Stats
0.1 Top Kernels
╒════╤══════════════════════════════════════════╤═════════╤═══════════╤════════════╤══════════════╤════════╤═════╕
│ │ Kernel_Name │ Count │ Sum(ns) │ Mean(ns) │ Median(ns) │ Pct │ S │
╞════╪══════════════════════════════════════════╪═════════╪═══════════╪════════════╪══════════════╪════════╪═════╡
│ 0 │ vecCopy(double*, double*, double*, int, │ 1.00 │ 18560.00 │ 18560.00 │ 18560.00 │ 100.00 │ *
│ │ int) │ │ │ │ │ │ │
╘════╧══════════════════════════════════════════╧═════════╧═══════════╧════════════╧══════════════╧════════╧═════╛
... ...
```

```{note}
Your filtered kernel(s) will be marked with an asterisk in the Top Stats table.
```


- __Baseline comparison__

```shell
omniperf analyze -p workload1/path/ -p workload2/path/
```
OR, with a different kernel filter applied to each workload:
```shell
omniperf analyze -p workload1/path/ -k 0 -p workload2/path/ -k 1
```
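
Putting these options together, a typical focused analysis might first identify the kernel of interest and then restrict the report to that kernel and a single block (a sketch; combining `-k` and `-b` in one invocation is assumed to be supported):

```shell
# Identify the hottest kernel and note its index
$ omniperf analyze -p workloads/vcopy/MI200/ --list-stats
# Restrict the report to that kernel and to the System Speed-of-Light block
$ omniperf analyze -p workloads/vcopy/MI200/ -k 0 -b 2
```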


## GUI Analysis

### Web-based GUI
25 changes: 13 additions & 12 deletions src/docs-2.x/conf.py
@@ -32,8 +32,8 @@ def install(package):
# -- Project information -----------------------------------------------------

project = "Omniperf"
copyright = "2023-2024, Audacious Software Group"
author = "Audacious Software Group"
copyright = "2023-2024, Advanced Micro Devices, Inc. All Rights Reserved"
author = "AMD Research"

# The short X.Y version
version = repo_version
@@ -72,16 +72,16 @@ def install(package):
".md": "markdown",
}

sphinxmark_enable = True
sphinxmark_image = "text"
sphinxmark_text = "Release Candidate"
sphinxmark_text_size = 80
sphinxmark_div = "document"
sphinxmark_fixed = False
sphinxmark_text_rotation = 30
sphinxmark_text_color = (128, 128, 128)
sphinxmark_text_spacing = 800
sphinxmark_text_opacity = 30
# sphinxmark_enable = True
# sphinxmark_image = "text"
# sphinxmark_text = "Release Candidate"
# sphinxmark_text_size = 80
# sphinxmark_div = "document"
# sphinxmark_fixed = False
# sphinxmark_text_rotation = 30
# sphinxmark_text_color = (128, 128, 128)
# sphinxmark_text_spacing = 800
# sphinxmark_text_opacity = 30

from recommonmark.parser import CommonMarkParser

@@ -138,6 +138,7 @@ def install(package):
# Output file base name for HTML help builder.
htmlhelp_basename = "Omniperfdoc"

html_logo = 'images/amd-header-logo.svg'
html_theme_options = {
"analytics_id": "G-C5DYLCE9ED", # Provided by Google in your dashboard
"analytics_anonymize_ip": False,
6 changes: 5 additions & 1 deletion src/docs-2.x/getting_started.md
@@ -16,7 +16,11 @@
```shell
$ omniperf profile -n vcopy_data -- ./vcopy -n 1048576 -b 256
```
The application runs, each kernel is launched, and profiling results are generated. By default, results are written to a subdirectory named after your accelerator, e.g., `./workloads/vcopy_data/MI200/` (the workload name is configurable via the `-n` argument). To collect all requested profiling information, kernels may need to be replayed multiple times.
The application runs, each kernel is launched, and profiling results are generated. By default, results are written to a subdirectory named after your accelerator, e.g., `./workloads/vcopy_data/MI200/` (the workload name is configurable via the `-n` argument).
```{note}
To collect all requested profiling information, kernels may need to be replayed multiple times.
```
2. **Customize data collection**
5 changes: 3 additions & 2 deletions src/docs-2.x/high_level_design.md
@@ -17,5 +17,6 @@ The [Omniperf](https://github.com/ROCm/omniperf) Tool is architecturally compose

![Omniperf Architectural Diagram](images/omniperf_server_vs_client_install.png)

> Note: To learn more about the client vs. server model of Omniperf and our install process, please see the [Deployment section](./installation.md) of the docs.
```{note}
To learn more about the client vs. server model of Omniperf and our install process, please see the [Deployment section](./installation.md) of the docs.
```
1 change: 1 addition & 0 deletions src/docs-2.x/images/amd-header-logo.svg
22 changes: 15 additions & 7 deletions src/docs-2.x/installation.md
@@ -33,7 +33,11 @@ Omniperf client-side requires the following basic software dependencies prior to

In addition, Omniperf leverages a number of Python packages that are
documented in the top-level `requirements.txt` file. These must be
installed prior to Omniperf configuration.
installed prior to Omniperf configuration.

```{note}
If you're interested in building the docs locally or running Omniperf's CI suite via PyTest, see the dependencies documented in `requirements-doc.txt` and `requirements-test.txt`, respectively.
```
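
For example, either optional dependency set can be installed with pip from the top of the Omniperf source tree (a sketch):

```shell
# Optional: dependencies for building the documentation locally
$ python3 -m pip install -r requirements-doc.txt
# Optional: dependencies for running the test suite via PyTest
$ python3 -m pip install -r requirements-test.txt
```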

The recommended procedure for Omniperf usage is to install it into a shared file system so that multiple users can access the final installation. The following steps illustrate how to install Omniperf and its Python dependencies using [pip](https://packaging.python.org/en/latest/) into a shared location controlled by the `INSTALL_DIR` environment variable.
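
As an illustration only, installing the Python dependencies into a shared location might look like the following sketch (the path and the use of pip's `--target` option are assumptions; follow the documented steps below for the supported procedure):

```shell
# Shared install prefix visible to all users (path is an assumption)
$ export INSTALL_DIR=/shared/apps/omniperf
# Install the Python dependencies from requirements.txt into that location
$ python3 -m pip install -t ${INSTALL_DIR}/python-libs -r requirements.txt
```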

@@ -154,7 +158,9 @@ wishes to use instead.

## Server-side Setup

> Note: Server-side setup is not required to profile or analyze performance data from the CLI. It is provided as an additional mechanism to import performance data for examination within a detailed [Grafana](https://github.com/grafana/grafana) GUI.
```{note}
Server-side setup is not required to profile or analyze performance data from the CLI. It is provided as an additional mechanism to import performance data for examination within a detailed [Grafana](https://github.com/grafana/grafana) GUI.
```

Omniperf server-side requires the following basic software dependencies prior to usage:

@@ -191,10 +197,12 @@ We are now ready to build our Docker file. Navigate to your Omniperf install dir
$ sudo docker-compose build
$ sudo docker-compose up -d
```
> Note that TCP ports for Grafana (4000) and MongoDB (27017) in the docker container are mapped to 14000 and 27018, respectively, on the host side.
> TCP ports for Grafana (4000) and MongoDB (27017) in the docker container are mapped to 14000 and 27018, respectively, on the host side.
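To confirm the mapping from the host side, the published ports of the running containers can be inspected (a sketch; container names will vary):

```shell
# The PORTS column should show 0.0.0.0:14000->4000/tcp and 0.0.0.0:27018->27017/tcp
$ sudo docker ps
```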
### Restart (Debug)
```{tip}
In the event that your Grafana or MongoDB instance crashes, you can always restart the server. Just navigate to your install directory and run:
```

```bash
$ sudo docker-compose down
$ sudo docker-compose up -d
@@ -216,9 +224,9 @@ The MongoDB Datasource must be configured prior to the first-time use. Navigate

Configure the following fields in the datasource settings:

- HTTP URL: set to *http://localhost:3333*
- MongoDB URL: set to *mongodb://temp:temp123@\<host-ip>:27018/admin?authSource=admin*
- Database Name: set to *admin*
- __HTTP URL__: set to `http://localhost:3333`
- __MongoDB URL__: set to `mongodb://temp:temp123@<host-ip>:27018/admin?authSource=admin`
- __Database Name__: set to `admin`

After properly configuring these fields, click **Save & Test** (as shown below) to make sure your connection is successful.
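
Before wiring up Grafana, the MongoDB URL and credentials can also be sanity-checked from the host with the MongoDB shell (a sketch; it assumes the `mongosh` client is installed and that `<host-ip>` is replaced with your server's address):

```shell
# Should connect and drop into a MongoDB prompt if the datasource settings are correct
$ mongosh "mongodb://temp:temp123@<host-ip>:27018/admin?authSource=admin"
```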
