Documentation enhancements and removing watermark
Signed-off-by: colramos-amd <[email protected]>
coleramos425 committed Mar 27, 2024
1 parent 8b26745 commit 42f5fa7
Showing 7 changed files with 165 additions and 179 deletions.
189 changes: 99 additions & 90 deletions src/docs-2.x/analysis.md
@@ -11,8 +11,11 @@ While analyzing with the CLI offers quick and straightforward access to Omniperf

See sections below for more information on each.

```{note}
Profiling results from the [aforementioned vcopy workload](https://rocm.github.io/omniperf/profiling.html#workload-compilation) will be used in the following sections to demonstrate the use of Omniperf in MI GPU performance analysis. Unless otherwise noted, the performance analysis is done on the MI200 platform.
```
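For readers who want to reproduce the dataset used below, the following is a minimal sketch (it assumes the vcopy sample has already been built as described in the profiling documentation; the workload name `vcopy` is an assumption chosen so that results land under the `workloads/vcopy/MI200/` path referenced throughout this page):

```shell
# Profile the vcopy sample (workload name "vcopy" is assumed to match the paths below)
$ omniperf profile -n vcopy -- ./vcopy -n 1048576 -b 256
# Open the resulting workload directory in CLI analysis mode
$ omniperf analyze -p workloads/vcopy/MI200/
```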

## CLI Analysis
> Profiling results from the [aforementioned vcopy workload](https://rocm.github.io/omniperf/profiling.html#workload-compilation) will be used in the following sections to demonstrate the use of Omniperf in MI GPU performance analysis. Unless otherwise noted, the performance analysis is done on the MI200 platform.

### Features

@@ -25,94 +28,6 @@ Run `omniperf analyze -h` for more details.

### Demo

- Single run
```shell
$ omniperf analyze -p workloads/vcopy/MI200/
```

- List top kernels and dispatches
```shell
$ omniperf analyze -p workloads/vcopy/MI200/ --list-stats
```

- List metrics

```shell
$ omniperf analyze -p workloads/vcopy/MI200/ --list-metrics gfx90a
```

- Customized profiling "System Speed-of-Light" and "CS_Busy" only

```shell
$ omniperf analyze -p workloads/vcopy/MI200/ -b 2 5.1.0
```

> Note: Users can filter a single metric or an entire hardware block by its ID. In this case, 2 is the ID for the System Speed-of-Light block and 5.1.0 is the ID for the GPU Busy Cycles metric.

- Filter kernels

First, list the top kernels in your application using `--list-stats`.
```shell-session
$ omniperf analyze -p workloads/vcopy/MI200/ --list-stats
Analysis mode = cli
[analysis] deriving Omniperf metrics...
--------------------------------------------------------------------------------
Detected Kernels (sorted descending by duration)
╒════╤══════════════════════════════════════════════╕
│ │ Kernel_Name │
╞════╪══════════════════════════════════════════════╡
│ 0 │ vecCopy(double*, double*, double*, int, int) │
╘════╧══════════════════════════════════════════════╛
--------------------------------------------------------------------------------
Dispatch list
╒════╤═══════════════╤══════════════════════════════════════════════╤══════════╕
│ │ Dispatch_ID │ Kernel_Name │ GPU_ID │
╞════╪═══════════════╪══════════════════════════════════════════════╪══════════╡
│ 0 │ 0 │ vecCopy(double*, double*, double*, int, int) │ 0 │
╘════╧═══════════════╧══════════════════════════════════════════════╧══════════╛
```

Second, select the index of the kernel you would like to filter (here, __vecCopy(double*, double*, double*, int, int)__ at index __0__). Then pass this index to `-k`/`--kernels` to apply the filter.

```shell-session
$ omniperf analyze -p workloads/vcopy/MI200/ -k 0
Analysis mode = cli
[analysis] deriving Omniperf metrics...
--------------------------------------------------------------------------------
0. Top Stats
0.1 Top Kernels
╒════╤══════════════════════════════════════════╤═════════╤═══════════╤════════════╤══════════════╤════════╤═════╕
│ │ Kernel_Name │ Count │ Sum(ns) │ Mean(ns) │ Median(ns) │ Pct │ S │
╞════╪══════════════════════════════════════════╪═════════╪═══════════╪════════════╪══════════════╪════════╪═════╡
│ 0 │ vecCopy(double*, double*, double*, int, │ 1.00 │ 18560.00 │ 18560.00 │ 18560.00 │ 100.00 │ *
│ │ int) │ │ │ │ │ │ │
╘════╧══════════════════════════════════════════╧═════════╧═══════════╧════════════╧══════════════╧════════╧═════╛
... ...
```

> Note: You will see your filtered kernel(s) indicated by an asterisk in the Top Stats table


- Baseline comparison

```shell
omniperf analyze -p workload1/path/ -p workload2/path/
```
> Note: You can also apply different filters to each workload.

OR
```shell
omniperf analyze -p workload1/path/ -k 0 -p workload2/path/ -k 1
```

### Recommended workflow

1) To begin, generate a high-level analysis report using Omniperf's `-b` (`--block`) flag.
```shell-session
$ omniperf analyze -p workloads/vcopy/MI200/ -b 2
@@ -347,11 +262,105 @@ Analyze
│ 2.1.28 │ Instr Fetch Latency │ 21.729248046875 │ Cycles │ │ │
╘═════════╧═══════════════════════════╧═══════════════════════╧══════════════════╧════════════════════╧════════════════════════╛
```
> **Note:** Some cells may be blank, indicating a missing or unavailable hardware counter or a NULL value.

```{note}
Some cells may be blank, indicating a missing or unavailable hardware counter or a NULL value.
```

3. Optimize the application, iterate, and re-profile to inspect performance changes (a sketch of this loop follows below).
4. Redo a comprehensive analysis with the Omniperf CLI at any milestone, or at the end of the optimization effort.
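
A minimal sketch of this iterate-and-compare loop, using the baseline comparison shown under "More options" below (the `vcopy_opt` workload name is hypothetical):

```shell
# Re-profile the modified application into its own workload directory
$ omniperf profile -n vcopy_opt -- ./vcopy -n 1048576 -b 256
# Compare the new results against the original run
$ omniperf analyze -p workloads/vcopy/MI200/ -p workloads/vcopy_opt/MI200/
```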

### More options

- __Single run__
```shell
$ omniperf analyze -p workloads/vcopy/MI200/
```

- __List top kernels and dispatches__
```shell
$ omniperf analyze -p workloads/vcopy/MI200/ --list-stats
```

- __List metrics__

```shell
$ omniperf analyze -p workloads/vcopy/MI200/ --list-metrics gfx90a
```

- __Show "System Speed-of-Light" and "CS_Busy" blocks only__

```shell
$ omniperf analyze -p workloads/vcopy/MI200/ -b 2 5.1.0
```

```{note}
Users can filter a single metric or an entire hardware block by its ID. In this case, 2 is the ID for the System Speed-of-Light block and 5.1.0 is the ID for the GPU Busy Cycles metric. A combined example using these filters is shown after this list.
```

- __Filter kernels__

First, list the top kernels in your application using `--list-stats`.
```shell-session
$ omniperf analyze -p workloads/vcopy/MI200/ --list-stats
Analysis mode = cli
[analysis] deriving Omniperf metrics...
--------------------------------------------------------------------------------
Detected Kernels (sorted descending by duration)
╒════╤══════════════════════════════════════════════╕
│ │ Kernel_Name │
╞════╪══════════════════════════════════════════════╡
│ 0 │ vecCopy(double*, double*, double*, int, int) │
╘════╧══════════════════════════════════════════════╛
--------------------------------------------------------------------------------
Dispatch list
╒════╤═══════════════╤══════════════════════════════════════════════╤══════════╕
│ │ Dispatch_ID │ Kernel_Name │ GPU_ID │
╞════╪═══════════════╪══════════════════════════════════════════════╪══════════╡
│ 0 │ 0 │ vecCopy(double*, double*, double*, int, int) │ 0 │
╘════╧═══════════════╧══════════════════════════════════════════════╧══════════╛
```

Second, select the index of the kernel you would like to filter (here, __vecCopy(double*, double*, double*, int, int)__ at index __0__). Then pass this index to `-k`/`--kernels` to apply the filter.

```shell-session
$ omniperf analyze -p workloads/vcopy/MI200/ -k 0
Analysis mode = cli
[analysis] deriving Omniperf metrics...
--------------------------------------------------------------------------------
0. Top Stats
0.1 Top Kernels
╒════╤══════════════════════════════════════════╤═════════╤═══════════╤════════════╤══════════════╤════════╤═════╕
│ │ Kernel_Name │ Count │ Sum(ns) │ Mean(ns) │ Median(ns) │ Pct │ S │
╞════╪══════════════════════════════════════════╪═════════╪═══════════╪════════════╪══════════════╪════════╪═════╡
│ 0 │ vecCopy(double*, double*, double*, int, │ 1.00 │ 18560.00 │ 18560.00 │ 18560.00 │ 100.00 │ *
│ │ int) │ │ │ │ │ │ │
╘════╧══════════════════════════════════════════╧═════════╧═══════════╧════════════╧══════════════╧════════╧═════╛
... ...
```

```{note}
Your filtered kernel(s) will be marked with an asterisk in the Top Stats table.
```


- __Baseline comparison__

```shell
omniperf analyze -p workload1/path/ -p workload2/path/
```
OR, with a different kernel filter applied to each workload:
```shell
omniperf analyze -p workload1/path/ -k 0 -p workload2/path/ -k 1
```
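
Putting these options together, a typical focused analysis might first identify the kernel of interest and then restrict the report to that kernel and a single block (a sketch; combining `-k` and `-b` in one invocation is assumed to be supported):

```shell
# Identify the hottest kernel and note its index
$ omniperf analyze -p workloads/vcopy/MI200/ --list-stats
# Restrict the report to that kernel and to the System Speed-of-Light block
$ omniperf analyze -p workloads/vcopy/MI200/ -k 0 -b 2
```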


## GUI Analysis

### Web-based GUI
25 changes: 13 additions & 12 deletions src/docs-2.x/conf.py
@@ -32,8 +32,8 @@ def install(package):
# -- Project information -----------------------------------------------------

project = "Omniperf"
copyright = "2023-2024, Audacious Software Group"
author = "Audacious Software Group"
copyright = "2023-2024, Advanced Micro Devices, Inc. All Rights Reserved"
author = "AMD Research"

# The short X.Y version
version = repo_version
@@ -72,16 +72,16 @@ def install(package):
".md": "markdown",
}

sphinxmark_enable = True
sphinxmark_image = "text"
sphinxmark_text = "Release Candidate"
sphinxmark_text_size = 80
sphinxmark_div = "document"
sphinxmark_fixed = False
sphinxmark_text_rotation = 30
sphinxmark_text_color = (128, 128, 128)
sphinxmark_text_spacing = 800
sphinxmark_text_opacity = 30
# sphinxmark_enable = True
# sphinxmark_image = "text"
# sphinxmark_text = "Release Candidate"
# sphinxmark_text_size = 80
# sphinxmark_div = "document"
# sphinxmark_fixed = False
# sphinxmark_text_rotation = 30
# sphinxmark_text_color = (128, 128, 128)
# sphinxmark_text_spacing = 800
# sphinxmark_text_opacity = 30

from recommonmark.parser import CommonMarkParser

@@ -138,6 +138,7 @@ def install(package):
# Output file base name for HTML help builder.
htmlhelp_basename = "Omniperfdoc"

html_logo = 'images/amd-header-logo.svg'
html_theme_options = {
"analytics_id": "G-C5DYLCE9ED", # Provided by Google in your dashboard
"analytics_anonymize_ip": False,
6 changes: 5 additions & 1 deletion src/docs-2.x/getting_started.md
@@ -16,7 +16,11 @@
```shell
$ omniperf profile -n vcopy_data -- ./vcopy -n 1048576 -b 256
```
The application runs, each kernel is launched, and profiling results are generated. By default, results are written to a subdirectory named after your accelerator, e.g., `./workloads/vcopy_data/MI200/` (the workload name is configurable via the `-n` argument). To collect all requested profiling information, kernels may need to be replayed multiple times.
The application runs, each kernel is launched, and profiling results are generated. By default, results are written to a subdirectory named after your accelerator, e.g., `./workloads/vcopy_data/MI200/` (the workload name is configurable via the `-n` argument).
```{note}
To collect all requested profiling information, kernels may need to be replayed multiple times.
```
2. **Customize data collection**
5 changes: 3 additions & 2 deletions src/docs-2.x/high_level_design.md
@@ -17,5 +17,6 @@ The [Omniperf](https://github.com/ROCm/omniperf) Tool is architecturally compose

![Omniperf Architectural Diagram](images/omniperf_server_vs_client_install.png)

> Note: To learn more about the client vs. server model of Omniperf and our install process, please see the [Deployment section](./installation.md) of the docs.
```{note}
To learn more about the client vs. server model of Omniperf and our install process, please see the [Deployment section](./installation.md) of the docs.
```
1 change: 1 addition & 0 deletions src/docs-2.x/images/amd-header-logo.svg
22 changes: 15 additions & 7 deletions src/docs-2.x/installation.md
@@ -33,7 +33,11 @@ Omniperf client-side requires the following basic software dependencies prior to

In addition, Omniperf leverages a number of Python packages that are
documented in the top-level `requirements.txt` file. These must be
installed prior to Omniperf configuration.
installed prior to Omniperf configuration.

```{note}
If you're interested in building the docs locally or running Omniperf's CI suite via PyTest, see the dependencies documented in `requirements-doc.txt` and `requirements-test.txt`, respectively.
```
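
For example, either optional dependency set can be installed with pip from the top of the Omniperf source tree (a sketch):

```shell
# Optional: dependencies for building the documentation locally
$ python3 -m pip install -r requirements-doc.txt
# Optional: dependencies for running the test suite via PyTest
$ python3 -m pip install -r requirements-test.txt
```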

The recommended procedure for Omniperf usage is to install it into a shared file system so that multiple users can access the final installation. The following steps illustrate how to install Omniperf and its Python dependencies using [pip](https://packaging.python.org/en/latest/) into a shared location controlled by the `INSTALL_DIR` environment variable.
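
As an illustration only, installing the Python dependencies into a shared location might look like the following sketch (the path and the use of pip's `--target` option are assumptions; follow the documented steps below for the supported procedure):

```shell
# Shared install prefix visible to all users (path is an assumption)
$ export INSTALL_DIR=/shared/apps/omniperf
# Install the Python dependencies from requirements.txt into that location
$ python3 -m pip install -t ${INSTALL_DIR}/python-libs -r requirements.txt
```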

@@ -154,7 +158,9 @@ wishes to use instead.

## Server-side Setup

> Note: Server-side setup is not required to profile or analyze performance data from the CLI. It is provided as an additional mechanism to import performance data for examination within a detailed [Grafana](https://github.com/grafana/grafana) GUI.
```{note}
Server-side setup is not required to profile or analyze performance data from the CLI. It is provided as an additional mechanism to import performance data for examination within a detailed [Grafana](https://github.com/grafana/grafana) GUI.
```

Omniperf server-side requires the following basic software dependencies prior to usage:

@@ -191,10 +197,12 @@ We are now ready to build our Docker file. Navigate to your Omniperf install dir
$ sudo docker-compose build
$ sudo docker-compose up -d
```
> Note that TCP ports for Grafana (4000) and MongoDB (27017) in the docker container are mapped to 14000 and 27018, respectively, on the host side.
> TCP ports for Grafana (4000) and MongoDB (27017) in the docker container are mapped to 14000 and 27018, respectively, on the host side.
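To confirm the mapping from the host side, the published ports of the running containers can be inspected (a sketch; container names will vary):

```shell
# The PORTS column should show 0.0.0.0:14000->4000/tcp and 0.0.0.0:27018->27017/tcp
$ sudo docker ps
```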
### Restart (Debug)
```{tip}
In the event that your Grafana or MongoDB instance crashes, you can always restart the server. Just navigate to your install directory and run:
```

```bash
$ sudo docker-compose down
$ sudo docker-compose up -d
@@ -216,9 +224,9 @@ The MongoDB Datasource must be configured prior to the first-time use. Navigate

Configure the following fields in the datasource settings:

- HTTP URL: set to *http://localhost:3333*
- MongoDB URL: set to *mongodb://temp:temp123@\<host-ip>:27018/admin?authSource=admin*
- Database Name: set to *admin*
- __HTTP URL__: set to `http://localhost:3333`
- __MongoDB URL__: set to `mongodb://temp:temp123@<host-ip>:27018/admin?authSource=admin`
- __Database Name__: set to `admin`

After properly configuring these fields, click **Save & Test** (as shown below) to make sure your connection is successful.
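
Before wiring up Grafana, the MongoDB URL and credentials can also be sanity-checked from the host with the MongoDB shell (a sketch; it assumes the `mongosh` client is installed and that `<host-ip>` is replaced with your server's address):

```shell
# Should connect and drop into a MongoDB prompt if the datasource settings are correct
$ mongosh "mongodb://temp:temp123@<host-ip>:27018/admin?authSource=admin"
```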
