Merge branch 'idevelop' into develop

Rmalavally · Apr 19, 2024 · a5c7a9d · a5c7a9d
2 parents 3ac6f3b + dc3124c
commit a5c7a9d
Showing 10 changed files with 1,758 additions and 731 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
diff --git a/tools/autotag/tag_script.py b/tools/autotag/tag_script.py
@@ -84,11 +84,9 @@ def exclude(self) -> List[str]:
             "MIOpenGEMM",
             "MIOpenKernels",
             "MIOpenTensile",
-            "ROCmValidationSuite",
+            "MLSEQA_TestRepo",
             "half",
-            "hipFORT",
             "rccl-rdma-sharp-plugins",
-            "MLSEQA_TestRepo",
         ]
         return defaults + (self._exclude if self._exclude is not None else [])
 
@@ -237,10 +235,15 @@ def run_tagging():
     # Find all the math libraries and their remotes.
     included_names = [
         "AMDMIGraphX",
+        "HIPIFY", #
         "MIOpen",
+        "MIVisionX",
+        "ROCmValidationSuite", #
+        "composable_kernel",
+        "hipfort",
         "rocDecode",
         "rocm-cmake",
-        "rocprofiler"
+        "rpp",
     ]
     included_groups = [
         "mathlibs"

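The effect of the `exclude` hunk above can be sketched in isolation. This is a minimal illustration, not the script itself: it copies only the default entries visible in the hunk (the real list may hold more entries before `"MIOpenGEMM"`), and `effective_excludes` and `is_excluded` are hypothetical helper names that mirror the `return defaults + (...)` expression.

```python
from typing import List, Optional

# Only the default entries visible in the hunk after this commit; the real
# list in tag_script.py may contain more entries that the diff does not show.
VISIBLE_DEFAULT_EXCLUDES: List[str] = [
    "MIOpenGEMM",
    "MIOpenKernels",
    "MIOpenTensile",
    "MLSEQA_TestRepo",
    "half",
    "rccl-rdma-sharp-plugins",
]


def effective_excludes(user_excludes: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper mirroring `exclude`: defaults plus user entries."""
    extra = user_excludes if user_excludes is not None else []
    return VISIBLE_DEFAULT_EXCLUDES + extra


def is_excluded(name: str, user_excludes: Optional[List[str]] = None) -> bool:
    """Hypothetical helper: True if `name` appears in the combined list."""
    return name in effective_excludes(user_excludes)
```

Under these assumptions, `ROCmValidationSuite` and `hipFORT` are no longer excluded by default, which matches their appearance in `included_names` in the second hunk.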
diff --git a/tools/autotag/templates/rocm_changes/5.7.1.md b/tools/autotag/templates/rocm_changes/5.7.1.md
@@ -41,7 +41,7 @@ kernels found by setting the environment variable ROCBLAS_TENSILE_GEMM_OVERRIDE_
 points to the stored file.
 
 For more details, refer to the
-[rocBLAS Programmer's Guide](https://rocm.docs.amd.com/projects/rocBLAS/en/latest/Programmers_Guide.html#rocblas-gemm-tune).
+[rocBLAS Programmer's Guide](https://rocm.docs.amd.com/projects/rocBLAS/en/docs-5.7.1/Programmers_Guide.html).
 
 #### HIP 5.7.1 (for ROCm 5.7.1)
 

diff --git a/tools/autotag/templates/rocm_changes/6.1.0.md b/tools/autotag/templates/rocm_changes/6.1.0.md
@@ -0,0 +1,319 @@
+
+The ROCm™ 6.1 release consists of new features and fixes to improve the stability and
+performance of AMD Instinct™ MI300 GPU applications. Notably, we've added:
+
+* Full support for Ubuntu 22.04.4.
+
+* **rocDecode**, a new ROCm component that provides high-performance video decode support for
+  AMD GPUs. With rocDecode, you can decode compressed video streams while keeping the resulting
+  YUV frames in video memory. With decoded frames in video memory, you can run video
+  post-processing using ROCm HIP, avoiding unnecessary data copies via the PCIe bus.
+
+  To learn more, refer to the rocDecode 
+  [documentation](https://rocm.docs.amd.com/projects/rocDecode/en/latest/).
+
+### OS and GPU support changes
+
+ROCm 6.1 adds the following operating system support:
+
+* MI300A: Ubuntu 22.04.4 and RHEL 9.3
+* MI300X: Ubuntu 22.04.4
+
+Future releases will add additional operating systems to match our general offering. For older
+generations of supported AMD Instinct products, we’ve added Ubuntu 22.04.4 support.
+
+```{tip}
+To view the complete list of supported GPUs and operating systems, refer to the system requirements
+page for
+[Linux](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html)
+and
+[Windows](https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html).
+```
+
+### Installation packages
+
+This release includes a new set of packages for every module (all libraries and binaries default to
+`DT_RPATH`). Package names have the suffix `rpath`; for example, the `rpath` variant of `rocminfo` is
+`rocminfo-rpath`.
+
+```{warning}
+The new `rpath` packages will conflict with the default packages; they are meant to be used only in
+environments where legacy `DT_RPATH` is the preferred form of linking (instead of `DT_RUNPATH`). We
+do **not** recommend trying to install both sets of packages.
+```
+
+#### AMD SMI
+
+AMD SMI for ROCm 6.1.0
+
+##### Additions
+
+* **Added Monitor command**. This lets you customize which GPU metrics to capture, collect, and
+  observe. Output is provided in a table view. This aligns more closely with ROCm SMI `rocm-smi`
+  (no argument) and lets you select the data that are helpful for your use case.
+
+* **Integrated ESMI Tool**. You can get CPU metrics and telemetry through our API and CLI tools.
+  You can get this information using the `amd-smi static` and `amd-smi metric` commands. This is only
+  available for limited target processors. As of ROCm 6.0.2, this is listed as:
+  * AMD Zen3 based CPU Family 19h Models 0h-Fh and 30h-3Fh
+  * AMD Zen4 based CPU Family 19h Models 10h-1Fh and A0h-AFh
+
+* **Added support for new metrics: VCN, JPEG engines, and PCIe errors**. Using the AMD SMI
+  tool, you can retrieve VCN, JPEG engine, and PCIe error metrics by calling `amd-smi metric -P` or
+  `amd-smi metric --usage`. Depending on device support, `VCN_ACTIVITY` is updated for MI3x ASICs
+  (with 4 separate VCN engine activities); older ASICs report `MM_ACTIVITY` with UVD/VCN engine
+  activity (average of all engines). `JPEG_ACTIVITY` is a new field for MI3x ASICs, where a device can
+  support up to 32 JPEG engine activities. See our documentation for a more in-depth understanding
+  of these new fields.
+
+* **Added AMDSMI Tool version**. AMD SMI will report *three versions*: AMDSMI Tool, AMDSMI
+  Library version, and ROCm version.
+
+  The AMDSMI Tool version is the CLI/tool version number with commit ID appended after the `+` sign.
+  The AMDSMI Library version is the library package version number. The ROCm version is the system's
+  installed ROCm version; if ROCm is not installed, it reports N/A.
+
+* **Added XGMI table**. Displays XGMI information for AMD GPU devices in a table format. This is
+  only available on supported ASICs (e.g., MI300). Here, you can view the accumulated XGMI or PCIe
+  read/write data transfer size (in kilobytes).
+
+* **Added units of measure to JSON output**. We added units of measure to the JSON/CSV output of the
+  `amd-smi metric`, `amd-smi static`, and `amd-smi monitor` commands.
+
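The AMDSMI Tool version format described above (version number, then the commit ID after a `+` sign) can be split with a few lines of Python. This is a sketch under assumptions: `parse_tool_version` and `ToolVersion` are hypothetical names, and the sample string in the usage note is invented for illustration rather than taken from real `amd-smi` output.

```python
from typing import NamedTuple, Optional


class ToolVersion(NamedTuple):
    version: str           # CLI/tool version number
    commit: Optional[str]  # commit ID after the "+" sign, if present


def parse_tool_version(text: str) -> ToolVersion:
    """Split a tool version string at the first "+" sign."""
    version, sep, commit = text.strip().partition("+")
    # No "+" (or nothing after it) means no commit ID was appended.
    return ToolVersion(version, commit if sep and commit else None)
```

For example, `parse_tool_version("24.4.1+0123abc")` separates the hypothetical version `24.4.1` from the hypothetical commit ID `0123abc`.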
+##### Changes
+
+* **Topology is now left-aligned, with the BDF of each device listed in every table's rows and columns**.
+  Each table's rows and columns now show the device's BDF, and the data are left-aligned. We want AMD
+  SMI Tool output to be easy to understand and digest. Having to scroll up to find this information
+  made it difficult to follow, especially for systems that have many devices associated with one ASIC.
+
+##### Fixes
+
+* **Fix for RDNA3/RDNA2/MI100 'amdsmi_get_gpu_pci_bandwidth()' in 'frequencies_read' tests**.
+  For devices that do not report (e.g., RDNA3/RDNA2/MI100), we have added checks to confirm that
+  these devices return `AMDSMI_STATUS_NOT_SUPPORTED`. Otherwise, tests now display a return
+  string.
+
+* **Fix for devices that have an older PyYAML installed**. For platforms that are identified as having
+  an older PyYAML version or pip, we now manually update both pip and PyYAML as needed. This
+  fix impacts the following CLI commands:
+  * `amd-smi list`
+  * `amd-smi static`
+  * `amd-smi firmware`
+  * `amd-smi metric`
+  * `amd-smi topology`
+
+* **Fix for crash when user is not a member of video/render groups**. AMD SMI now uses the
+  same mutex handler for devices as ROCm SMI. This helps avoid crashes when DRM/device data are
+  inaccessible to the logged-in user.
+
+##### Known issues
+
+* There is an `AttributeError` while running `amd-smi process --csv`
+* GPU reset results in an "*Unable to reset non-amd GPU*" error
+* Retrieving bad pages results in a "*ValueError: NULL pointer access*" error
+* Some RDNA3 cards may enumerate to `Slot type = UNKNOWN`
+
+#### HIP
+
+HIP 6.1 for ROCm 6.1
+
+##### Additions
+
+* New environment variable, `HIP_LAUNCH_BLOCKING`, which is used for serialization on kernel
+  execution.
+  * The default value is 0 (disable): kernel runs normally, as defined in the queue
+  * When set as 1 (enable): HIP runtime serializes the kernel enqueue and behaves the same as
+    `AMD_SERIALIZE_KERNEL`
+* Added HIPRTC support for HIP headers `driver_types`, `math_functions`, `library_types`,
+  `hip_math_constants`, `channel_descriptor`, `device_functions`, `hip_complex`,
+  `surface_types`, `texture_types`
+
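One way to try `HIP_LAUNCH_BLOCKING` without changing your shell is to build the environment for a child process. This is a minimal sketch: `hip_env` is a hypothetical helper, and only the variable name and its 0/1 values come from the notes above.

```python
import os
from typing import Dict, Mapping, Optional


def hip_env(base: Optional[Mapping[str, str]] = None,
            launch_blocking: bool = False) -> Dict[str, str]:
    """Return a copy of the environment with HIP_LAUNCH_BLOCKING set.

    "0" is the documented default (kernels run as defined in the queue);
    "1" asks the HIP runtime to serialize kernel enqueue.
    """
    env = dict(os.environ if base is None else base)
    env["HIP_LAUNCH_BLOCKING"] = "1" if launch_blocking else "0"
    return env
```

The result can be passed to `subprocess.run([...], env=hip_env(launch_blocking=True))` with the path of your own HIP application.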
+##### Changes
+
+* HIPRTC now assumes WGP mode for gfx10+. You can enable CU mode by passing `-mcumode` to the
+  compile options from `hiprtcCompileProgram`.
+
+##### Fixes
+
+* HIP complex vector type multiplication and division operations.
+  On an AMD platform, some duplicated complex operators are removed to avoid compilation failures.
+  In HIP, `hipFloatComplex` and `hipDoubleComplex` are defined as complex datatypes:
+  * `typedef float2 hipFloatComplex`
+  * `typedef double2 hipDoubleComplex`
+
+  Any application that uses complex multiplication and division operations must replace `*` and `/`
+  operators with the following:
+  * `hipCmulf()` and `hipCdivf()` for `hipFloatComplex`
+  * `hipCmul()` and `hipCdiv()` for `hipDoubleComplex`
+
+  Note that these complex operations are equivalent to corresponding types/functions on an NVIDIA
+  platform.
+
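The arithmetic that replaces the `*` and `/` operators can be modeled in Python. This sketch is not the HIP API: `Float2`, `cmul`, and `cdiv` are hypothetical stand-ins, and it assumes the two-component layout stores the real part in `x` and the imaginary part in `y`. It uses the textbook formulas, so rounding may differ from the HIP functions.

```python
from typing import NamedTuple


class Float2(NamedTuple):
    x: float  # real part (assumed layout)
    y: float  # imaginary part (assumed layout)


def cmul(a: Float2, b: Float2) -> Float2:
    """(a.x + i a.y) * (b.x + i b.y)."""
    return Float2(a.x * b.x - a.y * b.y, a.x * b.y + a.y * b.x)


def cdiv(a: Float2, b: Float2) -> Float2:
    """(a.x + i a.y) / (b.x + i b.y), via the conjugate of b."""
    denom = b.x * b.x + b.y * b.y
    return Float2((a.x * b.x + a.y * b.y) / denom,
                  (a.y * b.x - a.x * b.y) / denom)
```

In HIP source, the corresponding change is to call the functions listed above instead of writing `a * b` or `a / b` on the complex types.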
+#### HIPIFY
+
+HIPIFY for ROCm 6.1.0
+
+##### Additions
+
+* CUDA 12.3.2 support
+* cuDNN 8.9.7 support
+* LLVM 17.0.6 support
+* Full `hipSOLVER` support
+* Full `rocSPARSE` support
+* New option: `--amap`, which will hipify as much as possible, ignoring `--default-preprocessor`
+  behavior
+
+##### Fixes
+
+* Code blocks skipped by the preprocessor are no longer hipified under the `--default-preprocessor`
+  option
+
+#### ROCm Compiler
+
+ROCm Compiler for ROCm 6.1.0
+
+##### Additions
+
+* Compiler now generates `.uniform_work_group_size` and records it in the metadata. It indicates if the
+  kernel requires that each dimension of global size is a multiple of the corresponding dimension of
+  work-group size. A value of 1 is true, and 0 is false. This metadata is only provided when the value is
+  1.
+* Added the `rocm-llvm-docs` package.
+* Added ROCm Device-Libs, ROCm Compiler Support, and hipCC within the `llvm-project/amd`
+  subdirectory to AMD’s fork of the LLVM project.
+* Added support for C++ Parallel Algorithm Offload via HIP (HIPSTDPAR), which allows parallel
+  algorithms to run on the GPU.
+
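The condition that `.uniform_work_group_size` describes (each dimension of the global size is a multiple of the corresponding work-group dimension) can be checked for a given launch as follows. `is_uniform_launch` is a hypothetical helper for illustration, not part of the compiler or runtime.

```python
from typing import Sequence


def is_uniform_launch(global_size: Sequence[int],
                      work_group_size: Sequence[int]) -> bool:
    """True if every global dimension is a multiple of its work-group dimension."""
    if len(global_size) != len(work_group_size):
        raise ValueError("global_size and work_group_size must have equal rank")
    if any(w <= 0 for w in work_group_size):
        raise ValueError("work-group dimensions must be positive")
    return all(g % w == 0 for g, w in zip(global_size, work_group_size))
```

A launch with global size `(1000, 1, 1)` and work-group size `(256, 1, 1)` fails this check, because 1000 is not a multiple of 256.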
+##### Changes
+
+* `rocm-clang-ocl` is now an optional package and will require manual installation.
+
+##### Deprecations
+
+* hipCC adds `-mllvm -amdgpu-early-inline-all=true` and `-mllvm -amdgpu-function-calls=false` by
+  default to compiler invocations. These flags will be removed from hipCC in a future ROCm release.
+
+##### Fixes
+
+AddressSanitizer (ASan):
+* Added `sanitized_padded_global` LLVM IR attribute to identify sanitizer-instrumented globals.
+* For each ASan-instrumented global, emit two symbols: one with the actual size and the other with
+  the instrumented size.
+
+  [On GitHub](https://github.com/ROCm/ROCm/issues/2551)
+
+##### Known issues
+
+* Due to an issue within the `amd-llvm` compiler shipping with ROCm 6.1, HIPSTDPAR's interposition mode, which is enabled by `--hipstdpar-interpose-alloc`, is currently broken.
+
+  The temporary workaround is to use the upstream LLVM 18 (or newer) compiler. This issue will be addressed in a future ROCm release.
+
+#### ROCm Data Center (RDC)
+
+RDC for ROCm 6.1.0
+
+##### Changes
+
+* Added `--address` flag to rdcd
+* Upgraded from C++11 to C++17
+* Upgraded gRPC
+
+#### ROCDebugger (ROCgdb)
+
+ROCgdb for ROCm 6.1.0
+
+##### Fixes
+
+Previously, ROCDebugger encountered hangs and crashes when stepping over the `s_endpgm`
+instruction at the end of a HIP kernel entry function, which caused the stepped wave to exit. This issue
+is fixed in the ROCm 6.1 release. You can now step over the last instruction of any HIP kernel without
+debugger hangs or crashes.
+
+#### ROCm SMI
+
+ROCm SMI for ROCm 6.1.0
+
+##### Additions
+
+* **Added support to set max/min clock level for sclk ('RSMI_CLK_TYPE_SYS') or mclk ('RSMI_CLK_TYPE_MEM')**.
+  You can now set a maximum or minimum `sclk` or `mclk` value through the
+  `rsmi_dev_clk_extremum_set()` API, provided the ASIC supports it. Alternatively, you can use our
+  Python CLI tool (`rocm-smi --setextremum max sclk 1500`).
+
+* **Added `rsmi_dev_target_graphics_version_get()`**. You can now query the ROCm SMI API
+  (`rsmi_dev_target_graphics_version_get()`) to retrieve the target graphics version for a GPU device.
+  Currently, this output is not supplied through our ROCm SMI CLI.
+
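If you script the `rocm-smi --setextremum` call, building the argument list in one place keeps the bound and clock names validated. This is a minimal sketch: `setextremum_argv` is a hypothetical helper, and the accepted values are limited to those named in the notes above (`max`/`min`, `sclk`/`mclk`).

```python
from typing import List


def setextremum_argv(bound: str, clock: str, value: int) -> List[str]:
    """Build the argv for `rocm-smi --setextremum <bound> <clock> <value>`."""
    if bound not in ("max", "min"):
        raise ValueError(f"bound must be 'max' or 'min', got {bound!r}")
    if clock not in ("sclk", "mclk"):
        raise ValueError(f"clock must be 'sclk' or 'mclk', got {clock!r}")
    if value <= 0:
        raise ValueError("value must be a positive integer")
    return ["rocm-smi", "--setextremum", bound, clock, str(value)]
```

The returned list can be handed to `subprocess.run(..., check=True)` on a system where `rocm-smi` is installed and the ASIC supports the setting.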
+##### Changes
+
+* **Removed non-unified API headers: Individual GPU metric APIs are no longer supported**.
+  The individual metric APIs (`rsmi_dev_metrics_*`) were removed to make it easier to add support
+  for new GPU metrics. A single API, `rsmi_dev_gpu_metrics_info_get()`, now reports the device
+  metrics. Note that there is a risk of ABI breakage when using
+  `rsmi_dev_gpu_metrics_info_get()`. ABI breaks are necessary (in some
+  cases) in order to support newer ASICs and metrics for our customers. We will continue to support
+  `rsmi_dev_gpu_metrics_info_get()` with these considerations and limitations in mind.
+
+* **Deprecated 'rsmi_dev_power_ave_get()'; use the newer API, 'rsmi_dev_power_get()'**. As
+  outlined in the change for 6.0.0 (*Added a generic power API: rsmi_dev_power_get*),
+  `rsmi_dev_power_ave_get()` is now deprecated. You must update your ROCm SMI API calls accordingly.
+
+##### Fixes
+
+* Fixed `--showpids` reporting `[PID] [PROCESS NAME] 1 UNKNOWN UNKNOWN UNKNOWN`.
+  Output was failing because `cu_occupancy debugfs` method is not provided on some graphics cards
+  by design. `get_compute_process_info_by_pid` was updated to reflect this and returns with the output
+  needed by the CLI.
+
+* Fixed `rocm-smi --showpower` output, which was inconsistent on some RDNA3 devices.
+  We updated this to use `rsmi_dev_power_get()` within the CLI to provide a consistent device power
+  output. This was caused by using the now-deprecated `rsmi_dev_power_ave_get()` API.
+
+* Fixed `rocm-smi --setcomputepartition` and `rocm-smi --resetcomputepartition` to report when the
+  device is `EBUSY`
+
+* Fixed `rocm-smi --setmemorypartition` and `rocm-smi --resetmemorypartition` to return
+  `RSMI_STATUS_NOT_SUPPORTED` when SYSFS is read-only.
+  The `rsmi_dev_memory_partition_set` API is updated to handle the read-only SYSFS check.
+  Corresponding tests and CLI (`rocm-smi --setmemorypartition` and
+  `rocm-smi --resetmemorypartition`) calls were updated accordingly.
+
+* Fixed `rocm-smi --showclkvolt` and `rocm-smi --showvc`, which were displaying 0 for overdrive and
+  reporting that the voltage curve is not supported.
+
+#### ROCProfiler
+
+ROCProfiler for ROCm 6.1.0
+
+##### Fixes
+
+* Fixed ROCprofiler to match versioning changes in HIP Runtime
+* Fixed plugins race condition
+* Updated metrics to MI300
+
+#### ROCm Validation Suite
+
+##### Known issue
+
+* In a future release, the ROCm Validation Suite P2P Benchmark and Qualification Tool (PBQT) tests will be optimized to meet the target bandwidth requirements for MI300X.
+
+  [On GitHub](https://github.com/ROCm/ROCm/issues/3027)
+
+#### MI200 SR-IOV 
+
+##### Known issue
+
+* Multimedia applications may encounter compilation errors in the MI200 Single Root Input/Output Virtualization (SR-IOV) environment. This is because MI200 SR-IOV does not currently support multimedia applications. 
+
+  [On GitHub](https://github.com/ROCm/ROCm/issues/3028)
+
+### AMD MI300A RAS
+
+#### Fixed defect
+
+##### GFX correctable and uncorrectable error inject failures
+
+* Previously, the AMD CPU Reliability, Availability, and Serviceability (RAS) installation encountered correctable and uncorrectable failures while injecting an error.
+
+  This issue is resolved in the ROCm 6.1 release, and users will no longer encounter the GFX correctable error (CE) and uncorrectable error (UE) failures.
diff --git a/tools/autotag/util/__init__.py b/tools/autotag/util/__init__.py
@@ -1,2 +1,2 @@
 from .defaults import TEMPLATES, PROCESSORS
-from . import mivisionx
+from .custom_templates import hipfort, mivisionx, rpp, rvs
diff --git a/tools/autotag/util/custom_templates/__init__.py b/tools/autotag/util/custom_templates/__init__.py