Full documentation for MIGraphX is available at https://rocmdocs.amd.com/projects/AMDMIGraphX/en/latest/.
- Added support for ONNX Runtime MIGraphX EP on Windows
- Added an FP8 Python API (see the sketch below)
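For context, a minimal sketch of what FP8 quantization might look like through the Python API, assuming a `quantize_fp8` entry point that mirrors the existing `quantize_fp16`/`quantize_int8` helpers; the model path is hypothetical and the exact signature should be verified against the Python API reference.

```python
# Minimal sketch, not a confirmed API: quantize_fp8 is assumed to follow
# quantize_int8's (program, target, calibration) pattern.
import migraphx

prog = migraphx.parse_onnx("model.onnx")   # hypothetical model path
t = migraphx.get_target("gpu")
migraphx.quantize_fp8(prog, t, [])         # assumed signature; empty calibration set
prog.compile(t, offload_copy=True)
print(prog)                                # prints the compiled IR
```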
- Added examples for SD 2.1 and SDXL
- Improved Dynamic Batch to support BERT
- Added a `--test` flag in migraphx-driver to validate the installation (see the sketch below)
- Added support for the ONNX operator: `Einsum`
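As a quick illustration, one way to script that installation check from Python; the flag name comes from this release note, but the driver path is an assumption, so adjust it for your install.

```python
# Minimal sketch: run the driver's --test flag and report the result.
# /opt/rocm/bin is an assumed install location.
import subprocess

result = subprocess.run(
    ["/opt/rocm/bin/migraphx-driver", "--test"],
    capture_output=True,
    text=True,
)
print(result.stdout)
print("installation OK" if result.returncode == 0 else "installation check failed")
```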
- Added uint8 support in ONNX Operators
- Added fusion for group convolutions
- Added rocMLIR conv3d support
- Added rocgdb to the Dockerfile
- Improved ONNX Model Zoo coverage
- Reorganized memcpys with ONNX Runtime to improve performance
- Replaced scalar multibroadcast + unsqueeze with just a multibroadcast
- Improved MLIR kernel selection for multibroadcasted GEMMs
- Improved details of the perf report
- Enabled MLIR by default for GEMMs with a small K dimension
- Allowed specifying dot or convolution fusion for MLIR with an environment variable
- Improved performance of small reductions by performing multiple reductions per wavefront
- Added algebraic simplifications for mul-add-dot sequences of operations involving constants
- Used MLIR attention kernels in more cases
- Enabled MIOpen and CK fusions for MI300 GFX architectures
- Added support for QDQ quantization patterns from Brevitas that have explicit cast/convert nodes before and after QDQ pairs
- Added fusion of "contiguous + pointwise" and "layout + pointwise" operations, which may improve performance in certain cases
- Added fusion of "pointwise + layout" and "pointwise + contiguous" operations, which may improve performance when using the NHWC layout
- Added fusion of "pointwise + concat" operations, which may improve performance in certain cases
- Fixed a bug in "concat + pointwise" fusion where the output shape's memory layout wasn't maintained
- Simplified the "slice + concat" pattern in the SDXL UNet
- Eliminated the zero point/shift in QuantizeLinear and DequantizeLinear ops when the zero-point values are all zeros
- Improved inference performance by fusing Reduce to Broadcast
- Added additional information when printing the perf report
- Improved scalar fusions when not all strides are 0
- Added support for multiple outputs in pointwise ops
- Improved reduction fusion with reshape operators
- Reused the quantized output when an operator is used again
- Fixed Super Resolution model verification failing with FP16
- Suppressed confusing messages when compiling the model
- Fixed the Mod operator failing to compile with int8 and int32 inputs
- Prevented spawning too many threads for constant propagation when parallel STL is not enabled
- Fixed a bug when running migraphx-driver with the `--run 1` option
- Fixed LayerNorm accuracy by performing its calculations in FP32
- Updated the Docker generator script to ROCm 6.1, pointing at Ubuntu 22.04 (Jammy)
- Fixed a floating-point exception for dim (-1) in the reshape operator
- Fixed issue with int8 accuracy and models which were failing due to requiring a fourth bias input
- Fixed missing inputs that were not previously handled for the quantized bias, the weights, and the data values of the input matrix
- Fixed the order of operations for int8 quantization, which was causing inaccuracies and slowdowns
- Removed the list initializer of `prefix_scan_sum`, which was causing the wrong constructor to be used at compile time
- Fixed the `MIGRAPHX_GPU_COMPILE_PARALLEL` flag so users can control the number of threads used for parallel compilation
- Changed the default location of libraries with release-specific ABI changes
- Reorganized documentation in GitHub
- Removed the `--model` flag from migraphx-driver
- Added a beta version of FP8 (functional, but not yet performant)
- Created a Dockerfile with MIGraphX, the ONNX Runtime EP, and Torch
- Added support for the `Hardmax`, `DynamicQuantizeLinear`, `Qlinearconcat`, `Unique`, `QLinearAveragePool`, `QLinearSigmoid`, `QLinearLeakyRelu`, `QLinearMul`, and `IsInf` operators
- Created website examples for Whisper, Llama-2, and Stable Diffusion 2.1
- Created examples of using the ONNX Runtime MIGraphX Execution Provider with the InceptionV3 and Resnet50 models
- Updated operators to support ONNX Opset 19
- Enabled `fuse_pointwise` and `fuse_reduce` in the driver
- Added support for offloading dot-(mul)-softmax-dot sequences to MLIR
- Added BLAS auto-tuning for GEMMs
- Added dynamic shape support for the multinomial operator
- Added fp16 to accuracy checker
- Added initial code for running on Windows OS
- Improved the output of migraphx-driver command
- Documentation now shows all environment variables
- Made the updates needed for general stride support
- Enabled Asymmetric Quantization
- Added previously unsupported reduction modes for ScatterND
- Rewrote softmax for better performance
- Generally improved how quantization is performed to support INT8
- Used `problem_cache` for GEMM tuning
- Improved performance by always using rocMLIR for quantized convolution
- Improved group convolutions by using rocMLIR
- Improved accuracy of fp16 models
- Added previously unsupported reduction modes for ScatterElements
- Added concat fusions
- Improved INT8 support to include UINT8
- Allowed reshape ops between `dq` and `quant_op`
- Improved DPP reductions on Navi
- Made the accuracy checker print the whole final buffer
- Added support for handling dynamic Slice and ConstantOfShape ONNX operators
- Added support for the `dilations` attribute in pooling ops
- Added `layout` attribute support for the LSTM operator
- Improved performance by removing contiguous for reshapes
- Handled all slice input variations
- Added parsing of the `scales` attribute in upsample for older opset versions
- Added support for uneven Split operations
- Improved unit testing to run in python virtual environments
- Fixed outstanding issues in autogenerated documentation
- Updated model zoo paths for examples
- Fixed `promote_literals_test` by adding an if condition
- Fixed exporting API symbols from the dynamic library
- Fixed a bug in the pad operator caused by dimension reduction
- Fixed using `ld` to embed files and enabled this by default when building shared libraries on Linux
- Fixed `get_version()`
- Fixed Round operator inaccuracy
- Fixed a wrong size check in slice when axes are not present
- Set the .so version correctly
- Cleaned up LSTM and RNN activation functions
- Placed `gemm_pointwise` at a higher priority than `layernorm_pointwise`
- Updated the README to mention the need to include `GPU_TARGETS` when building MIGraphX
- Removed unused device kernels from Gather and Pad operators
- Removed int8x4 format
- Support for MI300 GPUs
- Support for TorchMIGraphX via PyTorch
- Boosted overall performance by integrating rocMLIR
- INT8 support for ONNX Runtime
- Support for ONNX version 1.14.1
- Added new operators: `Qlinearadd`, `QlinearGlobalAveragePool`, `Qlinearconv`, `Shrink`, `CastLike`, and `RandomUniform`
- Added an error message for when `gpu_targets` is not set during MIGraphX compilation
- Added a parameter to set tolerances with `migraphx-driver verify`
- Added support for MXR files larger than 4 GB
- Added the `MIGRAPHX_TRACE_MLIR` flag
- Added a beta capability for using ROCm Composable Kernels via the `MIGRAPHX_ENABLE_CK=1` environment variable
- Improved performance support for INT8
- Improved time precision while benchmarking candidate kernels from CK or MLIR
- Removed contiguous from reshape parsing
- Updated the `ConstantOfShape` operator to support Dynamic Batch
- Simplified dynamic shape-related operators to their static versions, where possible
- Improved debugging tools for accuracy issues
- Included a printed warning about `miopen_fusion` while generating `mxr` files
- General reduction in system memory usage during model compilation
- Created additional fusion opportunities during model compilation
- Improved debugging for matchers
- Improved general debug messages
- Fixed scatter operator for nonstandard shapes with some models from ONNX Model Zoo
- Provided a compile option to improve the accuracy of some models by disabling Fast-Math
- Improved layernorm + pointwise fusion matching to ignore argument order
- Fixed an accuracy issue with the `ROIAlign` operator
- Fixed the computation logic for the `Trilu` operator
- Fixed support for the DETR model
- Changed MIGraphX version to 2.8
- Extracted the test packages into a separate deb file when building MIGraphX from source
- Removed building Python 2.7 bindings
- hipRTC no longer requires dev packages for the MIGraphX runtime, and the ROCm install can now be in a different directory than it was at build time
- Added support for multi-target execution
- Added Dynamic Batch support to the C++/Python APIs (see the sketch below)
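As a rough illustration of the feature, a sketch of compiling a model with a dynamic batch dimension through the Python API; the `map_dyn_input_dims` keyword and the `[min, max]` range form shown here are assumptions, so consult the dynamic-shape documentation for the exact `parse_onnx` options.

```python
# Minimal sketch (assumed options): parse an ONNX model whose batch dimension
# may range from 1 to 4, then compile once for all batch sizes in that range.
import migraphx

prog = migraphx.parse_onnx(
    "model.onnx",                  # hypothetical model path
    map_dyn_input_dims={           # assumed keyword argument
        "input": [[1, 4], [3, 3], [224, 224], [224, 224]],  # assumed [min, max] per dim
    },
)
prog.compile(migraphx.get_target("gpu"))
```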
- Added `migraphx.create_argument` to the Python API (see the sketch below)
- Added a Dockerfile example for Ubuntu 22.04
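A minimal sketch of how `migraphx.create_argument` might be used, assuming a `(shape, values)` form; the shape keywords follow the existing `migraphx.shape` constructor, but the exact `create_argument` signature should be checked against the Python API reference.

```python
# Minimal sketch (assumed signature): build an argument from a shape and a
# flat list of values.
import migraphx

s = migraphx.shape(type="float_type", lens=[1, 4])
arg = migraphx.create_argument(s, [0.0, 1.0, 2.0, 3.0])  # assumed (shape, values) form
print(arg.tolist())  # assumes arguments expose tolist() for inspection
```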
- Added TensorFlow supported ops in the driver, similar to the existing ONNX operator list
- Added a `MIGRAPHX_TRACE_MATCHES_FOR` environment variable to filter the matcher trace
- Improved debugging by printing max, min, mean, and stddev values for `TRACE_EVAL = 2`
- You can now use the `fast_math` flag instead of `ENV` for GELU
- The driver now prints a message if offload copy is set for a compiled program
- Optimized for ONNX Runtime 1.14.0
- Improved compile times by only building for the GPU on the system
- Improved performance of pointwise/reduction kernels when using NHWC layouts
- Loaded the specific version of the `migraphx_py` library
- Enabled reshape on nonstandard shapes
- Used half HIP APIs to compute max and min
- Added support for broadcasted scalars to unsqueeze operator
- Improved multiplies with dot operator
- Handled broadcasts across dot and concat
- Added verify namespace for better symbol resolution
- Resolved accuracy issues with FP16 resnet50
- Updated cpp generator to handle inf from float
- Fixed assertion error during verify and made DCE work with tuples
- Fixed convert operation for NaNs
- Fixed shape typo in API test
- Fixed compile warnings for shadowing variable names
- Added a missing specialization for the `nullptr` hash function
- Bumped version of half library to 5.6.0
- Bumped CI to support ROCm 5.6
- Made building tests optional
- Replaced `np.bool` with `bool`, as requested by NumPy
- Removed int8x4 rocBLAS calls due to deprecation
- Removed `std::reduce` usage because not all operating systems support it
- The Y-Model feature now stores tuning information with the optimized model
- Added Python 3.10 bindings
- Added an accuracy checker tool based on ONNX Runtime
- Added the ONNX operators `parse_split` and `Trilu`
- Added build support for ROCm MLIR
- Added a migraphx-driver flag (`--python`) to print optimizations in Python
- Added a JIT implementation of the Gather and Pad operators, which results in better handling of larger tensor sizes
- Improved performance of Transformer-based models
- Improved performance of the `Pad`, `Concat`, `Gather`, and `Pointwise` operators
- Improved ONNX/pb file loading speed
- Added a general optimize pass that runs several passes, such as `simplify_reshapes`, algebraic simplification, and DCE, in a loop
- Improved parsing for TensorFlow Protobuf files
- Resolved various accuracy issues with some ONNX models
- Resolved a gcc-12 issue with MIVisionX
- Improved support for larger sized models and batches
- Used `--offload-arch` instead of `--cuda-gpu-arch` for the HIP compiler
- Changed the JIT to use a float accumulator for large reduce ops of half type to avoid overflow
- Changed the JIT to temporarily use cosine to compute the sine function
- Changed the version and location of third-party build dependencies to pick up fixes