Releases: openucx/ucx
v1.15.0
1.15.0 (September 28, 2023)
Features:
UCP
- Added 2-stage pipeline protocol in the new protocol infrastructure
- Added reset and abort functionality of rendezvous protocols in the new infrastructure
- Added zero-copy rendezvous data send protocol in the new infrastructure
- Added support for user memory handle in the new protocol infrastructure
- Added option to force ODP registration for certain memory types
- Enabled lock free memory region deregistration
- Updated allow/deny transport list feature to control auxiliary transport selection
- Multiple performance improvements of the new protocol infrastructure
- Multiple improvements in error and debug messages
UCT
- Split UCT_MD_MKEY_PACK_FLAG_INVALIDATE into two flags for RMA and AMO
- Added put_zcopy and get_zcopy scheme support for self transport
- Added base implementation of is_reachable_v2 API using intra/inter flag
- Introduced MD capability for non-blocking registration memory types
RDMA CORE (IB, ROCE, etc.)
- Added implementation of is_reachable_v2 routine to IB interface
- Added option to control CQE zipping per CQ RX/TX direction
- Added option to specify how DCI selects port under RoCE LAG
- Added hw_dcs to the list of policies to select DCI by an endpoint
- Removed implicit on-demand paging
- Added option to set RoCE LAG DCT port for response under queue affinity mode
- Improved IB memlock limit logging
UCS
- Added ucs_string_buffer_rbrk() to split token
GPU (CUDA, ROCM)
- Added support for atomic reply_buffer on GPU memory
- Added system device information for AMD GPUs
- Improved performance estimation of gdr_copy transport
- Added a simplistic implementation of performance estimation of cuda_ipc transport
- Improved performance estimation of cuda_ipc on Hopper architecture
- Added rcache parameters for rocm transports
- Introduced dmabuf support for rocm transports
- Implemented asynchronous progress for the zcopy operations in the rocm_copy transport
- Added option to enable using cross-device dmabuf file descriptor for rocm
Java
- Added Java bindings for exported memh feature
Tests
- Added a ROCm Docker container for testing
- Added option to send client_id in iodemo test
- Added support for multiple connections to the same server in iodemo test
- Added synchronization before exit to hello world examples
Tools
- Added user-side memcpy option for AM benchmarks in ucx_perftest
- Added Wireshark Lua dissectors for some UCX protocols
Build
- Added support for binutils 2.40
- Added versioned dependency to switch between packages with the same names
- Added a separate xpmem deb subpackage
- Added aarch64 support to the binary distribution pipeline
- Removed dependency on libnuma
Bugfixes:
UCP
- Fixed assertion when sending from non-contiguous GPU buffer to managed buffer
- Fixed the race condition on endpoint configurations
- Fixed endpoint reconfiguration issues due to asymmetrical selection
- Fixed endpoint reconfiguration error due to wrong locality detection
- Fixed crash during connection manager cleanup
- Fixed rkey index calculation for rendezvous protocol
- Fixed rcache dump function
- Removed logging from rkey unpack in release mode
- Fixed double free of rkey in rendezvous protocol
- Fixed rendezvous pipeline protocol error flow
- Fixed error handling in rendezvous get zcopy protocol
- Replay pending requests of wireup EP CM during connection establishment to prevent potential ordering issues and wrong configuration
- Pass user-provided memory type to the function that checks whether the buffer can be sent inline or not
- Avoid memory registration during UCP context initialization
- Fixed CPU/device atomics selection in the new protocol infrastructure
- Multiple fixes in the new protocol infrastructure information output
UCT
- Added check for dmabuf kernel support in ROCm memory domain
- Fixed exported memh packing
- Fixed an error in checking return status of multi-threaded memory registration function
RDMA CORE (IB, ROCE, etc.)
- Fixed dma-buf based memory region registration
- Fixed memory handle data corruption when PCIe relaxed ordering is enabled
- Fixed performance degradation when indirect atomic key is not supported by the hardware
- Fixed remote access error to strict-order keys because of wrong offset
- Added check for UAR support to memory domain opening
- Fixed updating port counters for devx qp
- Fixed ibv_create_cq error message on node without InfiniBand
- Fixed performance degradation due to using 2 paths on NDR400 by default
- Removed unnecessary async lock which otherwise would block UD progress
GPU (CUDA, ROCM)
- Fixed CUDA IPC performance degradation due to libnuma removal
UCS
- Fixed lane selection and added bandwidth estimation for Sapphire Rapids family
- Fixed displaying wrong environment variable suggestions
- Fixed VFS warning output
- Fixed SEGV in ucs_debug_backtrace_next() that occurred while handling a previous SEGV in an ENOMEM situation
- Fixed memory corruption when using UCX_MPOOL_FIFO=y
UCM
- Fixed conditional jump patching
- Fixed mremap() override
GPU (CUDA, ROCM)
- Fixed usage of dmabuf when the buffer is not page-aligned
- Removed async_cb from cuda_copy to avoid the issue with UCP worker async lock
Java
- Fixed leakage of jucx_request global references
Documentation
- Updated ucp_worker_release_address description
Tests
- Fixed wrong usage of ep_close in examples
Tools
- Fixed memory access flags in perftest
- Removed support for librte from perf
- Fixed worker flush deadlock when using multiple workers in ucx_perftest
Build
- Changed 'unsupported option' ICC command line warning to error
- Removed never used fault-injection configuration option
- Fixed obsolete macro warnings in new autoconf/libtool
- Fixed building UCX with GCC 13
- Fixed UCX RPM build on machines that have libxpmem-devel rpm from MLNX_OFED installation
- Fixed ucx-rdmacm package requirements
- Fixed compilation errors with armcc-22.1
- Fixed passing port number to goperftest
v1.15.0 RC6
1.15.0 RC6 (September 20, 2023)
Bugfixes:
UCP
- Fixed assertion when sending from non-contiguous GPU buffer to managed buffer
v1.15.0 RC5
1.15.0 RC5 (September 12, 2023)
Bugfixes:
UCP
- Fixed the data race on endpoint configurations
v1.15.0 RC4
1.15.0 RC4 (August 30, 2023)
Bugfixes:
RDMA CORE (IB, ROCE, etc.)
- Fixed dma-buf based memory region registration
- Fixed memory handle data corruption when PCIe relaxed ordering is enabled
UCS
- Fixed lane selection, adding bandwidth estimation for Sapphire Rapids family
v1.15.0 RC3
1.15.0 RC3 (August 8, 2023)
Bugfixes:
UCP
- Fixed endpoint reconfiguration issues because of asymmetrical selection
UCT
- Check dmabuf kernel support in ROCm memory domain
UCM
- Fixed conditional jump patching
Tools
- Fixed memory access flags in perftest
v1.15.0 RC2
1.15.0 RC2 (July 27, 2023)
Features:
RDMA CORE (IB, ROCE, etc.)
- Implemented is_reachable_v2 for IB interfaces
Build
- Enabled build with binutils 2.40
- Added versioned dependency to switch between packages with the same names
Bugfixes:
UCP
- Fixed endpoint reconfiguration error due to wrong locality detection
RDMA CORE (IB, ROCE, etc.)
- Fixed performance degradation when indirect atomic key is not supported by the hardware
- Fixed remote access error to strict-order key because of wrong offset
GPU (CUDA, ROCM)
- Fixed CUDA IPC performance degradation after libnuma removal
v1.14.1
1.14.1 (May 22, 2023)
Bugfixes:
- Fixed ROCm to prevent the locking of host pinned memory
- Added CUDA 12 based UCX builds to the release flow
- Increased the maximal number of endpoint configurations
- Fixed filter for slow lanes in selection logic
- Fixed TCP transport bandwidth calculation
- Fixed device detection for ROCM
- Fixed compatibility with CUDA 12
- Fixed rendezvous threshold for multi-path configurations
- Fixed error message in case of static link
- Fixed BlueField-3 detection
- Multiple fixes for Azure CI pipeline
v1.14.1-rc3
1.14.1 RC3 (May 19, 2023)
Bugfixes:
- Fixed ROCm to prevent the locking of host pinned memory
v1.14.1-rc2
1.14.1 RC2 (May 18, 2023)
Bugfixes:
- Added CUDA 12 based UCX builds to the release flow
v1.14.1-rc1
1.14.1 RC1 (May 17, 2023)
Bugfixes:
- Increased the maximal number of endpoint configurations
- Fixed filter for slow lanes in selection logic
- Fixed TCP transport bandwidth calculation
- Fixed device detection for ROCM
- Fixed compatibility with CUDA 12
- Fixed rendezvous threshold for multi-path configurations
- Fixed error message in case of static link
- Fixed BlueField-3 detection
- Multiple fixes for Azure CI pipeline