UCC Version 1.1.0

manjugv released this on 07 Oct 14:02 (commit cd3fce9)

Features

API

  • Added float128 data type and complex float32, float64, and float128 data
    types
  • Added Active Set-based collectives to support dynamic groups and
    point-to-point messaging
  • Added ucc_team_get_attr interface

Core

  • Added config file support
  • Fixed component search

CL

  • Added split rail allreduce collective implementation
  • Enabled hierarchical alltoallv and barrier
  • Fixed cleanup bugs

TL

  • Added SELF TL to support teams of size one

UCP

  • Added service broadcast
  • Added reduce_scatterv ring algorithm
  • Added k-nomial-based gather collective implementation
  • Added one-sided get-based algorithms

SHARP

  • Fixed SHARP OOB
  • Added SHARP broadcast

GPU Collectives (CUDA TL, NCCL TL, and RCCL TL)

  • Added RCCL TL to support RCCL collectives
  • Added support for CUDA TL (intranode collectives for NVIDIA GPUs)
  • Added multiring allgatherv, alltoall, reduce-scatter, and reduce-scatterv
    algorithms in CUDA TL
  • Added topology-based ring construction in CUDA TL to maximize bandwidth
  • Added NCCL gather and scatter, along with their vector variants
  • Enabled the use of multiple streams for collectives
  • Added support for RCCL gather(v), scatter(v), broadcast, allgather(v),
    barrier, alltoall(v), and allreduce collectives
  • Added ROCm memory component
  • Adapted all GPU collectives to executor design

Tests

  • Added tests for triggered collectives in perftests
  • Fixed bugs in multi-threading tests

Utils

  • Added CPU model and vendor detection
  • Fixed several bugs across all components