Releases: JuliaGPU/CUDA.jl

v4.0.1

09 Feb 19:14

What's Changed

Full Changelog: v4.0.0...v4.0.1

v4.0.0

01 Feb 09:05
f85dd7b

CUDA v4.0.0

Diff since v3.13.1

Closed issues:

  • Missing implementation of right multiply for QR decomposition (#1738)
  • [CUSPARSE] Type error with mm! (#1743)

Merged pull requests:

v3.13.1

20 Jan 15:53
459b176

CUDA v3.13.1

Diff since v3.13.0

Closed issues:

  • CUDA.jl cuFFT underperforming against CuPy cuFFT (#1682)
  • Is block-spmm supported? (#1736)

Merged pull requests:

  • Introduce cuFFT plan cache; switch to auto-managed memory. (#1734) (@maleadt)
  • Stop pirating GPUArrays' RNG methods. (#1735) (@maleadt)

v3.12.2

20 Jan 14:40
6b12ece

CUDA v3.12.2

Diff since v3.12.1

Closed issues:

  • CUDA.jl cuFFT underperforming against CuPy cuFFT (#1682)
  • Error during CUDA test (#1718)
  • Kernel error from bad broadcast (should be regular error?) (#1720)
  • Freeze into StackOverflow when JULIA_DEBUG=CUDA set (#1721)
  • Use of linear operators in CUDA.jl (#1727)
  • Is block-spmm supported? (#1736)

Merged pull requests:

v3.13.0

19 Jan 17:00
1a52af1

CUDA v3.13.0

Diff since v3.12.1

Closed issues:

  • Error during CUDA test (#1718)
  • Kernel error from bad broadcast (should be regular error?) (#1720)
  • Freeze into StackOverflow when JULIA_DEBUG=CUDA set (#1721)
  • Use of linear operators in CUDA.jl (#1727)

Merged pull requests:

v3.12.1

06 Jan 07:34
b7bdc79

CUDA v3.12.1

Diff since v3.12.0

Closed issues:

  • Accumulate doesn't work on >=4 dim Arrays with dims <= ndims(A) - 3 (#1039)
  • CUSPARSE does not support dense-sparse matrix multiplication (#1403)
  • Scalar indexing when comparing a CuArray to the identity matrix (#1557)
  • CUBLAS_STATUS_NOT_INITIALIZED (#1567)
  • LinearAlgebra./ and LinearAlgebra.\ breaks CuArray (#1568)
  • Window size in grid-stride loop (#1573)
  • Matrix multiplication works for primitive and non-primitive custom number types on the CPU, but it fails for primitive custom number types on the GPU. (#1574)
  • CuIterator doesn't specify IteratorSize but has no length() (#1583)
  • Garbage collection doesn't work as shown in the documentation (#1586)
  • Adding sparse adjoint results in kernel error (#1591)
  • sparse - sparse matrix multiplication partially missing (#1599)
  • FastMath sincos(), cis(), exp(im..) aren't as fast as C++ (#1606)
  • wrong type in wrapper of a cusolver function (#1621)
  • Adding CUDNN support for 3D convolutions/cross-correlations (#1631)
  • copyto! does not work between a CuArray and a view(Array) (#1634)
  • Minor issue with sparse function (#1641)
  • Scalar indexing when displaying Diagonal{Int64, CuSparseVector{Int64, Int32}} (#1645)
  • Many errors running test suite on GTX 960 4GB (#1650)
  • Driver discovery broken on platforms without compat driver (#1653)
  • Aliasing/Polluted Result from rfftplan for Float32 2^n 3D array (#1656)
  • Re-instate memory limit (#1670)
  • Split libnvToolsExt from CUDA_Runtime_jll? (#1672)
  • accumulate(op, a) causes scalar indexing (#1680)
  • CUSPARSE CI failures (#1692)
  • axpy! for nested base types (reshapedarray/adjoint/view) (#1696)
  • copyto! between a PermutedDimsArray view and a CuArray doesn't work (#1697)
  • WMMA test failure (#1700)
  • UndefVarError when a binary is not found (#1701)
  • Is CUSPARSELT supported? (#1702)
  • Best practices to reduce startup time (#1707)
  • 1.9 compatibility (#1710)
  • WARNING: unused variadic parameters. (#1712)

Merged pull requests:

  • Remove/rework CuDeviceArray constructors (#1308) (@maleadt)
  • Add always_inline kernel parameter (#1554) (@lcw)
  • Update manifest (#1564) (@github-actions[bot])
  • Update manifest (#1569) (@github-actions[bot])
  • Update manifest (#1571) (@github-actions[bot])
  • Fix native RNG window calculation. (#1575) (@maleadt)
  • Use Base.active_project. (#1576) (@maleadt)
  • Fixes for and tests using JET. (#1577) (@maleadt)
  • Update manifest (#1578) (@github-actions[bot])
  • Docs, remove global variables in intro benchmark (#1580) (@SteffenPL)
  • Update manifest (#1581) (@github-actions[bot])
  • Update manifest (#1582) (@github-actions[bot])
  • Bugfixes when using \ operator with non square matrices (#1584) (@GVigne)
  • remove unbound type parameters (#1585) (@nsajko)
  • added --openacc-profiling off to the nvprof (#1587) (@mbeltagy)
  • Update manifest (#1588) (@github-actions[bot])
  • Wrap at-cuda's code in a let block. (#1589) (@maleadt)
  • Revert: Use JET during test suite. (#1590) (@maleadt)
  • [CUSPARSE] Update mv! and mm! functions for CuSparseMatrixCOO and CuSparseMatrixCSC (#1592) (@amontoison)
  • [CUSPARSE] Add sv! and sm! routines (#1593) (@amontoison)
  • CompatHelper: bump compat for "BFloat16s" to "0.3" (#1594) (@github-actions[bot])
  • Update wrap.jl (#1595) (@amontoison)
  • Provide more useful explanation why an eltype is unsupported. (#1596) (@maleadt)
  • CompatHelper: bump compat for "BFloat16s" to "0.4" (#1597) (@github-actions[bot])
  • Improve eltype error reporting. (#1598) (@maleadt)
  • Add () at the end of the library name in all ccall (#1600) (@amontoison)
  • Define length for CuIterator (#1602) (@mcabbott)
  • Added more sparse functions like: kron, tril, triu, reshape, adjoint, transpose, sparse-sparse multiplication (#1603) (@albertomercurio)
  • Fix rotate! and reflect! for the generic fallback in GPUArrays.jl (#1604) (@amontoison)
  • Update manifest (#1605) (@github-actions[bot])
  • Update manifest (#1609) (@github-actions[bot])
  • [CUSPARSE] Interface generic routines (#1611) (@amontoison)
  • [CUSPARSE] Update sparse-sparse GEMM (#1613) (@amontoison)
  • [CUSPARSE] Add sddmm! and gemvi! routines (#1615) (@amontoison)
  • Update manifest (#1616) (@github-actions[bot])
  • Don't use isbitsunion to support structs of union types. (#1617) (@maleadt)
  • Update CUDA driver compatibility package to 11.8. (#1618) (@maleadt)
  • Update CUDA artifacts to 11.7 Update 1. (#1619) (@maleadt)
  • Update to CUDA 11.8 (#1620) (@maleadt)
  • Update to CUDNN 8.6. (#1622) (@maleadt)
  • Move CUDNN and CUTENSOR into separate packages (#1624) (@maleadt)
  • Bump BFloat16s. (#1625) (@maleadt)
  • fix #1621 (#1626) (@jemiryguo)
  • Restore functionality of FastMath.sincos. (#1627) (@maleadt)
  • Update manifest (#1628) (@github-actions[bot])
  • Switch from manual artifact handling to automated JLLs (#1629) (@maleadt)
  • [CUSPARSE] Add CuMatrix * CuSparseMatrix products (#1632) (@amontoison)
  • Silence some test warnings. (#1635) (@maleadt)
  • Update CUTENSOR to v1.6 (#1636) (@maleadt)
  • [CUSPARSE] Add SparseMatrix * SparseVector products (#1637) (@amontoison)
  • Upgrade CUSTATEVEC to v1.1 (#1638) (@maleadt)
  • Upgrade CUTENSORNET to v1.1 (#1639) (@maleadt)
  • [CUSPARSE] Add CuSparseVector ± CuSparseVector (#1640) (@amontoison)
  • CompatHelper: add new compat entry for "Preferences" at version "1" (#1642) (@github-actions[bot])
  • Fix #1641 (#1643) (@amontoison)
  • Update manifest (#1646) (@github-actions[bot])
  • [CUSPARSE] Add dot(CuSparseVector,CuVector) and vice-versa (#1647) (@amontoison)
  • [CUSPARSE] Add ldiv! for CuSparseMatrixCOO and geam for CuSparseMatrixCSC (#1648) (@amontoison)
  • Update autogenerated headers (#1649) (@maleadt)
  • Remove deprecations (#1651) (@maleadt)
  • Don't warn about the old JULIA_CUDA_USE_BINARYBUILDER env var when using preferences (#1652) (@maleadt)
  • Update CUTENSORNET to use new slice group (#1654) (@kshyatt)
  • [CUSPARSE] Fix conversions between CuSparseMatrixCOO and CuSparseMatrixCSC (#1655) (@amontoison)
  • Include compiler options in error log. (#1657) (@maleadt)
  • Discover the system driver when CUDA_Driver_jll isn't available. (#1658) (@maleadt)
  • Preserve buffer type when adapting to CuArray. (#1659) (@maleadt)
  • Update manifest (#1661) (@github-actions[bot])
  • Extend conversion of QRPackedQ object to CuArray (#1662) (@GVigne)
  • [CUSPARSE] Add CuSparseMatrixCSC * CuSparseMatrixCSC (#1663) (@amontoison)
  • Update manifest (#1665) (@github-actions[bot])
  • [CUSPARSE] Add more tests (#1668) (@amontoison)
  • Update manifest (#1671) (@github-actions[bot])
  • Update manifest (#1676) (@github-actions[bot])
  • Fix eigen when using Hermitian or Symmetric matrices (#1677) (@GVigne)
  • Update manifest (#1679) (@github-actions[bot])
  • adding defaults for accumulate(op, a) with modified code from Base.accumulate (#1681) (@leios)
  • Add right division operator for Diagonal matrices (#1683) (@GVigne)
  • Update manifest (#1686) (@github-actions[bot])
  • Bump CUQUANTUM libraries (#1688) (@maleadt)
  • typo (#1689) (@ArnoStrouwen)
  • Retry CUSOLVER handle creation when encountering an internal error. (#1691) (@maleadt)
  • Fix #1692 (#1693) (@amontoison)
  • Update manifest (#1694) (@github-actions[bot])
  • [CUSPARSE] Support kron with Diagonal arguments (#1695) (@albertomercurio)
  • Re-introduce memory limits. (#1698) (@maleadt)
  • Adapt to GPUCompiler changes. (#1699) (@maleadt)
  • WMMA: Don't wrap fragments of size 1 in a struct. (#1704) (@maleadt)
  • Update manifest (#1708) (@github-actions[bot])
  • Use plain llvmcall calling convention for WMMA intrinsics. (#1709) (@maleadt)
  • Reclaim in cuDNN conv algorithm search (#1711) (@ToucheSir)
  • CUBLAS: test against generic axp(b)y, not the BLAS-specific one. (#1713) (@maleadt)
  • Fix LU getproperty invoke. (#1714) (@maleadt)
  • Backports for 3.12.1 (#1715) (@maleadt)
  • Specialize cholcopy to avoid scalar indexing. (#1716) (@maleadt)
  • Fix handling of inline-allocated structures with unions. (#1717) (@maleadt)

v3.12.0

16 Jul 21:40
3729010

CUDA v3.12.0

Diff since v3.11.0

Closed issues:

  • Implement Base.repeat (#177)
  • repeat performs scalar indexing for multi-dimensional arrays (#1051)
  • The GPU compiler fails on a call to maximum (#1548)
  • versioninfo triggers artifact downloads (#1549)
  • Error when broadcasting composed functions (#1550)
  • overload Base.copy! for AbstractGPUArray{<:Any,1} (#1555)

Merged pull requests:

  • Fix math quirk. (#1546) (@maleadt)
  • Wrap cusolverRf.h and cusolverSp_LOWLEVEL_PREVIEW.h (#1547) (@frapac)
  • Update manifest (#1551) (@github-actions[bot])
  • tighten unsafe_wrap signature on scalar length (#1552) (@sjkelly)
  • Update Documenter key. (#1553) (@maleadt)
  • Update manifest (#1556) (@github-actions[bot])
  • Import factorisation internal types from LinearAlgebra (#1558) (@theabhirath)
  • Update manifest (#1560) (@github-actions[bot])
  • add reshape for CuDeviceArray (#1561) (@omlins)

v3.11.0

15 Jun 10:29
15a0e1d

CUDA v3.11.0

Diff since v3.10.1

Closed issues:

  • CUSPARSE: Diagonal + CSC/CSR gives dense array (#1469)
  • CUBLAS: Multiplication of UpperTriangular/LowerTriangular not supported (#1486)
  • CUTENSOR tests consume lots of memory, breaking other tests (#1501)
  • CUFFT doesn't work for ComplexF64 C2C in-place (#1519)
  • Inconsistency of == and isequal for CuArray (#1524)
  • Setting CUDA seed the first time changes Random's RNG non-deterministically (#1526)
  • Undefined exported symbols (#1527)
  • Could not load library libLLVMExtra-14.dll (#1535)
  • Add an rrule for cholesky to CUDA.jl (#1541)

Merged pull requests:

  • specialize +/- op for sparse diag (#1514) (@Roger-luo)
  • Make sure instantiating RNGs doesn't affect the global CPU RNG. (#1530) (@maleadt)
  • Update manifest (#1531) (@github-actions[bot])
  • ldiv! for LU Decomposition (#1532) (@SBuercklin)
  • Lower dmax for contraction tests (#1534) (@kshyatt)
  • Fix convolution algorithm search (#1536) (@maxfreu)
  • Update manifest (#1537) (@github-actions[bot])
  • add specializations for some triangular-triangular multiplications (#1538) (@Red-Portal)
  • Add a utility to download artifacts without a functional driver. (#1539) (@maleadt)
  • Update manifest (#1543) (@github-actions[bot])
  • Explicit tests for type conversion (#1544) (@kshyatt)
  • Remove unused exports. (#1545) (@maleadt)

v3.10.1

27 May 20:19
49902d8

CUDA v3.10.1

Diff since v3.10.0

Closed issues:

  • Overflow in randn using CUDA.jl's native RNG (#1464)
  • Segmentation fault with pre-compiled library importing CUDA (#1465)
  • Julia freezes when using Polynomials with CuArray (#1497)
  • Launch overhead regression (#1503)
  • CUSOLVER: Matrix division requires identical types (#1512)
  • Incorrect distribution for complex standard normals when using CUDA.default_rng() (#1515)
  • loggamma (#1528)

Merged pull requests:

v3.10.0

16 May 18:58
044bd98

CUDA v3.10.0

Diff since v3.9.1

Closed issues:

  • Error while freeing DeviceBuffer-warning when using multiple GPUs (#1454)
  • CUDNN cache locking prevents finalizers resulting in OOMs (#1461)
  • EOFError from pool_cleanup when closing REPL (#1495)
  • TypeError in compiler with custom kernel (#1496)

Merged pull requests: