Release v3.12.1 · JuliaGPU/CUDA.jl

CUDA v3.12.1

Diff since v3.12.0

Closed issues:

Accumulate doesn't work on >=4 dim Arrays with dims <= ndims(A) - 3 (#1039)
CUSPARSE does not support dense-sparse matrix multiplication (#1403)
Scalar indexing when comparing a CuArray to the identity matrix (#1557)
CUBLAS_STATUS_NOT_INITIALIZED (#1567)
LinearAlgebra./ and LinearAlgebra.\ breaks CuArray (#1568)
Window size in grid-stride loop (#1573)
Matrix multiplication works for primitive and non-primitive custom number types on the CPU, but it fails for primitive custom number types on the GPU. (#1574)
CuIterator doesn't specify IteratorSize but has no length() (#1583)
Garbage collection doesn't work as shown in the documentation (#1586)
Adding sparse adjoint results in kernel error (#1591)
sparse - sparse matrix multiplication partially missing (#1599)
FastMath sincos(), cis(), exp(im..) aren't as fast as C++ (#1606)
wrong type in wrapper of a cusolver function (#1621)
Adding CUDNN support for 3D convolutions/cross-correlations (#1631)
copyto! does not work between a CuArray and a view(Array) (#1634)
Minor issue with sparse function (#1641)
Scalar indexing when displaying Diagonal{Int64, CuSparseVector{Int64, Int32}} (#1645)
Many errors running test suite on GTX 960 4GB (#1650)
Driver discovery broken on platforms without compat driver (#1653)
Aliasing/Polluted Result from rfftplan for Float32 2^n 3D array (#1656)
Re-instate memory limit (#1670)
Split libnvToolsExt from CUDA_Runtime_jll? (#1672)
accumulate(op, a) causes scalar indexing (#1680)
CUSPARSE CI failures (#1692)
axpy! for nested base types (reshapedarray/adjoint/view) (#1696)
copyto! between a PermutedDimsArray view and a CuArray doesn't work (#1697)
WMMA test failure (#1700)
UndefVarError when a binary is not found (#1701)
Is CUSPARSELT supported? (#1702)
Best practices to reduce startup time (#1707)
1.9 compatibility (#1710)
WARNING: unused variadic paramters. (#1712)

Merged pull requests:

Remove/rework CuDeviceArray constructors (#1308) (@maleadt)
Add always_inline kernel parameter (#1554) (@lcw)
Update manifest (#1564) (@github-actions[bot])
Update manifest (#1569) (@github-actions[bot])
Update manifest (#1571) (@github-actions[bot])
Fix native RNG window calculation. (#1575) (@maleadt)
Use Base.active_project. (#1576) (@maleadt)
Fixes for and tests using JET. (#1577) (@maleadt)
Update manifest (#1578) (@github-actions[bot])
Docs, remove global variables in intro benchmark (#1580) (@SteffenPL)
Update manifest (#1581) (@github-actions[bot])
Update manifest (#1582) (@github-actions[bot])
Bugfixes when using \ operator with non square matrices (#1584) (@GVigne)
remove unbound type parameters (#1585) (@nsajko)
added --openacc-profiling off to the nvprof (#1587) (@mbeltagy)
Update manifest (#1588) (@github-actions[bot])
Wrap at-cuda's code in a let block. (#1589) (@maleadt)
Revert: Use JET during test suite. (#1590) (@maleadt)
[CUSPARSE] Update mv! and mm! functions for CuSparseMatrixCOO and CuSparseMatrixCSC (#1592) (@amontoison)
[CUSPARSE] Add sv! and sm! routines (#1593) (@amontoison)
CompatHelper: bump compat for "BFloat16s" to "0.3" (#1594) (@github-actions[bot])
Update wrap.jl (#1595) (@amontoison)
Provide more useful explanation why an eltype is unsupported. (#1596) (@maleadt)
CompatHelper: bump compat for "BFloat16s" to "0.4" (#1597) (@github-actions[bot])
Improve eltype error reporting. (#1598) (@maleadt)
Add () at the end of the library name in all ccall (#1600) (@amontoison)
Define length for CuIterator (#1602) (@mcabbott)
Added more sparse functions like: kron, tril, triu, reshape, adjoint, transpose, sparse-sparse multiplication (#1603) (@albertomercurio)
Fix rotate! and reflect! for the generic fallback in GPUArrays.jl (#1604) (@amontoison)
Update manifest (#1605) (@github-actions[bot])
Update manifest (#1609) (@github-actions[bot])
[CUSPARSE] Interface generic routines (#1611) (@amontoison)
[CUSPARSE] Update sparse-sparse GEMM (#1613) (@amontoison)
[CUSPARSE] Add sddmm! and gemvi! routines (#1615) (@amontoison)
Update manifest (#1616) (@github-actions[bot])
Don't use isbitsunion to support structs of union types. (#1617) (@maleadt)
Update CUDA driver compatibility package to 11.8. (#1618) (@maleadt)
Update CUDA artifacts to 11.7 Update 1. (#1619) (@maleadt)
Update to CUDA 11.8 (#1620) (@maleadt)
Update to CUDNN 8.6. (#1622) (@maleadt)
Move CUDNN and CUTENSOR into separate packages (#1624) (@maleadt)
Bump BFloat16s. (#1625) (@maleadt)
fix #1621 (#1626) (@jemiryguo)
Restore functionality of FastMath.sincos. (#1627) (@maleadt)
Update manifest (#1628) (@github-actions[bot])
Switch from manual artifact handling to automated JLLs (#1629) (@maleadt)
[CUSPARSE] Add CuMatrix * CuSparseMatrix products (#1632) (@amontoison)
Silence some test warnings. (#1635) (@maleadt)
Update CUTENSOR to v1.6 (#1636) (@maleadt)
[CUSPARSE] Add SparseMatrix * SparseVector products (#1637) (@amontoison)
Upgrade CUSTATEVEC to v1.1 (#1638) (@maleadt)
Upgrade CUTENSORNET to v1.1 (#1639) (@maleadt)
[CUSPARSE] Add CuSparseVector ± CuSparseVector (#1640) (@amontoison)
CompatHelper: add new compat entry for "Preferences" at version "1" (#1642) (@github-actions[bot])
Fix #1641 (#1643) (@amontoison)
Update manifest (#1646) (@github-actions[bot])
[CUSPARSE] Add dot(CuSparseVector,CuVector) and vice-versa (#1647) (@amontoison)
[CUSPARSE] Add ldiv! for CuSparseMatrixCOO and geam for CuSparseMatrixCSC (#1648) (@amontoison)
Update autogenerated headers (#1649) (@maleadt)
Remove deprecations (#1651) (@maleadt)
Don't warn about the old JULIA_CUDA_USE_BINARYBUILDER env var when using preferences (#1652) (@maleadt)
Update CUTENSORNET to use new slice group (#1654) (@kshyatt)
[CUSPARSE] Fix conversions between CuSparseMatrixCOO and CuSparseMatrixCSC (#1655) (@amontoison)
Include compiler options in error log. (#1657) (@maleadt)
Discover the system driver when CUDA_Driver_jll isn't available. (#1658) (@maleadt)
Preserve buffer type when adapting to CuArray. (#1659) (@maleadt)
Update manifest (#1661) (@github-actions[bot])
Extend conversion of QRPackedQ object to CuArray (#1662) (@GVigne)
[CUSPARSE] Add CuSparseMatrixCSC * CuSparseMatrixCSC (#1663) (@amontoison)
Update manifest (#1665) (@github-actions[bot])
[CUSPARSE] Add more tests (#1668) (@amontoison)
Update manifest (#1671) (@github-actions[bot])
Update manifest (#1676) (@github-actions[bot])
Fix eigen when using Hermitian or Symmetric matrices (#1677) (@GVigne)
Update manifest (#1679) (@github-actions[bot])
adding defaults for accumulate(op, a) with modified code from Base.accumulate (#1681) (@leios)
Add right division operator for Diagonal matrices (#1683) (@GVigne)
Update manifest (#1686) (@github-actions[bot])
Bump CUQUANTUM libraries (#1688) (@maleadt)
typo (#1689) (@ArnoStrouwen)
Retry CUSOLVER handle creation when encountering an internal error. (#1691) (@maleadt)
Fix #1692 (#1693) (@amontoison)
Update manifest (#1694) (@github-actions[bot])
[CUSPARSE] Support kron with Diagonal arguments (#1695) (@albertomercurio)
Re-introduce memory limits. (#1698) (@maleadt)
Adapt to GPUCompiler changes. (#1699) (@maleadt)
WMMA: Don't wrap fragments of size 1 in a struct. (#1704) (@maleadt)
Update manifest (#1708) (@github-actions[bot])
Use plain llvmcall calling convention for WMMA intrinsics. (#1709) (@maleadt)
Reclaim in cuDNN conv algorithm search (#1711) (@ToucheSir)
CUBLAS: test against generic axp(b)y, not the BLAS-specific one. (#1713) (@maleadt)
Fix LU getproperty invoke. (#1714) (@maleadt)
Backports for 3.12.1 (#1715) (@maleadt)
Specialize cholcopy to avoid scalar indexing. (#1716) (@maleadt)
Fix handling of inline-allocated structures with unions. (#1717) (@maleadt)
