Releases: JuliaGPU/CUDA.jl

v5.5.2

26 Sep 05:51
a1db081

CUDA v5.5.2

Diff since v5.5.1

Merged pull requests:

v5.5.1

23 Sep 10:24
3b05baf

What's Changed

Full Changelog: v5.5.0...v5.5.1

v5.5.0

18 Sep 14:28
1fe8838

CUDA v5.5.0

Blog post

Diff since v5.4.3

Merged pull requests:

Closed issues:

  • LinearAlgebra.norm(x) falls back to generic implementation for x::Transpose and x::Adjoint (#1782)
  • dlclose'ing the compatibility driver can fail (#1848)
  • Creating a sparse diagonal matrix of CuArray(u) (#1857)
  • Support for Julia 1.11 (#2241)
  • CUDA 12.4 Update 1: CUPTI does not trace kernels anymore (#2328)
  • Adding CUDA to a PackageCompiler sysimage causes segfault (#2428)
  • Error using CUDA on Julia 1.10: Number of threads per block exceeds kernel limit (#2438)
  • Error when I load my model (#2439)
  • Driver JLL improvements (#2446)
  • Deadlock when calling CUDA.jl in an adopted thread while blocking the main thread (#2449)
  • CUDA.Mem.unregister fails with CUDA.jl 5.4 (not with 5.3) (#2452)
  • Segmentation Fault on Loading CUDA (#2453)
  • Invalid instruction error when using CUDA (#2454)
  • Missing adapt for sparse and CUDABackend (#2459)
  • CUDA precompile cannot find/load "cupti64_2024.2.1.dll" during precompilation (juliaup 1.10.4, Windows 11) (#2466)
  • Request: Option to disable the "full GC when under very high memory pressure". (#2467)
  • copyto! ambiguous (#2477)
  • NeuralODE training failed on GPU with Enzyme (#2478)
  • issue with atomic - when running standard test, @atomic modify expression missing field access (#2483)
  • Support for creating a CuSparseMatrixCSC from a CuSparseVector (#2484)
  • Issue with compiling CUDA and cuTENSOR using local libraries (#2486)
  • Memory Access error in sparse array constructor (#2494)
  • Forwards-compatible driver breaks CURAND (#2496)
  • CUDA 12.6 Update 1 (#2497)

v5.4.3

09 Jul 08:09
71311af

CUDA v5.4.3

Diff since v5.4.2

Merged pull requests:

Closed issues:

  • Legacy cuIpc* APIs incompatible with stream-ordered allocator (#1053)
  • Broadcasted multiplication with a rational doesn't work (#1926)
  • Incorrect grid size in kron (#2410)
  • GEMM of non-contiguous inputs should dispatch to fallback implementation (#2412)
  • Failure of Eigenvalue Decomposition for Large Matrices. (#2413)
  • CUDA_Driver_jll's lazy artifacts cause a precompilation-time warning (#2415)
  • Recurrence of integer overflow bug (#1880) for a large matrix (#2427)
  • CUDA kernel crash very occasionally when MPI.jl is just loaded. (#2429)
  • CUDA_Runtime_Discovery Did not find cupti on Arm system with nvhpc (#2433)
  • CUDA.jl won't install/run on Jetson Orin NX (#2435)

v5.4.2

29 May 07:35
7e6a57a

CUDA v5.4.2

Diff since v5.4.1

Merged pull requests:

v5.4.1

28 May 18:53
5bbd9a7

CUDA v5.4.1

Diff since v5.4.0

Merged pull requests:

v5.4.0

28 May 06:45

CUDA v5.4.0

Blog post

Diff since v5.3.5

Merged pull requests:

Closed issues:

  • CUTENSOR breaks after device_reset! (#2319)
  • cuBLASXt's xt_gemm! incompatible with stream-ordered allocated memory (#2320)
  • Add helper function to recompile CUDA stack (#2364)

v5.3.5

24 May 13:29
7232f85

CUDA v5.3.5

Diff since v5.3.4

Merged pull requests:

  • Avoid constructing MulAddMuls on Julia v1.12+ (#2277) (@dkarrasch)
  • CompatHelper: bump compat for LLVM to 7, (keep existing compat) (#2365) (@github-actions[bot])
  • Enzyme: allocation functions (#2386) (@wsmoses)
  • Tweaks to prevent context construction on some operations (#2387) (@maleadt)
  • Fixes for Julia 1.12 / LLVM 17 (#2390) (@maleadt)
  • CUBLAS: Make sure CUBLASLt wrappers use the correct library. (#2391) (@maleadt)
  • Backport: Enzyme allocation fns (#2393) (@wsmoses)

Closed issues:

  • Indexing a view uses scalar indexing (#1472)
  • EnzymeCore is an unconditional dependency. (#2380)
  • cuBLASLt wrappers ccall into cuBLAS (#2388)
  • generic_trimatmul! error (#2389)

v5.3.4

15 May 19:28
c373258

CUDA v5.3.4

Diff since v5.3.3

Merged pull requests:

Closed issues:

  • Native Softmax (#175)
  • CUSOLVER: support eigendecomposition (#173)
  • backslash with gpu matrices crashes julia (#161)
  • at-benchmark captures GPU arrays (#156)
  • Support kernels returning Union{} (#62)
  • mul! falls back to generic implementation (#148)
  • \ on qr factorization objects gives a method error (#138)
  • Compiler failure if dependent module only contains a japi1 function (#49)
  • copy!(dst, src) and copyto!(dst, src) are significantly slower and allocate more memory than copyto!(dest, do, src, so[, N]) (#126)
  • Calling Flux.gpu on a view dumps core (#125)
  • Creating CuArray{Tracker.TrackedReal{Float64},1} a few times causes segfaults (#121)
  • Guard against exceeding maximum kernel parameter size (#32)
  • Detect common API misuse in error handlers (#31)
  • rand and friends default to Float64 (#108)
  • \ does not work for least squares (#104)
  • ERROR_ILLEGAL_ADDRESS when broadcasting modular arithmetic (#94)
  • CuIterator assumes batches to consist of multiple arrays (#86)
  • Algebra with UniformScaling Uses Generic Fallback Scalar Indexing (#85)
  • Document (un)supported language features for kernel programming (#13)
  • Missing dispatch for indexing of reshaped arrays (#556)
  • Track array ownership to avoid illegal memory accesses (#763)
  • NVPTX i128 support broken on LLVM 11 / Julia 1.6 (#793)
  • Support for sm_80 cp.async: asynchronous on-device copies (#850)
  • Profiling Julia with Nsight Systems on Windows results in blank window (#862)
  • sort! and partialsort! are considerably slower than CPU versions (#937)
  • mul! does not dispatch on Adjoint (#1363)
  • Cross-device copy of wrapped arrays fails (#1377)
  • Memory allocation becomes very slow when reserved bytes is large (#1540)
  • Cannot reclaim GPU Memory; CUDA.reclaim() (#1562)
  • Add eigen for general purpose computation of eigenvectors/eigenvalues (#1572)
  • device_reset! does not seem to work anymore (#1579)
  • device-side rand() are not random between successive kernel launches (#1633)
  • Add EnzymeRules support for CUDA.jl (for forward mode here) (#1811)
  • cusparseSetStream_v2 not defined (#1820)
  • Feature request: Integrating the latest CUDA library "cuLitho" into CUDA.jl (#1821)
  • KernelAbstractions.jl-related issues (#1838)
  • lock failing in multithreaded plan_fft() (#1921)
  • CUSolver finalizer tries to take ReentrantLock (#1923)
  • Testsuite could be more careful about parallel testing (#2192)
  • Opportunistic GC collection (#2303)
  • Unable to use local CUDA runtime toolkit (#2367)
  • Enzyme prevents testing on 1.11 (#2376)

v5.3.3

27 Apr 10:11

CUDA v5.3.3

Diff since v5.3.2

Merged pull requests:

Closed issues:

  • Excessive allocations when running on multiple threads (#1429)
  • Fix and test multigpu support (#2218)
  • Bitonic sort exceeds launch resources (#2331)