Releases · JuliaGPU/CUDA.jl
v3.6.4
CUDA v3.6.4
Closed issues:
- Artifacts.toml has bad git-tree-sha1 values (#1309)
Merged pull requests:
v3.6.3
CUDA v3.6.3
Closed issues:
- CUDA.@atomic deadlocks when overwriting NaN (#1299)
- Unreasonably slow copy kernel (#1301)
- Passing a LogicalIndex(::CuArray) fails (#1304)
Merged pull requests:
- Allow sorting of tuples of numbers (#1196) (@mcabbott)
- Use === for generic atomic updates with compare-and-swap (#1300) (@guyvdbroeck)
- Update manifest (#1302) (@github-actions[bot])
- Store the array length next to its dimensions. (#1303) (@maleadt)
- Disallow calling CUDA device array intrinsics on the host. (#1305) (@maleadt)
- Support logical indexing with CPU sources. (#1306) (@maleadt)
- Activate a context when calling device!. (#1307) (@maleadt)
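The === change in #1300 matters because generic CUDA.@atomic updates are lowered to a compare-and-swap loop, and comparing the old value with == never succeeds once the slot holds NaN, which is the deadlock reported in #1299. A minimal sketch of the kind of update involved (the kernel and names are illustrative, not code from the release):

```julia
using CUDA

# Illustrative only: an atomic floating-point maximum. Float32 max is
# typically emulated with a compare-and-swap loop, the code path where
# NaN comparisons previously spun forever (#1299/#1300).
function atomic_max_kernel!(out, xs)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(xs)
        CUDA.@atomic out[1] = max(out[1], xs[i])
    end
    return
end

xs  = CUDA.rand(Float32, 1024)
out = CUDA.fill(-Inf32, 1)
@cuda threads=256 blocks=4 atomic_max_kernel!(out, xs)
```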
v3.6.2
v3.6.1
v3.6.0
CUDA v3.6.0
Closed issues:
- Conversion issue (#157)
- Extend new RNG to Complex numbers & normal distributions (#726)
- Fatal errors during sorting tests (#916)
- deepcopy failing (#1202)
- Kernel compilation fails when specifying shared memory array size as a tuple consisting of block dimension and kernel argument (#1205)
- ERROR: LoadError: The artifact at C:\Users\name\.julia\artifacts\58bd87695e9ccdb508cb38be1ab717315ecc9152 is empty. (#1209)
- InvalidIRError when displaying a model which is on the GPU (#1212)
- CUDA.jl tries to load CUDA compat loaded via jll even though system package is installed (#1216)
- Synchronizing over blocks (#1220)
- assignment changes random seed (#1226)
- accumulate gives wrong answer when init != 0 (#1227)
- Generic dot kernel: use multiple kernels instead of atomics (#1244)
- integer division error creating CuVector of missing and nothing (#1251)
- unsupported dynamic function invocation with union type of more than 2 elements (#1252)
- three CUDA.@atomic in a row result in out-of-bounds error (#1254)
- Float16 CAS cannot use atom.cas.b16.global on sm_61 (#1258)
- cu(::SVector) gives SVector, cu(::MVector) gives CuArray (#1262)
- Get back unsafe_copyto! methods for unified<->unified and unified<->device (#1263)
- Passing and using a FFT plan in a CUDA kernel seems impossible (#1266)
- Inplace Complex FFT and Threads (#1268)
- sort returns nothing (#1270)
- Release a new version (#1276)
- __init_driver__ not called in 3.5 (#1280)
- Shared memory does not support isbits unions. (#1281)
- NVIDIA Nsight Systems and CUDA.@profile error (#1282)
- nvprof with using CUDA crashes julia (#1283)
Merged pull requests:
- Addition over CuSparseMatrix (#1195) (@yuehhua)
- [CUSOLVER] Add ordering functions (#1198) (@amontoison)
- Correctly handle multi-GPU instances with NVML. (#1199) (@maleadt)
- CI improvements. (#1200) (@maleadt)
- fix FFT workarea typo leading to memory corruption (#1204) (@marius311)
- Update manifest (#1206) (@github-actions[bot])
- Minor improvements for library wrappers (#1207) (@maleadt)
- Various small improvements (#1210) (@maleadt)
- Extend CuDeviceArray ctors for mixed-int indices. (#1211) (@maleadt)
- Deprecate non-blocking sync, and always call the synchronization API. (#1213) (@maleadt)
- Generic CUSPARSE: use the index arguments. (#1214) (@maleadt)
- Add bitonic sort implementation (#1217) (@xaellison)
- Update manifest (#1218) (@github-actions[bot])
- Reverted deepcopy, added test (#1221) (@birkmichael)
- Use broadcast instead of copies to initialize mapreduce buffers. (#1223) (@maleadt)
- Remove some unneeded Base module prefixes. (#1224) (@maleadt)
- Update manifest (#1225) (@github-actions[bot])
- Cherry-picked improvements (#1228) (@maleadt)
- Update introduction.jl (#1232) (@aramirezreyes)
- Update manifest (#1233) (@github-actions[bot])
- Fix SpMV for CUDA 11.5 (#1234) (@amontoison)
- Add support for randn and randexp. (#1236) (@maleadt)
- Avoid double-initializing partial accumulate results. (#1237) (@maleadt)
- Fix cuTENSOR contractions not working for FP16 inputs (#1238) (@thomasfaingnaert)
- Bump CUTENSOR and fix on CUDA 11.5 (#1239) (@maleadt)
- Support dot product on GPU between CuArrays with inconsistent eltypes (#1240) (@findmyway)
- Update manifest (#1241) (@github-actions[bot])
- Optimize CUTENSOR contraction. (#1243) (@maleadt)
- Don't use nondeterministic atomics in dot when requested. (#1245) (@maleadt)
- Remove CUBLAS decomposition tests without pivoting. (#1246) (@maleadt)
- Update manifest (#1247) (@github-actions[bot])
- wrap CUBLAS spmv and spr (#1248) (@bjarthur)
- CompatHelper: bump compat for "SpecialFunctions" to "2" (#1249) (@github-actions[bot])
- Update manifest (#1250) (@github-actions[bot])
- Store array offset as elements to fix all-singleton case. (#1255) (@maleadt)
- Update CUDA to 11.5 Update 1. (#1256) (@maleadt)
- Use Base functionality for iteration Union type components. (#1257) (@maleadt)
- Bump CI to Julia 1.7. (#1260) (@maleadt)
- Update manifest (#1261) (@github-actions[bot])
- Use CUDA APIs for unoptimized copies. (#1265) (@maleadt)
- Bump CUDNN to 8.3.1, enable CUDA 11.5 by default. (#1267) (@maleadt)
- Adding stream update for inplace complex FFT (#1269) (@ovanvincq)
- Fix sort! return type. (#1272) (@maleadt)
- Add const keyword to type aliases declarations. (#1273) (@eliascarv)
- Update manifest (#1274) (@github-actions[bot])
- Avoid eager expansion of CUDA_compat artifact string. (#1275) (@maleadt)
- Allow copies between unified arrays in different contexts. (#1277) (@maleadt)
- fix zeros and ones for user defined types (#1278) (@GiggleLiu)
- Make CUDNN depend on CUBLAS. (#1279) (@maleadt)
- Update manifest (#1286) (@github-actions[bot])
- Restore call to init_driver. (#1287) (@maleadt)
- Improvements for isbits union shared memory (#1288) (@maleadt)
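Issues #726 and #1236 extend the native RNG beyond uniform sampling. A hedged usage sketch, assuming the CUDA.randn/CUDA.randexp entry points added by that PR:

```julia
using CUDA

A = CUDA.randn(Float32, 1024)    # normally distributed samples, generated on the GPU
B = CUDA.randexp(Float32, 1024)  # exponentially distributed samples
```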
v3.5.0
CUDA v3.5.0
Closed issues:
- Illegal memory access on 3.3 (#975)
- Forward compatibility (#1071)
- ambiguous sparse constructor (#1088)
- Map reduce with float 16 (#1124)
- Allow invalid GPU pointers not allowed in unsafe_wrap (#1125)
- Scalar Indexing error in the Introduction docs (#1127)
- stackoverflow when printing a custom subtype of AbstractCuSparseMatrix (#1128)
- missing rand methods (#1138)
- Error mapreducing over a 0 dimensional array (#1141)
- seed! is not thread safe (#1158)
- Simplify Int32-based indices (#1160)
- Concatenating a scalar to a CuArray gives an Array (#1162)
- Calling byte_perm with Int32 values inserts sign checks (#1165)
- sum! does not compile for large arrays (#1169)
- Same random sequence on GPU and CPU? (#1170)
- Specifying eltype and buffer type when adapting to CuArray? (#1171)
- Inefficient lop3.lut instructions generated (#1172)
- Writing temporary PTX files can fail (#1173)
- Switching devices doesn't switch the REPL's output task (#1175)
- GC is not working for CuSparseMatrixCSR (#1178)
- sparse*dense operations shouldn't drop sparseness (#1188)
- Raises illegal memory access error randomly (#1189)
Merged pull requests:
- CI fixes (#950) (@maleadt)
- implement sparse (#1093) (@CarloLucibello)
- Use the kernel state object to pass the exception flag location. (#1110) (@maleadt)
- Update manifest (#1123) (@github-actions[bot])
- Improve show methods in sparse GPU arrays. (#1129) (@maleadt)
- Use warp intrinsics for a wider range of reductions. (#1130) (@maleadt)
- Support wrapping a host buffer with a CuArray (#1131) (@maleadt)
- support transpose CSC to CUDA CSR (#1132) (@Roger-luo)
- Small improvements to discovery of local toolkits. (#1134) (@maleadt)
- Rework device and context getters. (#1135) (@maleadt)
- Avoid memory operations during graph capture. (#1137) (@maleadt)
- Streamline the random number interface. (#1146) (@maleadt)
- Native device synchronization (#1147) (@maleadt)
- support interpret(reshape) (#1149) (@Roger-luo)
- add a gitignore (#1150) (@Roger-luo)
- Fix normalize on complex number (#1151) (@maleadt)
- Addition and multiplication over cuarray and cusparse (#1152) (@maleadt)
- Preserve Int32 hardware indices (#1153) (@maleadt)
- remove mutable to make device sparse type bitstype (#1154) (@Roger-luo)
- Update manifest (#1155) (@github-actions[bot])
- CompatHelper: bump compat for "BFloat16s" to "0.2" (#1156) (@github-actions[bot])
- Perform actual synchronization API calls when we need the memory (#1157) (@maleadt)
- Binary dependency changes (#1159) (@maleadt)
- Bump dependencies. (#1161) (@maleadt)
- Generalize Sparse Array Indices Type in Struct Def (#1163) (@Roger-luo)
- Use unchecked type conversions for byte_perm arguments (#1166) (@eschnett)
- Fix performance regressions (#1167) (@maleadt)
- Fix big mapreduce kernel for inputs without neutral element. (#1174) (@maleadt)
- Switch contexts before performing memory operations on arrays (#1176) (@maleadt)
- Improvements to stream-ordered memory management (#1177) (@maleadt)
- Update manifest (#1180) (@github-actions[bot])
- Consistently use chars instead of raw enums in CUSPARSE/CUSOLVER functions. (#1181) (@maleadt)
- Implement forward compatibility (#1182) (@maleadt)
- Bump GPUCompiler for 1.8 compat. (#1183) (@maleadt)
- Bump GPUArrays. (#1186) (@maleadt)
- Update documentation (#1187) (@maleadt)
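Several items above (#1135, #1175, #1176) concern which device and context are active when arrays are allocated or operated on. A minimal multi-GPU sketch, assuming at least two devices are visible (device indices and sizes are illustrative):

```julia
using CUDA

if length(CUDA.devices()) > 1
    CUDA.device!(1)               # make the second GPU current for this task
    b = CUDA.zeros(Float32, 100)  # allocated on device 1
    CUDA.device!(0)               # switch back; #1176 has the library switch to b's
                                  # context automatically for later memory operations
end
```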
v3.4.2
CUDA v3.4.2
Closed issues:
- Broadcasting a datatype does not work (#261)
- CUDA error: invalid argument during Zygote/Flux gradient computation (#1107)
- EXCEPTION_ACCESS_VIOLATION when using shared memory allocations. (#1116)
Merged pull requests:
- add symmetric support for mul (#217) (@Roger-luo)
- adds a device array type for CuSparseMatrixCSR to support using it in kernel functions (#1106) (@Roger-luo)
- Update manifest (#1108) (@github-actions[bot])
- Specialize Ref{<:Type} for GPU compatibility. (#1109) (@maleadt)
- Use the documented version of the enable_finalizers API. (#1111) (@maleadt)
- Don't embed the method table in the AST. (#1112) (@maleadt)
- Remove the hacky unique'ing of shmem GVs. (#1114) (@maleadt)
- Introduce a macro for marking multiple functions as device-only. (#1117) (@maleadt)
- Simplify library loading. (#1121) (@maleadt)
- Backports for 3.4.2 (#1122) (@maleadt)
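PRs #1114 and #1117, and issue #1116, touch statically allocated shared memory in device code. A short illustrative kernel (names and sizes are hypothetical) using the static shared-memory macro available in this release line:

```julia
using CUDA

# Illustrative: reverse a 256-element vector through static shared memory.
function shmem_reverse!(out, xs)
    tmp = @cuStaticSharedMem(Float32, 256)
    i = threadIdx().x
    tmp[i] = xs[i]
    sync_threads()                 # all threads must have written before reading
    out[i] = tmp[257 - i]
    return
end

xs  = CUDA.rand(Float32, 256)
out = similar(xs)
@cuda threads=256 shmem_reverse!(out, xs)
```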
v3.4.1
v3.4.0
CUDA v3.4.0
Merged pull requests: