Releases · JuliaGPU/CUDA.jl
v1.3.2
v1.3.1
CUDA v1.3.1
Closed issues:
- Element-wise conversion fails (#378)
- atomic_min fails for Int32 in global CuDeviceArrays (#379)
- Segmentation fault from @cuprint on char (#381)
- error in versioninfo(), name not defined (#385)
Merged pull requests:
- Fix docs (#330) (@maleadt)
- Wrap cusparseSpMV (#351) (@marius311)
- specify Cchar rather than char in the doc for @cuprint (#382) (@MasonProtter)
- Adapt to LLVM.jl changes for stateless codegen. (#383) (@maleadt)
v1.3.0
CUDA v1.3.0
Closed issues:
- Trouble with the @. macro (#346)
- NVMLError: Not Supported (code 3) (#348)
- Nvidia Xavier devices: exception thrown during kernel execution on device Xavier (#349)
- Could not load CUTENSOR artifact dll on Windows 10 (#355)
- CuTextureArray for 3D array (#357)
- Bug in julia 1.5.0 I have CUDA 11.0 installed in Ubuntu 18.04 (#360)
- Callback-based logging (#366)
- Artifact download timeout (#369)
- sum! accumulates when called multiple times (#370)
- nvprof does not detect kernel launches (#371)
- KernelError: passing and using non-bitstype argument (#372)
- CUDA.jl fails to find libcudadevrt.a due on a cluster install with multi-arch target (#376)
Merged pull requests:
- Make the memory allocator context-aware (#253) (@maleadt)
- Update manifest (#347) (@github-actions[bot])
- Guard against unsupported NVML usage in the test runner. (#352) (@maleadt)
- Bump CUDNN to v8.0.2 (#353) (@maleadt)
- Rework thread state management (#356) (@maleadt)
- Update manifest (#358) (@github-actions[bot])
- Memory allocator simplifications (#361) (@maleadt)
- Deduplicate code from memory pools (#362) (@maleadt)
- Fix show of ArrayBuffer. (#363) (@maleadt)
- Clean-up the Buffer interface. (#364) (@maleadt)
- Use callback APIs to get library debug logs. (#367) (@maleadt)
- Allow selecting the memcheck tool. (#368) (@maleadt)
- Update GPUArrays. (#373) (@maleadt)
- Update to CUDA 11.0 update 1 (#374) (@maleadt)
- Number and iterate devices in versioninfo() following CUDA. (#375) (@maleadt)
- Reinstate support for Julia 1.3 (#377) (@maleadt)
v1.2.1
CUDA v1.2.1
Closed issues:
- CuArrays.zeros(T, 0) fails (#81)
- CUDAnative.cos calls the base cos function in nested broadcast (#102)
- CuSparseMatrixHYB * CuMatrix = nothing (#256)
- Strange reordering of struct fields with dynamic parallelism (#263)
- Performance: bias add (#298)
- CUDA 11 libraries incorrectly looked up in artifact (#300)
- CUTENSOR for windows (#301)
- Performance: sum (#302)
- Performance: getindex(a, i::Array{Int}) (#303)
- Display for CuArray within Tuples does not respect :limit=>true (#305)
- Performance: elementwise operations (#307)
- Performance: perceptron (#312)
- windows install error: isfile(__libcupti[]) (#324)
- std with dims is not type stable (#336)
Merged pull requests:
- Re-enable threading tests. (#25) (@maleadt)
- Reorganize and simplify some includes (#296) (@maleadt)
- Only run benchmarks on the master branch. (#297) (@maleadt)
- Optimizations for broadcast (#299) (@maleadt)
- Update manifest (#304) (@github-actions[bot])
- Test runner improvements for multigpu mode (#309) (@maleadt)
- Artifact improvements for CUDA 11 on Windows (#310) (@maleadt)
- Optimize element-wise operations (#313) (@maleadt)
- Check if reported GPU memory use is available. (#314) (@maleadt)
- Update artifacts: include cusolverMg, and use Yggdrasil binaries. (#315) (@maleadt)
- Specialization fixes for mapreducedim. (#316) (@maleadt)
- Fix invalid conversion of pointer to signed integer. (#317) (@maleadt)
- Work around (presumed) Windows driver bug in exception test. (#319) (@maleadt)
- Update manifest (#323) (@github-actions[bot])
- Bump CUDNN and CUTENSOR (#325) (@maleadt)
- Simplify NVML discovery. (#326) (@maleadt)
- Separate CURAND wrappers from Random impl. (#327) (@maleadt)
- Simplify discovering binaries by using Sys.which. (#328) (@maleadt)
- Add wrapper for NVML utilization rates. (#329) (@maleadt)
- Attach CUSPARSE docstrings to bare methods, not empty functions. (#331) (@maleadt)
- Eagerly reduce the amount of worker threads. (#332) (@maleadt)
- Bump dependencies. (#333) (@maleadt)
- Clean-up library wrappers [NFC] (#334) (@maleadt)
- Fix CUDNN v8 discovery and loading on Windows (#335) (@maleadt)
- Fix type stability of Statistics.var with dims. (#337) (@maleadt)
- Fix parameter alignment for dynamic parallelism. (#338) (@maleadt)
- Micro-optimize Base.fill. (#339) (@maleadt)
v1.2.0
CUDA v1.2.0
Closed issues:
- Segmentation fault when creating CuArray of CuArray (#133)
- CUDNN tests fail with CUDNN 6.0.20 (#134)
- CURAND fail to initialize, code 203 (#255)
- Deprecation warnings (#277)
- Can we pleeeeeeeease make cu(x) eltype preserving? (#278)
- On the use of @sync during benchmarking in the documentation (#279)
- Example in Multiple GPUs doc fails (#282)
- LLVM error: Cannot cast between two non-generic address spaces (#286)
Merged pull requests:
- Host-side CUTENSOR (#243) (@kshyatt)
- Add and document a non-blocking version of at-sync. (#280) (@maleadt)
- Use a custom adaptor for cu so that adapt(CuArray) preserves element types. (#281) (@maleadt)
- Check and warn for library versions. (#284) (@maleadt)
- Add note about nvml dll missing (#288) (@kshyatt)
- Update your PR to have tests pass (#289) (@kshyatt)
- Update manifest (#290) (@github-actions[bot])
- Support CUDA 11 (#291) (@maleadt)
- do not open the file twice when reading the libdevice bitcode (#294) (@jakebolewski)
v1.1.0
CUDA v1.1.0
Closed issues:
- Fix NSight detection (#29)
- versioninfo() (#34)
- throw_... messages: invalid call to jl_alloc_string (#54)
- INTERNAL_ERROR during CUDNN handle creation (#183)
- Improve benchmarking suite (#222)
- How to load CUDA.jl conditional on the computer having a CUDA-compatible GPU? (#237)
- CUSOLVER.heevd! returning Float and not Complex (#238)
- Broadcasting fails with Float64 -> Int conversion (#240)
- Running ] test CUDA with OhMyREPL in startup.jl causes some tests to fail (#246)
- ERROR: Your LLVM does not support the NVPTX back-end. in local project environment (#249)
- CUDAnative: UndefVarError: AddrSpacePtr not defined on julia master (#250)
- Error while freeing CUDA.CuPtr (#254)
- Non-artifact initialization of CUDA.jl using CUDA 11 fails on Windows (#262)
- Library handle creation close to OOM fails with ERROR_NOT_INITIALIZED (#264)
- has(::TargetIterator, name::String) deprecation warning (#271)
Merged pull requests:
- Add texture support from CuTextures.jl (#209) (@maleadt)
- Memory pinning with interval trees (#233) (@maleadt)
- Better nsys detection. (#234) (@maleadt)
- CompatHelper: add new compat entry for "IntervalTrees" at version "1.0" (#235) (@github-actions[bot])
- Update manifest (#239) (@github-actions[bot])
- Replace slash by path separator to properly skip tests on Windows. (#241) (@maleadt)
- Retry cudnnCreate on CUDNN_STATUS_INTERNAL_ERROR and CUDNN_STATUS_NOT_INITIALIZED (#244) (@maleadt)
- Add issue templates (#245) (@maleadt)
- Import wrapper tooling, wrap NVML (#248) (@maleadt)
- Ignore some potentially unsupported NVML features. (#251) (@maleadt)
- Assert NVPTX availability by just calling the initializer. (#252) (@maleadt)
- Update manifest (#257) (@github-actions[bot])
- Adapt to AddrSpacePtr rename. (#258) (@maleadt)
- Typo in installation overview docs (#260) (@clintonTE)
- Update GPUCompiler.jl (#266) (@maleadt)
- Retry library initialization failure due to (badly reported) OOM. (#268) (@maleadt)
- Upgrade CUTENSOR to v1.1.0. (#269) (@maleadt)
- Use CUDNN from Yggdrasil. (#272) (@maleadt)
- Update manifest (#273) (@github-actions[bot])
- Improve local CUDA discovery for CUDA 11 (#274) (@maleadt)
- Compatibility with latest LLVM and GPUCompiler (#275) (@maleadt)
v1.0.2
CUDA v1.0.2
Closed issues:
- Dynamic generation of docs including benchmarking timings can make the numbers "weird" (#11)
v1.0.1
CUDA v1.0.1
v1.0.0
CUDA v1.0.0
Closed issues:
- unsafe_copy3d!: srcPos and dstPos handling (#27)
- Test failure on Windows (#37)
- Texture memory? (#46)
- Tests for the LLVM passes (#52)
- Bugged Sparse Matrix-Dense matrix multiplication, where dense matrix is transposed (#77)
- Stack overflow when broadcasting over empty view in CuArrays 2.x (#82)
- Sparse CSC gemm wrappers actually call CSR routines (#181)
- Testsuite calls startup.jl (#182)
- LLVM error: Cannot cast between two non-generic address spaces (#190)
- Error running CUDA in Jupyter (#195)
- Floating-point Inf causes an error (#205)
- mul! issue (#213)
Merged pull requests:
- include potri and test (#179) (@erathorn)
- Fix sparse-dense matmul, with transposed dense (#180) (@irhum)
- Behave like Base wrt. test flags. (#184) (@maleadt)
- fix sparse CSC gemm and test (closes #181) (#185) (@jebej)
- Add specialization functions f(ctx, x) = f(x) for GPUArrays randn! function (#186) (@Ellipse0934)
- Add inplace test for rand (#187) (@kshyatt)
- Fix cushow tests on Windows. (#188) (@maleadt)
- More tests for CUSPARSE (#189) (@kshyatt)
- fixed gels_batched! issue (#191) (@clintonTE)
- Added wrappers for cusolverDnpotrfBatched (#192) (@IvanYashchuk)
- Added wrappers for cusolverDnpotrsBatched (#193) (@IvanYashchuk)
- Compatibility with Julia 1.5 (#194) (@maleadt)
- Add gemmex wrapper and test (#196) (@kshyatt)
- Fix handling of srcPos and dstPos in unsafe_copy3d! (#197) (@samo-lin)
- Prefer a local CUDA installation when running on CI, reinstate Julia 1.3. (#198) (@maleadt)
- Add support for mixed precision (#200) (@kshyatt)
- Add texture support from CuTextures.jl (#206) (@maleadt)
- Error throwing tests (#207) (@kshyatt)
- Update manifest (#208) (@github-actions[bot])
- A few more tests for cusparse (#210) (@kshyatt)
- Specialize Base.mightalias for better broadcast performance. (#211) (@maleadt)
- Fix some mul! ambiguities, and dispatch more to CUBLAS. (#214) (@maleadt)
- CUDA 11 compatibility entries (#221) (@maleadt)
- Benchmark suite: tune and cache params (#223) (@maleadt)
- Add benchmarks (#224) (@maleadt)
- Update manifest (#225) (@github-actions[bot])
v0.1.0
CUDA v0.1.0
Closed issues:
- Documentation: installation instructions (#1)
- Faced some errors while testing cuda in Julia (#3)
- facing unknown errors while compiling exact similar code for parallelization on CPU (#7)
Merged pull requests:
- Fix typo (#2) (@innerlee)
- Doc: fix comment on how memory is moving (#4) (@mbauman)
- Install TagBot as a GitHub Action (#8) (@JuliaTagBot)
- small grammar/typo tweaks to the introduction tutorial (#12) (@KristofferC)
- Add code of other CUDA packages (#14) (@maleadt)
- Initialize the memory pool (#19) (@maleadt)
- Improve initialization for threading (#20) (@maleadt)
- Don't use BinaryBuilder for most CI tests. (#21) (@maleadt)