Releases: lukstafi/ocaml-cudajit
0.6.1: bug fixes and reporting stream-related information
From the changelog:
Added
- Debug number of total and unreleased events in a stream.
- Debug the total number of non-garbage-collected streams across all devices.
- Verify that compilation options use only an allowed set of characters: alphanumerics and a few punctuation marks.
Fixed
- Docu-comment typo.
- The flags `cu_event_wait_external` and `cu_event_wait_default` were switched around for the `record ?external_` and `wait ?external_` event functions.
- Don't destroy released (destroyed) events in `Delimited_event.synchronize`.
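The compilation-option check listed under "Added" can be pictured roughly as follows. This is a minimal sketch, not the library's code: the names `is_allowed_char`, `valid_option` and `check_options` are hypothetical, and the exact punctuation the library accepts is an assumption.

```ocaml
(* Hypothetical allow-list: alphanumerics plus a few punctuation characters.
   The set actually accepted by ocaml-cudajit may differ. *)
let is_allowed_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' -> true
  | '-' | '_' | '=' | '.' | ',' | '/' | ':' | '+' -> true
  | _ -> false

(* An option is valid when it is non-empty and every character is allowed. *)
let valid_option (opt : string) : bool =
  let n = String.length opt in
  let rec go i = i >= n || (is_allowed_char opt.[i] && go (i + 1)) in
  n > 0 && go 0

(* Report the first offending option, if any. *)
let check_options (opts : string list) : (unit, string) result =
  match List.find_opt (fun o -> not (valid_option o)) opts with
  | None -> Ok ()
  | Some bad -> Error (Printf.sprintf "invalid compilation option: %S" bad)
```

Rejecting whitespace and shell metacharacters up front gives a clear error before the options reach the compiler.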
0.6.0: detecting use-after-free for memory pointers
This release focuses on detecting use-after-free for memory pointers, and improving ease of debugging.
It also adds the functions `Device.get_free_and_total_mem`, `Stream.mem_alloc` and `Stream.mem_free`.
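One common way to detect use-after-free is to keep a "freed" flag next to the raw pointer and check it on every access. The sketch below illustrates only that general idea; the type `deviceptr`, the exception `Use_after_free` and the helper functions are hypothetical and are not taken from the library.

```ocaml
(* Hypothetical wrapper: [addr] stands in for a raw device address. *)
type deviceptr = { addr : int; mutable freed : bool }

(* Carries the name of the operation that touched a freed pointer. *)
exception Use_after_free of string

let alloc (addr : int) : deviceptr = { addr; freed = false }

(* Every operation calls this guard before touching the pointer. *)
let check_live ~(func : string) (p : deviceptr) : unit =
  if p.freed then raise (Use_after_free func)

(* Freeing twice is reported as well, since the guard runs first. *)
let free (p : deviceptr) : unit =
  check_live ~func:"free" p;
  p.freed <- true

let read_addr (p : deviceptr) : int =
  check_live ~func:"read_addr" p;
  p.addr
```

The exception payload names the offending operation, which helps when debugging.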
0.5.0: modular interface, events, finalizers
In this release:
- we split the API into modules: `Nvrtc`, `Device`, `Context`, `Deviceptr`, `Module`, `Stream`;
- we support CUDA events via `Event` and a wrapper module `Delimited_event` that manages destroying events;
- we manage the primary context, created contexts, streams and events via finalizers.
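Finalizer-managed resources in OCaml usually combine `Gc.finalise` with a "released" flag so that the native object is destroyed at most once, whether by an explicit call or by the garbage collector. The sketch below shows that pattern under assumptions: `destroy_native`, `destroy_count` and the `event` record are hypothetical stand-ins, not the library's types.

```ocaml
(* Stand-in for destroying a native handle; the counter makes calls observable. *)
let destroy_count = ref 0
let destroy_native (_handle : int) : unit = incr destroy_count

(* Hypothetical event wrapper with a flag recording whether it was released. *)
type event = { handle : int; mutable released : bool }

(* Idempotent: destroys the native handle only on the first call. *)
let release (ev : event) : unit =
  if not ev.released then begin
    ev.released <- true;
    destroy_native ev.handle
  end

(* Register [release] as a finalizer so unreachable events are cleaned up
   by the garbage collector; an earlier explicit release makes it a no-op. *)
let create (handle : int) : event =
  let ev = { handle; released = false } in
  Gc.finalise release ev;
  ev
```

Because `release` checks the flag, an explicit release followed by the finalizer does not destroy the handle twice.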
0.4.1: support for half precision
In this release:
- We pass the `$CUDA_PATH/include` path to the nvrtc compiler; otherwise e.g. `#include <cuda_fp16.h>` will not work. The user could already be doing this, but since we monitor the installation via conf-cuda, it's better to prepend the option automatically.
- We work around `ctypes` not supporting the `Float16` type.
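Prepending the include path can be pictured as below. This is a hedged sketch: the function `with_cuda_include` is hypothetical, and reading `CUDA_PATH` from the environment at run time is an assumption (the library may determine the path differently). The `-I<dir>` spelling is nvrtc's short form of `--include-path=<dir>`.

```ocaml
(* Hypothetical helper: prepend "-I<CUDA_PATH>/include" to the nvrtc options.
   [cuda_path] defaults to the CUDA_PATH environment variable, if set. *)
let with_cuda_include ?(cuda_path = Sys.getenv_opt "CUDA_PATH")
    (options : string list) : string list =
  match cuda_path with
  | None -> options (* no known installation root: leave options untouched *)
  | Some root -> ("-I" ^ Filename.concat root "include") :: options
```

For example, `with_cuda_include ~cuda_path:(Some "/usr/local/cuda") ["-O3"]` puts the include option before `-O3`, so user-supplied options follow the automatic one.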
0.4.0
For details, see CHANGES.md. Highlights:
- Context flags.
- Asynchronous operations and streams.
- Bug fix: kernel launch params lifetimes.
- Interface file with documentation.
- Self-contained types in the interface.
- Updated to a newer CUDA version.
0.1.1.0
From the changelog:
[0.1.1] 2024-05-09
Added
- Continuous Integration on GitHub thanks to GitHub action Jimver/cuda-toolkit, but only PTX compilation.
Fixed
- Test target should erase compiler versions.
[0.1.0] 2023-10-28
Added
- Initial stand-alone release. For earlier changes, see e.g. ocannl/cudajit @ 2 months ago
Fixed
- To be defensive, pass `-I` and `-L` arguments to the compiler and linker with the default paths on linux-like systems.