
Shape inference, jitted routines, gccjit backend

Released by @lukstafi on 01 Apr 16:49

Major rewrite. Abandoning the design choices of 0.1 and 0.2.

Added:

  • Optional inference or checking of tensor (batch) sizes from data sizes (e.g. file sizes).
  • Static indexing. A "slice" operator to select individual batches.
  • Established the backends API with first-class modules.
  • The Train module as an optimization "frontend".
  • Parallel optimization across devices.
  • Global settings configurable via config files, environment variables, and commandline flags.
  • Integration of backend logging with ppx_minidebug (the debug_log_from_routines setting).
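To illustrate the "backends API with first-class modules" item above, here is a minimal sketch of the first-class-module pattern such an API is built on. The module names and functions (`Backend`, `Cc_backend`, `fresh_backend`, `compile`) are hypothetical stand-ins, not OCANNL's actual signatures:

```ocaml
(* Hypothetical sketch: a backend packaged as a first-class module,
   selected at runtime. Names do not reflect OCANNL's real API. *)
module type Backend = sig
  val name : string
  val compile : string -> string  (* stand-in for a compilation step *)
end

module Cc_backend : Backend = struct
  let name = "cc"
  let compile src = "compiled-by-cc:" ^ src
end

(* A runtime choice of backend is just a value of type (module Backend). *)
let fresh_backend () : (module Backend) = (module Cc_backend)

let () =
  let (module B : Backend) = fresh_backend () in
  print_endline (B.name ^ " -> " ^ B.compile "routine")
```

Packaging backends as first-class modules lets callers pick a backend from a config setting or flag without functorizing all client code.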

Changed:

  • The Cuda backend is unsupported for now, but remains (optionally) buildable to reduce code rot.
  • Dynamic indexing is not supported anymore (to reduce complexity). It might be reintroduced if needed.
  • Factored out the arrayjit library / package containing compilation (former Ndarray, Node, Code).
  • Renamed Formula -> Tensor.
  • No more "form vs. non-form" formulas / tensors.
  • Formula/tensor roots are split into forward roots and backprop roots.
  • No more %nn_rs, %nn_dt syntaxes and Synthetic fetch primitive.
  • Renamed %nn_op to %op and %nn_cd to %cd.
  • Migrated gccjit into a separate repository.
  • Migrated cudajit into a separate repository.
  • Massive rewrite of shape inference in a declarative style.
  • Generalized zero_out to initialize_neutral, in preparation for arbitrary accumulation operations.
  • Renamed Node -> Lazy_array -> Tnode (tensor node).
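The generalization of zero_out to initialize_neutral can be sketched as follows: each accumulation operation initializes its buffer with its own neutral element (0 for addition, 1 for multiplication, negative infinity for max), rather than always zeroing. This is a conceptual sketch, not OCANNL's actual code:

```ocaml
(* Illustrative only: per-operation neutral elements generalize
   zero-initialization to arbitrary accumulation operations. *)
type accum = Add | Mul | Max

(* The neutral element leaves the accumulation result unchanged. *)
let neutral = function
  | Add -> 0.0
  | Mul -> 1.0
  | Max -> neg_infinity

let step op acc x =
  match op with
  | Add -> acc +. x
  | Mul -> acc *. x
  | Max -> Float.max acc x

(* Initializing with the neutral element makes reduction uniform. *)
let reduce op xs = List.fold_left (step op) (neutral op) xs

let () =
  Printf.printf "%g %g %g\n"
    (reduce Add [1.; 2.; 3.])
    (reduce Mul [1.; 2.; 3.])
    (reduce Max [1.; 2.; 3.])
```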

And more.