Shape inference, jitted routines, gccjit backend
Major rewrite, abandoning the design choices of 0.1 and 0.2.
Added:
- Optionally, inferring or checking tensor (batch) sizes from data (e.g. file) sizes.
- Static indexing. A "slice" operator to select individual batches.
- Established the backends API with first-class modules (see the sketch after this list).
- The `Train` module as an optimization "frontend".
- Parallel optimization across devices.
- Global settings configurable via config files, environment variables, and command-line flags.
- Integration of backend logging with `ppx_minidebug` (the `debug_log_from_routines` setting).
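
To give a feel for the backends API change above, here is a minimal, self-contained sketch of the first-class-modules pattern it adopts. All names below (`Backend`, `fresh_backend`, `compile`) are hypothetical stand-ins for illustration, not OCANNL's actual signatures:

```ocaml
(* Hypothetical, simplified backend signature; a real backends API
   would also cover devices, contexts and compiled routines. *)
module type Backend = sig
  val name : string
  val compile : string -> (unit -> unit)
end

module Gccjit_backend : Backend = struct
  let name = "gccjit"
  let compile src () = Printf.printf "[%s] running %s\n" name src
end

module Interpreter_backend : Backend = struct
  let name = "interpreter"
  let compile src () = Printf.printf "[%s] running %s\n" name src
end

(* Selecting a backend at runtime yields a first-class module
   that callers unpack locally. *)
let fresh_backend : string -> (module Backend) = function
  | "gccjit" -> (module Gccjit_backend)
  | _ -> (module Interpreter_backend)

let () =
  let module B = (val fresh_backend "gccjit") in
  let routine = B.compile "forward pass" in
  routine ()
```

Picking the module at runtime (e.g. from the global settings) while keeping every backend behind one signature is the main payoff of first-class modules here.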
Changed:
- The Cuda backend is not supported for now. It is (optionally) buildable to reduce code rot.
- Dynamic indexing is not supported anymore (to reduce complexity). It might be reintroduced if needed.
- Factored out the arrayjit library / package containing compilation (former `Ndarray`, `Node`, `Code`).
- Renamed `Formula` -> `Tensor`.
- No more "form vs. non-form" formulas / tensors.
- Formula/tensor roots are split into forward roots and backprop roots.
- Removed the `%nn_rs` and `%nn_dt` syntaxes and the `Synthetic` fetch primitive.
- Renamed `%nn_op` to `%op` and `%nn_cd` to `%cd` (see the first sketch at the end of this section).
- Migrated gccjit into a separate repository.
- Migrated cudajit into a separate repository.
- Massive rewrite of shape inference in a declarative style.
- Generalized `zero_out` to `initialize_neutral`, preparing arbitrary accumulation operations (e.g. filling with 0 for addition, with 1 for multiplication; see the last sketch at the end of this section).
- Renamed `Node` -> `Lazy_array` -> `Tnode` (tensor node).
And more.
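
For a taste of the renamed syntaxes, here is a sketch of an `%op` definition modeled on OCANNL's MLP example; treat the exact set of operators available in this release as approximate. String literals introduce learnable parameters by name:

```ocaml
(* Sketch modeled on the project's MLP example; details may differ in
   this release. "w1".."b3" name learnable parameters, x is the input. *)
let%op mlp x = "b3" + ("w3" * relu ("b2" + ("w2" * relu ("b1" + ("w1" * x)))))
```

The same definition previously used the `%nn_op` extension point; only the name changed.

And to illustrate what generalizing `zero_out` to `initialize_neutral` buys: instead of always resetting an accumulator to zero, the array is filled with the neutral element of whichever operator accumulates into it. A self-contained sketch of the idea, with hypothetical names:

```ocaml
(* Hypothetical illustration: each accumulation operator has a neutral
   element, so accumulating into a freshly initialized array is a no-op. *)
type accum = Add | Mul | Max

let neutral = function
  | Add -> 0.0                  (* x +. 0.0 = x *)
  | Mul -> 1.0                  (* x *. 1.0 = x *)
  | Max -> Float.neg_infinity   (* Float.max x neg_infinity = x *)

(* The former `zero_out` behavior corresponds to the [Add] case. *)
let initialize_neutral op arr = Array.fill arr 0 (Array.length arr) (neutral op)

let () =
  let acc = Array.make 4 42.0 in
  initialize_neutral Max acc;
  assert (Float.equal acc.(0) Float.neg_infinity)
```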