# wasi-nn

wasi-nn is a WASI API for performing machine learning (ML) inference. The API is not (yet) capable of performing ML training. WebAssembly programs that want to use a host's ML capabilities can access these capabilities through wasi-nn's core abstractions: graphs and tensors. A user loads an ML model (instantiated as a graph) to use in an ML backend. Then, the user passes tensor inputs to the graph, computes the inference, and retrieves the tensor outputs.

This example world shows how to use these primitives together.
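As a rough sketch of that load/bind/compute flow, the Python below mimics the sequence with a hypothetical in-process stand-in for a host ML backend (`TinyBackend` and its toy JSON "model" are invented for illustration; a real host operates on opaque model bytes through an actual ML backend):

```python
import json


class TinyBackend:
    """Toy 'ML backend' whose model format is a JSON-encoded scale factor."""

    def load(self, model_bytes: bytes) -> "TinyGraph":
        # Stand-in for load(builder, encoding, target): parse opaque
        # bytes into a ready-to-run graph.
        scale = json.loads(model_bytes)["scale"]
        return TinyGraph(scale)


class TinyGraph:
    """Stand-in for the graph resource: a loaded model instance."""

    def __init__(self, scale: float):
        self.scale = scale

    def init_execution_context(self) -> "TinyContext":
        return TinyContext(self)


class TinyContext:
    """Stand-in for graph-execution-context: binds tensors and computes."""

    def __init__(self, graph: TinyGraph):
        self.graph = graph

    def compute(self, inputs: dict[str, list[float]]) -> dict[str, list[float]]:
        # Named input tensors in, named output tensors out.
        return {name: [x * self.graph.scale for x in data]
                for name, data in inputs.items()}


graph = TinyBackend().load(b'{"scale": 2.0}')       # load the "model"
ctx = graph.init_execution_context()                # create a session
outputs = ctx.compute({"input": [1.0, 2.0, 3.0]})   # run the inference
print(outputs["input"])  # [2.0, 4.0, 6.0]
```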
- Imports:
  - interface `wasi:nn/[email protected]`
  - interface `wasi:nn/[email protected]`
  - interface `wasi:nn/[email protected]`
  - interface `wasi:nn/[email protected]`
## Import interface `wasi:nn/[email protected]`
All inputs and outputs to an ML inference are represented as `tensor`s.
#### `tensor-dimensions`
The dimensions of a tensor. The array length matches the tensor rank, and each element in the array describes the size of each dimension.

#### `tensor-type`
The type of the elements in a tensor.

#### `tensor-data`
The tensor data. Initially conceived as a sparse representation, each empty cell would be filled with zeros and the array length must match the product of all of the dimensions and the number of bytes in the type (e.g., a 2x2 tensor with 4-byte f32 elements would have a data array of length 16). Naturally, this representation requires some knowledge of how to lay out data in memory (e.g., using row-major ordering) and could perhaps be improved.
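To make the length rule concrete, this plain-Python sketch (independent of any wasi-nn binding; the helper name is ours) checks that the buffer length equals the product of the dimensions times the element size, and shows row-major flattening for the 2x2 f32 example:

```python
import struct
from math import prod

def expected_data_len(dimensions: list[int], bytes_per_element: int) -> int:
    # The flat data buffer must hold product(dimensions) elements,
    # each bytes_per_element wide.
    return prod(dimensions) * bytes_per_element

# A 2x2 tensor of 4-byte f32 elements needs a 16-byte buffer.
dims = [2, 2]
assert expected_data_len(dims, 4) == 16

# Row-major ordering: each row is laid out contiguously, one after another.
matrix = [[1.0, 2.0],
          [3.0, 4.0]]
flat = [x for row in matrix for x in row]   # [1.0, 2.0, 3.0, 4.0]
data = struct.pack("<4f", *flat)            # 16 little-endian f32 bytes
assert len(data) == expected_data_len(dims, 4)
```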
The `tensor` resource constructor:

Params:
- `dimensions`: `tensor-dimensions`
- `ty`: `tensor-type`
- `data`: `tensor-data`

Return values:
- own<`tensor`>
Describe the size of the tensor (e.g., 2x2x2x2 -> [2, 2, 2, 2]). To represent a tensor containing a single value, use `[1]` for the tensor dimensions.

Params:
- `self`: borrow<`tensor`>
Describe the type of element in the tensor (e.g., `f32`).

Params:
- `self`: borrow<`tensor`>
Return the tensor data.

Params:
- `self`: borrow<`tensor`>
## Import interface `wasi:nn/[email protected]`
TODO: create function-specific errors (#42)
- Caller module passed an invalid argument.
- Invalid encoding.
- The operation timed out.
- Runtime error.
- Unsupported operation.
- Graph is too large.
- Graph not found.
- The operation is insecure or has insufficient privilege to be performed (e.g., cannot access a requested hardware feature).
- The operation failed for an unspecified reason.
Return the error code.

Params:
- `self`: borrow<`error`>
Errors can be propagated with backend-specific status through a string value.

Params:
- `self`: borrow<`error`>
## Import interface `wasi:nn/[email protected]`

An inference "session" is encapsulated by a `graph-execution-context`. This structure binds a `graph` to input tensors before `compute`-ing an inference:
#### `type tensor` [`tensor`](#tensor)
#### `named-tensor`
Identify a tensor by name; this is necessary to associate tensors with graph inputs and outputs.

Tuple fields:
- `0`: `string`
- `1`: own<`tensor`>
Bind a `graph` to the input and output tensors for an inference.

TODO: this may no longer be necessary in WIT (#43)
Compute the inference on the given inputs.

Params:
- `self`: borrow<`graph-execution-context`>
- `inputs`: list<`named-tensor`>

Return values:
- result<list<`named-tensor`>, own<`error`>>
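Because both inputs and outputs travel as lists of (name, tensor) pairs rather than positional values, callers typically select a result by name after `compute` returns. A minimal illustrative sketch (the helper and tensor names are invented, and plain float lists stand in for tensors):

```python
def find_named(tensors: list[tuple[str, list[float]]], name: str) -> list[float]:
    # A named-tensor is a (string, tensor) pair; scan for the matching name.
    for n, data in tensors:
        if n == name:
            return data
    raise KeyError(name)

# Pretend these came back from compute().
outputs = [("logits", [0.1, 0.7, 0.2]), ("embedding", [1.0, 0.0])]
assert find_named(outputs, "logits") == [0.1, 0.7, 0.2]
```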
## Import interface `wasi:nn/[email protected]`

A `graph` is a loaded instance of a specific ML model (e.g., MobileNet) for a specific ML framework (e.g., TensorFlow):
#### `type tensor` [`tensor`](#tensor)
#### `type graph-execution-context` [`graph-execution-context`](#graph_execution_context)
#### `graph`
An execution graph for performing inference (i.e., a model).

#### `graph-encoding`
Describes the encoding of the graph. This allows the API to be implemented by various backends that encode (i.e., serialize) their graph IR with different formats.

#### `execution-target`
Define where the graph should be executed.

#### `graph-builder`
The graph initialization data.

This gets bundled up into an array of buffers because implementing backends may encode their graph IR in parts (e.g., OpenVINO stores its IR and weights separately).
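Because the initialization data is a list of buffers, a multi-part model format maps onto it directly: each part becomes one opaque byte buffer. A hedged sketch (the buffer contents are invented placeholders, e.g. standing in for a backend's separate topology and weights files):

```python
# Each graph-builder is an opaque byte buffer; the load operation
# accepts a list of them, so multi-part formats fit naturally.
ir_part = b"<xml model topology>"     # placeholder for an IR/topology file
weights_part = b"\x00\x01\x02\x03"    # placeholder for a weights file

builders: list[bytes] = [ir_part, weights_part]

# A single-file model format would simply pass one buffer:
single_builder: list[bytes] = [b"single-file-model-bytes"]

assert len(builders) == 2 and all(isinstance(b, bytes) for b in builders)
```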
Create a `graph-execution-context` from a `graph`:

Params:
- `self`: borrow<`graph`>

Return values:
- result<own<`graph-execution-context`>, own<`error`>>
Load a `graph` from an opaque sequence of bytes to use for inference.

Params:
- `builder`: list<`graph-builder`>
- `encoding`: `graph-encoding`
- `target`: `execution-target`
Load a `graph` by name.

How the host expects the names to be passed, and how it stores the graphs for retrieval via this function, is implementation-specific. This allows hosts to choose name schemes that range from simple to complex (e.g., URLs?) and caching mechanisms of various kinds.
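One way a host might realize such a name scheme is a simple registry mapping host-chosen names to preloaded model bytes; this sketch is purely illustrative (the class, method names, and the "mobilenet-v2" name are invented, and real hosts may use URLs, directories, or caches instead):

```python
class GraphRegistry:
    # Hypothetical host-side store backing a load-by-name operation.
    def __init__(self) -> None:
        self._graphs: dict[str, bytes] = {}

    def register(self, name: str, model_bytes: bytes) -> None:
        self._graphs[name] = model_bytes

    def load_by_name(self, name: str) -> bytes:
        if name not in self._graphs:
            # Corresponds to the "graph not found" error case.
            raise KeyError(f"graph not found: {name}")
        return self._graphs[name]

registry = GraphRegistry()
registry.register("mobilenet-v2", b"model-bytes")
assert registry.load_by_name("mobilenet-v2") == b"model-bytes"
```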