ml.md

wasi-nn is a WASI API for performing machine learning (ML) inference. The API is not (yet) capable of performing ML training. WebAssembly programs that want to use a host's ML capabilities can access these capabilities through wasi-nn's core abstractions: graphs and tensors. A user loads an ML model -- instantiated as a graph -- to use in an ML backend. Then, the user passes tensor inputs to the graph, computes the inference, and retrieves the tensor outputs.
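The load/compute/retrieve flow described above can be sketched with a minimal self-contained mock. The Rust names below (`load`, `init_execution_context`, `compute`, the `Tensor` struct) mirror the API's abstractions for illustration only; a real guest would call these operations through generated WIT bindings against a host implementation.

```rust
// Illustrative mock of the wasi-nn flow; not the real bindings.

/// Element type, mirroring `tensor-type` (subset shown).
#[derive(Clone, Copy, PartialEq, Debug)]
enum TensorType {
    Fp32,
    U8,
}

/// Mirrors the `tensor` resource: dimensions, element type, raw bytes.
struct Tensor {
    dimensions: Vec<u32>,
    ty: TensorType,
    data: Vec<u8>,
}

/// Mirrors `graph`: a loaded instance of an ML model.
struct Graph;

impl Graph {
    /// Mirrors `graph.init-execution-context`.
    fn init_execution_context(&self) -> GraphExecutionContext {
        GraphExecutionContext
    }
}

/// Mirrors `graph-execution-context`: binds a graph to tensors for inference.
struct GraphExecutionContext;

impl GraphExecutionContext {
    /// Mirrors `compute`: named input tensors in, named output tensors out.
    /// This mock simply echoes the inputs back as outputs.
    fn compute(&self, inputs: Vec<(String, Tensor)>) -> Vec<(String, Tensor)> {
        inputs
    }
}

/// Mirrors `load`: turn opaque graph-builder bytes into a graph.
fn load(_builders: &[Vec<u8>]) -> Graph {
    Graph
}

fn main() {
    let graph = load(&[vec![0u8; 4]]); // 1. load the model as a graph
    let ctx = graph.init_execution_context(); // 2. open a session
    let input = Tensor {
        dimensions: vec![1, 4],
        ty: TensorType::Fp32,
        data: vec![0u8; 16], // 1x4 f32 = 16 bytes
    };
    let outputs = ctx.compute(vec![("input".to_string(), input)]); // 3. infer
    println!("{} output tensor(s)", outputs.len()); // 4. retrieve outputs
}
```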

This example world shows how to use these primitives together.

Import interface wasi:nn/tensor

All inputs and outputs to an ML inference are represented as tensors.


Types

type tensor-dimensions

tensor-dimensions

The dimensions of a tensor.

The array length matches the tensor rank, and each element in the array describes the size of the corresponding dimension.

enum tensor-type

The type of the elements in a tensor.

Enum Cases
  • FP16
  • FP32
  • FP64
  • BF16
  • U8
  • I32
  • I64

type tensor-data

tensor-data

The tensor data.

This type was initially conceived as a sparse representation in which each empty cell would be filled with zeros. The array length must match the product of all of the dimensions and the number of bytes in the type (e.g., a 2x2 tensor with 4-byte f32 elements would have a data array of length 16). Naturally, this representation requires some knowledge of how to lay out data in memory (e.g., using row-major ordering) and could perhaps be improved.
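The length invariant above can be checked mechanically: the data array must hold product-of-dimensions elements, each taking the byte width of the element type. The helper names below are illustrative, not part of the wasi-nn API.

```rust
// Check the `tensor-data` length invariant:
//   bytes = product(dimensions) * size-of(element type)

/// Byte width for each `tensor-type` case.
fn bytes_per_element(ty: &str) -> usize {
    match ty {
        "fp64" | "i64" => 8,
        "fp32" | "i32" => 4,
        "fp16" | "bf16" => 2,
        "u8" => 1,
        _ => panic!("unknown tensor-type"),
    }
}

/// Expected length of the flat data array for the given dimensions.
fn expected_data_len(dimensions: &[u32], ty: &str) -> usize {
    dimensions.iter().product::<u32>() as usize * bytes_per_element(ty)
}

fn main() {
    // A 2x2 tensor of 4-byte f32 elements needs a 16-byte data array.
    assert_eq!(expected_data_len(&[2, 2], "fp32"), 16);
    println!("{}", expected_data_len(&[2, 2], "fp32"));
}
```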

resource tensor


Functions

[constructor]tensor: func

Params
Return values

[method]tensor.dimensions: func

Describe the size of the tensor (e.g., 2x2x2x2 -> [2, 2, 2, 2]). To represent a tensor containing a single value, use [1] for the tensor dimensions.

Params
Return values

[method]tensor.ty: func

Describe the type of element in the tensor (e.g., f32).

Params
Return values

[method]tensor.data: func

Return the tensor data.

Params
Return values

Import interface wasi:nn/errors

TODO: create function-specific errors (#42)


Types

enum error-code

Enum Cases
  • invalid-argument

    Caller module passed an invalid argument.

  • invalid-encoding

    Invalid encoding.

  • timeout

    The operation timed out.

  • runtime-error

Runtime error.

  • unsupported-operation

    Unsupported operation.

  • too-large

    Graph is too large.

  • not-found

    Graph not found.

  • security

    The operation is insecure or has insufficient privilege to be performed (e.g., a requested hardware feature cannot be accessed).

  • unknown

    The operation failed for an unspecified reason.
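Mirrored as a Rust enum, the cases above split naturally into caller mistakes and host-side failures. The `is_caller_error` classification below is an illustration of how a guest might triage these codes, not part of the API.

```rust
// The `error-code` cases, mirrored as a Rust enum for illustration.
#[derive(Clone, Copy, PartialEq, Debug)]
enum ErrorCode {
    InvalidArgument,
    InvalidEncoding,
    Timeout,
    RuntimeError,
    UnsupportedOperation,
    TooLarge,
    NotFound,
    Security,
    Unknown,
}

impl ErrorCode {
    /// True for errors the caller can fix by changing its request
    /// (an illustrative classification, not defined by wasi-nn).
    fn is_caller_error(&self) -> bool {
        matches!(
            self,
            ErrorCode::InvalidArgument
                | ErrorCode::InvalidEncoding
                | ErrorCode::TooLarge
                | ErrorCode::NotFound
        )
    }
}

fn main() {
    assert!(ErrorCode::InvalidArgument.is_caller_error());
    assert!(!ErrorCode::Timeout.is_caller_error());
    println!("classified");
}
```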

resource error


Functions

[method]error.code: func

Return the error code.

Params
Return values

[method]error.data: func

Errors can be propagated with backend-specific status through a string value.

Params
Return values
  • string

Import interface wasi:nn/inference

An inference "session" is encapsulated by a graph-execution-context. This structure binds a graph to input tensors before compute-ing an inference:


Types

type error

error

type tensor

tensor

tuple named-tensor

Identify a tensor by name; this is necessary to associate tensors to graph inputs and outputs.

Tuple Fields

resource graph-execution-context

Bind a graph to the input and output tensors for an inference.

TODO: this may no longer be necessary in WIT (#43)

Functions

[method]graph-execution-context.compute: func

Compute the inference on the given inputs.

Params
Return values

Import interface wasi:nn/graph

A graph is a loaded instance of a specific ML model (e.g., MobileNet) for a specific ML framework (e.g., TensorFlow):


Types

type error

error

type tensor

tensor

type graph-execution-context

graph-execution-context

resource graph

An execution graph for performing inference (i.e., a model).

enum graph-encoding

Describes the encoding of the graph. This allows the API to be implemented by various backends that encode (i.e., serialize) their graph IR with different formats.

Enum Cases
  • openvino
  • onnx
  • tensorflow
  • pytorch
  • tensorflowlite
  • ggml
  • autodetect

enum execution-target

Define where the graph should be executed.

Enum Cases
  • cpu
  • gpu
  • tpu

type graph-builder

graph-builder

The graph initialization data.

This gets bundled up into an array of buffers because implementing backends may encode their graph IR in parts (e.g., OpenVINO stores its IR and weights separately).
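A sketch of assembling that array of buffers, assuming a backend that splits its graph into two parts (the byte contents below are placeholders; a real guest would read them from files or embed them with `include_bytes!`):

```rust
fn main() {
    // Placeholder bytes standing in for a backend's split graph IR,
    // e.g. OpenVINO's model description and its separate weights.
    let model_ir: Vec<u8> = b"<xml model description>".to_vec();
    let weights: Vec<u8> = vec![0u8; 1024];

    // The `graph-builder` array passed to `load`: one buffer per part.
    let builders: Vec<Vec<u8>> = vec![model_ir, weights];
    println!("{} builder buffer(s)", builders.len());
}
```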


Functions

[method]graph.init-execution-context: func

Params
Return values

load: func

Load a graph from an opaque sequence of bytes to use for inference.

Params
Return values

load-by-name: func

Load a graph by name.

How the host expects the names to be passed and how it stores the graphs for retrieval via this function is implementation-specific. This allows hosts to choose name schemes that range from simple to complex (e.g., URLs?) and caching mechanisms of various kinds.

Params
  • name: string
Return values
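One way a host could back `load-by-name` is a simple name-to-model-bytes registry. The naming scheme and storage are implementation-specific, so the map-based registry below is only an illustration; a failed lookup would surface to the guest as the `not-found` error code.

```rust
use std::collections::HashMap;

/// Hypothetical host-side registry backing `load-by-name`.
struct ModelRegistry {
    models: HashMap<String, Vec<u8>>,
}

impl ModelRegistry {
    fn new() -> Self {
        Self { models: HashMap::new() }
    }

    /// Register a model's bytes under a host-chosen name.
    fn register(&mut self, name: &str, bytes: Vec<u8>) {
        self.models.insert(name.to_string(), bytes);
    }

    /// Look up a model; `None` would map to the `not-found` error code.
    fn load_by_name(&self, name: &str) -> Option<&[u8]> {
        self.models.get(name).map(|b| b.as_slice())
    }
}

fn main() {
    let mut registry = ModelRegistry::new();
    registry.register("mobilenet", vec![1, 2, 3]);
    println!("found: {}", registry.load_by_name("mobilenet").is_some());
}
```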