NOTE: additional documentation here
This repository provides a wasmCloud capability provider and actors to perform inference using machine learning models in ONNX and TensorFlow format.
In order to run and deploy the ML Demo on Cosmonic, follow the Cosmonic Getting Started Guide and use the `cosmo` CLI to deploy the application:
```shell
cosmo up
cosmo app deploy ./wadm.yaml
```
Apart from the underlying inference engine, e.g. ONNX vs. TensorFlow, the pre-configured models differ in a further aspect: the trivial models accept one-dimensional input of arbitrary shape `[1, n]`, whereas MobileNet and SqueezeNet have stricter requirements regarding their respective input tensor. To fulfill these, the input tensor of an arbitrary image can be pre-processed before being routed to the inference engine.
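As an illustration of what such pre-processing typically involves, the following sketch resizes an image to the 224x224 RGB input these models typically expect and applies the common ImageNet normalization. It assumes the `image` crate and is not the provider's actual pre-processing, whose details (resize filter, normalization constants) may differ:

```rust
// Illustrative sketch only: resize an image to 224x224 RGB and apply the
// usual ImageNet normalization, producing a [1, 3, 224, 224] f32 tensor
// in NCHW layout.
use image::imageops::FilterType;

fn preprocess(path: &str) -> Result<Vec<f32>, image::ImageError> {
    let img = image::open(path)?
        .resize_exact(224, 224, FilterType::Triangle)
        .to_rgb8();

    // ImageNet per-channel mean/std (a common choice, assumed here)
    let mean = [0.485f32, 0.456, 0.406];
    let stddev = [0.229f32, 0.224, 0.225];

    // NCHW: all red values first, then all green, then all blue
    let mut tensor = vec![0.0f32; 3 * 224 * 224];
    for (x, y, pixel) in img.enumerate_pixels() {
        for c in 0..3 {
            let value = pixel[c] as f32 / 255.0;
            tensor[c * 224 * 224 + (y as usize) * 224 + (x as usize)] =
                (value - mean[c]) / stddev[c];
        }
    }
    Ok(tensor)
}
```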
The application provides three endpoints. The first routes the input tensor to the related inference engine without any pre-processing. The second pre-processes the input tensor and then routes it to the related inference engine. The third performs pre-processing before the prediction step and post-processing afterwards:
- `0.0.0.0:<port>/<model>`, e.g. `0.0.0.0:7078/identity`
- `0.0.0.0:<port>/<model>/preprocess`, e.g. `0.0.0.0:7078/squeezenetv117/preprocess`
- `0.0.0.0:<port>/<model>/matches`, e.g. `0.0.0.0:7078/squeezenetv117/matches`
To trigger a request against the identity model, type the following:
```shell
curl -v -X POST 0.0.0.0:8078/identity -d '{"dimensions":[1,4],"valueTypes":["ValueF32"],"flags":0,"data":[0,0,128,63,0,0,0,64,0,0,64,64,0,0,128,64]}'
```
The response should comprise `HTTP/1.1 200 OK` as well as the following body:

```json
{"result":"Success","tensor":{"dimensions":[1,4],"valueTypes":["ValueF32"],"flags":0,"data":[0,0,128,63,0,0,0,64,0,0,64,64,0,0,128,64]}}
```
The following happens:
- The HTTP POST sends a request against the pre-loaded "identity" model with some `data`. `data` is the vector `[1.0f32, 2.0, 3.0, 4.0]` converted to a vector of bytes.
- A response is computed. The result is sent back.
- The `data` in the request equals the `data` in the response because the pre-loaded "identity" model yields as output exactly what it got as input.
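The `data` field is just the little-endian byte encoding of the `f32` values. A minimal, illustrative sketch (not part of the repository) that reproduces the payload bytes used above:

```rust
// Illustrative sketch: encode [1.0, 2.0, 3.0, 4.0] as little-endian f32 bytes,
// which yields exactly the `data` field used in the request above.
fn main() {
    let values = [1.0f32, 2.0, 3.0, 4.0];
    let data: Vec<u8> = values.iter().flat_map(|v| v.to_le_bytes()).collect();
    assert_eq!(data, [0, 0, 128, 63, 0, 0, 0, 64, 0, 0, 64, 64, 0, 0, 128, 64]);
    println!("{:?}", data);
}
```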
To trigger a request against the plus3 model, type the following:
```shell
curl -v -X POST 0.0.0.0:8078/plus3 -d '{"dimensions":[1,4],"valueTypes":["ValueF32"],"flags":0,"data":[0,0,128,63,0,0,0,64,0,0,64,64,0,0,128,64]}'
```
The response is

```json
{"result":"Success","tensor":{"dimensions":[1,4],"valueTypes":["ValueF32"],"flags":0,"data":[0,0,128,64,0,0,160,64,0,0,192,64,0,0,224,64]}}
```
Note that in contrast to the identity model, the answer from plus3 is not identical to the request. Converting the vector of bytes `[0,0,128,64,0,0,160,64,0,0,192,64,0,0,224,64]` back to a vector of `f32` yields `[4.0, 5.0, 6.0, 7.0]`. This is expected: each element of the input is incremented by three, `[1.0, 2.0, 3.0, 4.0]` → `[4.0, 5.0, 6.0, 7.0]`, hence the name of the model: plus3.
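The reverse conversion can be sketched the same way; assuming the same little-endian layout, decoding the response bytes recovers the incremented values:

```rust
// Illustrative sketch: decode the `data` bytes of the plus3 response back
// into f32 values, recovering [4.0, 5.0, 6.0, 7.0].
fn main() {
    let data: [u8; 16] = [0, 0, 128, 64, 0, 0, 160, 64, 0, 0, 192, 64, 0, 0, 224, 64];
    let values: Vec<f32> = data
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect();
    assert_eq!(values, [4.0, 5.0, 6.0, 7.0]);
}
```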
To trigger a request against the mobilenetv27 model including pre-processing, type the following:

```shell
# in order for the relative path to match, call from directory 'deploy'
curl -v -X POST 0.0.0.0:8078/mobilenetv27/preprocess --data-binary @../providers/mlinference/tests/testdata/images/n04350905.jpg
```
Note that the output tensor is of shape `[1,1000]` and needs to be post-processed by evaluating the softmax over the outputs. If the softmax shall be evaluated by the application as well, use the third endpoint, for example:
```shell
# in order for the relative path to match, call from directory 'deploy'
curl -v -X POST 0.0.0.0:8078/mobilenetv27/matches --data-binary @../providers/mlinference/tests/testdata/images/n04350905.jpg
```
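Conceptually, that post-processing amounts to a softmax over the 1000 raw scores followed by selecting the top entries. The following is a minimal, illustrative sketch of that computation, not the provider's actual implementation; mapping class indices to ImageNet labels is omitted:

```rust
// Illustrative sketch only: numerically stable softmax over the raw scores
// followed by selection of the top-5 (class index, probability) pairs.
fn top5(scores: &[f32]) -> Vec<(usize, f32)> {
    // subtract the maximum before exponentiating to avoid overflow
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();

    let mut probs: Vec<(usize, f32)> = exps.iter().map(|e| e / sum).enumerate().collect();
    probs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    probs.truncate(5);
    probs
}
```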
Similarly, to trigger a request against the squeezenetv117 model including pre-processing:

```shell
# in order for the relative path to match, call from directory 'deploy'
curl -v -X POST 0.0.0.0:8078/squeezenetv117/preprocess --data-binary @../providers/mlinference/tests/testdata/images/n04350905.jpg
```
Note that the output tensor is again of shape `[1,1000]` and needs to be post-processed; the `preprocess` endpoint leaves this step to the caller. Alternatively, to include the post-processing, use the `matches` endpoint:
```shell
# in order for the relative path to match, call from directory 'deploy'
curl -v -X POST 0.0.0.0:8078/squeezenetv117/matches --data-binary @../providers/mlinference/tests/testdata/images/n04350905.jpg
```
The answer should comprise

```json
[{"label":"n02883205 bow tie, bow-tie, bowtie","probability":0.16806115},{"label":"n04350905 suit, suit of clothes","probability":0.14194612},{"label":"n03763968 military uniform","probability":0.11412828},{"label":"n02669723 academic gown, academic robe, judge's robe","probability":0.09906072},{"label":"n03787032 mortarboard","probability":0.09620707}]
```
The capability provider assumes a bindle to comprise two parcels, where each parcel is assigned one of the following two groups:

- `model`
- `metadata`

The first, `model`, is assumed to comprise the model data, e.g. an ONNX model. The second, `metadata`, is currently assumed to be JSON containing the metadata of the model. In case you create new bindles, make sure to assign these two groups.
The capability provider uses the amazing inference toolkit tract and currently supports the following inference engines:

- ONNX
- TensorFlow
Concerning ONNX, see tract's documentation for a detailed discussion of ONNX format coverage.
Concerning TensorFlow, only TensorFlow 1.x is supported, not TensorFlow 2. However, TensorFlow 2 models may be converted to TensorFlow 1.x. For a more detailed discussion, see the following resources:
https://www.tensorflow.org/guide/migrate/tf1_vs_tf2
https://stackoverflow.com/questions/59112527/primer-on-tensorflow-and-keras-the-past-tf1-the-present-tf2#:~:text=In%20terms%20of%20the%20behavior,full%20list%20of%20data%20types.
Currently, there is no support for accelerators like GPUs or TPUs. There is a range of Coral devices, like the Dev Board, that support TensorFlow for TPU-based inference; however, they only support the TensorFlow Lite derivative. For more information, see Coral's Edge TPU inferencing overview.