<pre class='metadata'>
Title: Web Neural Network API
Shortname: webnn
Level: 1
Status: CG-DRAFT
Group: webml
URL: https://webmachinelearning.github.io/webnn/
Editor: Ningxin Hu 68202, Intel Corporation https://intel.com
Editor: Chai Chaoweeraprasit 120203, Microsoft Corporation https://microsoft.com
Abstract: This document describes a dedicated low-level API for neural network inference hardware acceleration.
Repository: https://github.com/webmachinelearning/webnn
!Explainer: <a href="https://github.com/webmachinelearning/webnn/blob/master/explainer.md">explainer.md</a>
!Polyfill: <a href="https://github.com/webmachinelearning/webnn-polyfill">webnn-polyfill</a> / <a href="https://github.com/webmachinelearning/webnn-samples">webnn-samples</a>
Markup Shorthands: markdown yes
Markup Shorthands: dfn yes
Markup Shorthands: idl yes
Markup Shorthands: css no
Logo: https://webmachinelearning.github.io/webmachinelearning-logo.png
</pre>
<pre class="anchors">
urlPrefix: https://www.khronos.org/registry/webgl/specs/latest/1.0/; spec: WEBGL-1
type: interface
text: WebGLRenderingContext; url: 5.14
text: WebGLBuffer; url: 5.4
text: WebGLTexture; url: 5.9
urlPrefix: https://gpuweb.github.io/gpuweb/; spec: WEBGPU
type: interface
text: GPUDevice; url: gpu-device
text: GPUBuffer; url: buffer-interface
text: GPUTexture; url: texture-interface
</pre>
Introduction {#intro}
=====================
We're working on this section. Meanwhile, please take a look at the <a href="https://github.com/webmachinelearning/webnn/blob/master/explainer.md">explainer</a>.
Use cases {#usecases}
=====================
## Application Use Cases ## {#usecases-application}
This section illustrates application-level use cases for neural network
inference hardware acceleration. All applications in those use cases can be
built on top of pre-trained deep neural network (DNN) [[models]].
### Person Detection ### {#usecase-person-detection}
A user opens a web-based video conferencing application, but temporarily leaves
her room. The application watches whether she is in front of her PC by using
object detection (for example, approaches such as [[SSD]] or [[YOLO]] that use
a single DNN) to detect regions in a camera input frame that include persons.
When she comes back, the application automatically detects her and notifies
other online users that she is active now.
### Semantic Segmentation ### {#usecase-segmentation}
A user joins a teleconference via a web-based video conferencing application at
her desk since no meeting room in her office is available. During the
teleconference, she does not want her room and the people in the background to
be visible. To protect the privacy of the other people and the surroundings, the
application runs a machine learning model such as [[DeepLabv3+]] or
[[MaskR-CNN]] to semantically split an image into segments and replaces the
segments that represent other people and the background with another picture.
### Skeleton Detection ### {#usecase-skeleton-detection}
A web-based video conferencing application tracks the pose of the user's skeleton by
running a machine learning model that allows for real-time human pose
estimation, such as [[PoseNet]], to recognize her gestures and body language. When
she raises her hand, her microphone is automatically unmuted and she can start
speaking on the teleconference.
### Face Recognition ### {#usecase-face-recognition}
There are multiple people in the conference room, and they join an online meeting
using a web-based video conferencing application. The application detects the faces
of participants by using object detection (for example, approaches
such as [[SSD]]) and checks whether each face was present at the
previous meeting or not by running a machine learning model such as [[FaceNet]],
which verifies whether two face images belong to the same person.
### Facial Landmark Detection ### {#usecase-facial-landmarks}
A user wants to find new glasses that fit her beautifully on an online glasses
store. The online store offers a web-based try-on simulator that runs a machine
learning model such as the Face Alignment Network [[FAN]] to detect facial landmarks
like the eyes, nose, and mouth. When she chooses a pair of glasses, the simulator
renders the selected glasses at the detected position of the eyes on her
facial image.
### Style Transfer ### {#usecase-style-transfer}
A user is looking for cosmetics on an online store and wondering which color would
fit her face. The online store shows sample facial makeup images of its cosmetics
and offers a makeup simulator that runs a machine learning model like
[[ContextualLoss]] or [[PairedCycleGAN]] to transfer the makeup style of the
sample makeup image to her facial image. She can use the simulator to check how
the selected makeup looks on her face.
### Super Resolution ### {#usecase-super-resolution}
A web-based video conferencing application is receiving a video stream from its peer, but
the resolution of the video becomes lower due to network congestion. To prevent
degradation of the perceived video quality, the application runs a machine
learning model for super-resolution such as [[SRGAN]] to generate
higher-resolution video frames.
### Image Captioning ### {#usecase-image-captioning}
For better accessibility, a web-based presentation application provides
automatic image captioning by running a machine learning model such as
[[im2txt]], which generates descriptive captions for the presentation slides.
### Machine Translation ### {#usecase-translation}
Multiple people from various countries are talking via a web-based real-time
text chat application. The application translates their conversation by using a
machine learning model such as [[GNMT]] or [[OpenNMT]], which translates every
message into a different language.
### Emotion Analysis ### {#usecase-emotion-analysis}
A user is talking to her friend via a web-based real-time text chat application,
and she is wondering how the friend feels because she cannot see the friend's
face. The application analyses the friend's emotion by using a machine learning
model such as [[DeepMoji]], which infers emotion from input texts, and displays
an emoji that represents the estimated emotion.
### Video Summarization ### {#usecase-video-summalization}
A web-based video conferencing application records received video streams, and
it needs to reduce the amount of recorded video data to be stored. The application
generates a short version of the recorded video by using a machine learning model
for video summarization, such as [[Video-Summarization-with-LSTM]].
### Noise Suppression ### {#usecase-noise-suppression}
A web-based video conferencing application records received audio streams, but
background noise is often present. The application leverages real-time
noise suppression using a recurrent neural network such as [[RNNoise]] to
suppress dynamic background noise, such as a crying baby or a barking dog, and
improve the audio experience in video conferences.
## Framework Use Cases ## {#usecases-framework}
This section collects framework-level use cases for a dedicated low-level API
for neural network inference hardware acceleration. It is expected that Machine
Learning frameworks will be key consumers of the Web Neural Network API (WebNN
API) and the low-level details exposed through the WebNN API are abstracted out
from typical web developers. However, it is also expected that web developers
with specific interest and competence in Machine Learning will want to interface
with the WebNN API directly instead of a higher-level ML framework.
### Custom Layer ### {#usecase-custom-layer}
A web application developer wants to run a DNN model on the WebNN API. However,
she has found that some activation functions, such as [[LeakyReLU]] and [[ELU]],
are not included in the WebNN API. To address this issue, she constructs
custom layers for the additional activation functions on top of the WebNN API.
Note that the scope of custom layers may include convolution, normalization,
etc. as well as activation.
### Network Concatenation ### {#usecase-network-concat}
A web application uses a DNN model whose data for the upper convolutional
layers and the lower fully-connected layers are stored in separate files, since
the data of the fully-connected layers is periodically updated due to fine
tuning on the server side.
Therefore, the application first downloads both partial model files and
concatenates them into a single model. When the model is updated, the
application downloads the fine-tuned part of the model and replaces only the
fully-connected layers with it.
### Performance Adaptation ### {#usecase-perf-adapt}
A web application developer is concerned about the performance of her DNN model on
mobile devices. She has confirmed that it may run too slowly on mobile devices
that do not have GPU acceleration. To address this issue, her web application
queries the WebNN API to confirm whether acceleration is available, so
that the application can display a warning on devices without acceleration.
After several weeks, she has developed a tiny DNN model that can run even on a
CPU. In order to accommodate CPU execution, she modifies the application
so that it loads the tiny model on CPU-only devices.
API {#api}
=====================
## Navigator ## {#api-navigator}
<script type=idl>
partial interface Navigator {
  [SecureContext] readonly attribute ML ml;
};
</script>
## ML ## {#api-ml}
<script type=idl>
enum MLPowerPreference {
  // Let the user agent decide the most suitable behavior
  "default",
  // Prioritizes execution speed over power consumption
  "high-performance",
  // Prioritizes power consumption over other considerations such as execution speed
  "low-power"
};

dictionary MLContextOptions {
  // Preference as related to power consumption
  MLPowerPreference powerPreference = "default";
};

[SecureContext, Exposed=Window]
interface ML {
  // Create a context with options
  MLContext createContext(optional MLContextOptions options = {});

  // Create a context from WebGL rendering context
  MLContext createContext(WebGLRenderingContext glContext);

  // Create a context from WebGPU device
  MLContext createContext(GPUDevice gpuDevice);
};
</script>
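<div class="note">
For illustration, a minimal sketch of obtaining an {{MLContext}} using only the interfaces defined above; the choice of power preference here is just an example:
<pre highlight="js">
// Let the user agent decide the most suitable execution behavior.
const defaultContext = navigator.ml.createContext();

// Or explicitly prioritize power consumption over execution speed.
const lowPowerContext = navigator.ml.createContext({ powerPreference: 'low-power' });
</pre>
</div>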
## MLContext ## {#api-mlcontext}
The {{MLContext}} interface represents a global state of neural network compute workload and execution processes.
<script type=idl>
[SecureContext, Exposed=Window]
interface MLContext {};
</script>
## MLOperandDescriptor ## {#api-mloperanddescriptor}
<script type=idl>
enum MLInputOperandLayout {
  "nchw",
  "nhwc"
};

enum MLOperandType {
  "float32",
  "float16",
  "int32",
  "uint32",
  "int8",
  "uint8"
};

dictionary MLOperandDescriptor {
  // The operand type.
  required MLOperandType type;

  // The dimensions field is only required for tensor operands.
  // A negative value means an unknown dimension.
  sequence<long> dimensions;
};
</script>
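<div class="note">
For illustration, a descriptor for a 4-D *"float32"* tensor whose batch size is unknown at graph-building time may look as follows; the remaining dimension sizes are hypothetical:
<pre highlight="js">
// A negative value denotes an unknown dimension, here a dynamic batch size.
const desc = { type: 'float32', dimensions: [-1, 3, 224, 224] };
</pre>
</div>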
## MLOperand ## {#api-mloperand}
<script type=idl>
[SecureContext, Exposed=Window]
interface MLOperand {};
</script>
## MLGraphBuilder ## {#api-mlgraphbuilder}
The {{MLGraphBuilder}} interface defines a set of operations, as identified by the [[#usecases]], that can be composed into a computational graph. It also represents the intermediate state of a graph building session.
<script type=idl>
typedef record<DOMString, MLOperand> MLNamedOperands;

dictionary MLBufferResourceView {
  required (WebGLBuffer or GPUBuffer) resource;
  unsigned long long offset = 0;
  unsigned long long size;
};

typedef (ArrayBufferView or MLBufferResourceView) MLBufferView;

[SecureContext, Exposed=Window]
interface MLGraphBuilder {
  // Construct the graph builder from the context.
  constructor(MLContext context);

  // Create an operand for a graph input.
  MLOperand input(DOMString name, MLOperandDescriptor desc);

  // Create an operand for a graph constant.
  MLOperand constant(MLOperandDescriptor desc, MLBufferView bufferView);

  // Create a single-value operand from the specified number of the specified type.
  MLOperand constant(double value, optional MLOperandType type = "float32");

  // Compile the graph up to the specified output operands.
  Promise<MLGraph> build(MLNamedOperands outputs);
};
</script>
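<div class="note">
For illustration, a minimal sketch of a graph-building session that compiles a graph computing `c = a + b`; the operand names, shapes, and values are hypothetical:
<pre highlight="js">
const context = navigator.ml.createContext();
const builder = new MLGraphBuilder(context);
const desc = { type: 'float32', dimensions: [2, 2] };
// An input bound at execution time, and a constant baked into the graph.
const a = builder.input('a', desc);
const b = builder.constant(desc, new Float32Array([1, 2, 3, 4]));
const c = builder.add(a, b);
// Compile the graph up to the named output operand.
const graph = await builder.build({ c });
</pre>
</div>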
### batchNormalization ### {#api-mlgraphbuilder-batchnorm}
Normalize the tensor values of input features across the batch dimension using [[Batch-Normalization]]. For each input feature, the mean and variance values of that feature, supplied to this calculation as parameters, were previously computed across the batch dimension of the input during the model training phase.
<script type=idl>
dictionary MLBatchNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  long axis = 1;
  float epsilon = 1e-5;
};

partial interface MLGraphBuilder {
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance,
                               optional MLBatchNormalizationOptions options = {});
};
</script>
<div algorithm=batchnorm>
**Arguments:**
- *input*: an {{MLOperand}}. The input N-D tensor.
- *mean*: an {{MLOperand}}. The 1-D tensor of the mean values of the input features across the batch whose length is equal to the size of the input dimension denoted by *options.axis*.
- *variance*: an {{MLOperand}}. The 1-D tensor of the variance values of the input features across the batch whose length is equal to the size of the input dimension denoted by *options.axis*.
- *options*: an optional {{MLBatchNormalizationOptions}}. The optional parameters of the operation.
- *scale*: an {{MLOperand}}. The 1-D tensor of the scaling values whose length is equal to the size of the input dimension denoted by *options.axis*.
- *bias*: an {{MLOperand}}. The 1-D tensor of the bias values whose length is equal to the size of the input dimension denoted by *options.axis*.
- *axis*: a {{long}} scalar. The index to the feature count dimension of the input shape for which the mean and variance values are provided. When it's not specified, the default value is 1.
- *epsilon*: a {{float}} scalar. A small value to prevent computational error due to divide-by-zero. The default value is 0.00001 when not specified.
**Returns:** an {{MLOperand}}. The batch-normalized N-D tensor of the same shape as the input tensor.
When *input* is a 4-D tensor of the *"nchw"* or *"nhwc"* layout, *options.axis* should be set to 1 or 3 respectively. The axis value designates the feature or channel count dimension of the input tensor.
<div class="note">
The behavior of this operation when the input tensor is 4-D of the *"nchw"* layout can be generically emulated from
the usage of other operations as follows. However, user agents typically have a more efficient implementation for it,
therefore its usage is encouraged from the performance standpoint.
<pre highlight="js">
const shape = [1,-1,1,1];
return builder.add(
  builder.mul(
    builder.reshape(options.scale, shape),
    builder.div(
      builder.sub(input, builder.reshape(mean, shape)),
      builder.pow(
        builder.add(builder.reshape(variance, shape), builder.constant(options.epsilon)),
        builder.constant(0.5))
    )
  ),
  builder.reshape(options.bias, shape)
);
</pre>
</div>
</div>
### clamp ### {#api-mlgraphbuilder-clamp}
Clamp the input tensor element-wise within a range specified by the minimum and maximum values.
<script type=idl>
dictionary MLClampOptions {
  MLOperand minValue;
  MLOperand maxValue;
};

partial interface MLGraphBuilder {
  MLOperand clamp(MLOperand x, optional MLClampOptions options = {});
};
</script>
<div algorithm=clamp>
**Arguments:**
- *x*: an {{MLOperand}}. The input tensor.
- *options*: an optional {{MLClampOptions}}. The optional parameters of the operation.
- *minValue*: an {{MLOperand}}. Specifies the minimum values of the range. It is either a scalar, or of the shape that is unidirectionally broadcastable to the shape of *x* according to [[!numpy-broadcasting-rule]]. When it is not specified, the clamping is not performed on the lower limit of the range.
- *maxValue*: an {{MLOperand}}. Specifies the maximum values of the range. It is either a scalar, or of the shape that is unidirectionally broadcastable to the shape of *x* according to [[!numpy-broadcasting-rule]]. When it is not specified, the clamping is not performed on the upper limit of the range.
**Returns:** an {{MLOperand}}. The output tensor of the same shape as *x*.
Clamp the input tensor element-wise within a range specified by *minValue* and *maxValue*. The calculation follows the expression min(max(x, minValue), maxValue). When *minValue* is not specified, the clamping is not performed on the lower limit. When *maxValue* is not specified, the clamping is not performed on the upper limit.
<div class="note">
The behavior of this operation can be generically emulated from the usage of
other operations as follows. However, user agents typically have a more
efficient implementation for it, therefore its usage is encouraged from the
performance standpoint.
<pre highlight="js">
if (options.minValue === undefined) {
  if (options.maxValue === undefined) {
    return x;
  } else {
    return builder.min(x, options.maxValue);
  }
} else {
  if (options.maxValue === undefined) {
    return builder.max(x, options.minValue);
  } else {
    return builder.min(builder.max(x, options.minValue), options.maxValue);
  }
}
</pre>
</div>
</div>
### concat ### {#api-mlgraphbuilder-concat}
Concatenates the input tensors along a given axis.
<script type=idl>
partial interface MLGraphBuilder {
  MLOperand concat(sequence<MLOperand> inputs, long axis);
};
</script>
<div algorithm=concat>
**Arguments:**
- *inputs*: a sequence of {{MLOperand}}. All input tensors must have the
same shape, except for the size of the dimension to concatenate on.
- *axis*: a {{long}} scalar. The axis that the inputs concatenate along, with
the value in the interval [0, N) where N is the rank of all the
inputs.
**Returns:** an {{MLOperand}}. The concatenated tensor of all the inputs along
the *axis*. The output tensor has the same shape as the inputs except for the
dimension that the inputs are concatenated along. The size of that dimension is
computed as the sum of the sizes of all the inputs along that dimension.
</div>
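<div class="note">
For illustration, a sketch of concatenating two constant tensors of shape [2, 2] along axis 0; the values are hypothetical:
<pre highlight="js">
const desc = { type: 'float32', dimensions: [2, 2] };
const a = builder.constant(desc, new Float32Array([1, 2, 3, 4]));
const b = builder.constant(desc, new Float32Array([5, 6, 7, 8]));
// The sizes along axis 0 are summed, so the output shape is [4, 2].
const c = builder.concat([a, b], 0);
</pre>
</div>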
### conv2d ### {#api-mlgraphbuilder-conv2d}
Compute a 2-D convolution given 4-D input and filter tensors.
<script type=idl>
enum MLFilterOperandLayout {
  "oihw",
  "hwio",
  "ohwi"
};

enum MLAutoPad {
  "explicit",
  "same-upper",
  "same-lower"
};

dictionary MLConv2dOptions {
  sequence<long> padding;
  sequence<long> strides;
  sequence<long> dilations;
  sequence<long> outputPadding;
  sequence<long> outputSizes;
  MLAutoPad autoPad = "explicit";
  boolean transpose = false;
  long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLFilterOperandLayout filterLayout = "oihw";
};

partial interface MLGraphBuilder {
  MLOperand conv2d(MLOperand input, MLOperand filter, optional MLConv2dOptions options = {});
};
</script>
<div algorithm=conv2d>
**Arguments:**
- *input*: an {{MLOperand}}. The input 4-D tensor. The logical shape
is interpreted according to the value of *options.inputLayout*.
- *filter*: an {{MLOperand}}. The filter 4-D tensor. The logical shape is
interpreted according to the value of *options.filterLayout* and *options.groups*.
- *options*: an optional {{MLConv2dOptions}}. The optional parameters of the operation.
- *padding*: a sequence of {{long}} of length 4. The additional rows and columns added to the beginning and ending of each spatial dimension of *input*, [beginning_height, ending_height, beginning_width, ending_width]. If not present, the values are assumed to be [0,0,0,0].
- *strides*: a sequence of {{long}} of length 2. The stride of the sliding window for each spatial dimension of *input*, [stride_height, stride_width]. If not present, the values are assumed to be [1,1].
- *dilations*: a sequence of {{long}} of length 2. The dilation factor for each spatial dimension of *input*, [dilation_height, dilation_width]. If not present, the values are assumed to be [1,1].
- *outputPadding*: a sequence of {{long}} of length 2. The padding values applied to each spatial dimension of the output tensor when *options.transpose* is set to true. These explicit padding values are needed to disambiguate the output tensor shape for transposed convolution when the value of *options.strides* is greater than 1. Note that these values are only used to disambiguate the output shape when needed; they do not necessarily cause any padding value to be written to the output tensor. If not specified, the values are assumed to be [0,0].
- *outputSizes*: a sequence of {{long}} of length 2. The sizes of the last two dimensions of the output tensor when *options.transpose* is set to true. When the output sizes are explicitly specified, the output padding values in *options.outputPadding* are ignored. If not specified, the output sizes are automatically computed.
- *autoPad*: an {{MLAutoPad}}. The automatic input padding options. By default, this argument is set to *"explicit"*, which means that the values in the *options.padding* array should be used for input padding. When the option is set other than *"explicit"*, the values in the *options.padding* array are ignored. With the *"same-upper"* option, the padding values are automatically computed such that the additional ending padding of the spatial input dimensions would allow all of the input values in the corresponding dimension to be filtered. The *"same-lower"* option is similar but padding is applied to the beginning padding of the spatial input dimensions instead of the ending one.
- *transpose*: a {{boolean}} indicating that a transposed convolution operation is performed. Transposed convolution is used in upsampling networks to increase the resolution of a feature as opposed to the typical convolution process that reduces the feature's resolution. When transposed convolution is performed, *options.outputPadding* may be needed to disambiguate the output tensor shape. If not present, this option is assumed to be false.
- *groups*: a {{long}} scalar. The number of groups that input channels and output channels are divided into. Defaults to 1.
- *inputLayout*: an {{MLInputOperandLayout}}. The default value is *"nchw"*. This option specifies the layout format of the input and output tensor as follows:
"nchw":
- input tensor: [batches, input_channels, height, width]
- output tensor: [batches, output_channels, height, width]
"nhwc":
- input tensor: [batches, height, width, input_channels]
- output tensor: [batches, height, width, output_channels]
- *filterLayout*: an {{MLFilterOperandLayout}}. The default value is *"oihw"*. This option specifies the layout format of the filter tensor as follows:
"oihw":
- [output_channels, input_channels/groups, height, width]
"hwio":
- [height, width, input_channels/groups, output_channels]
"ohwi":
- [output_channels, height, width, input_channels/groups]
**Returns:** an {{MLOperand}}. The output 4-D tensor that contains the convolution result. The output shape is interpreted according to the *options.inputLayout* value. More specifically, the sizes of the last two dimensions of the output tensor, the spatial dimensions, for the convolution operation can be calculated as follows:
*output size = 1 + (input size - filter size + beginning padding + ending padding) / stride*
Whereas for the transposed convolution case with *options.transpose* set to *true*, unless the *options.outputSizes* values are explicitly specified, the *options.outputPadding* values may be needed to compute the spatial dimension values of the output tensor as follows:
*output size = (input size - 1) × stride + filter size - beginning padding - ending padding + output padding*
<div class="note">
A *depthwise* conv2d operation is a variant of grouped convolution, used in models like MobileNet, where *options.groups* = input_channels = output_channels, and the shape of the filter tensor is [options.groups, 1, height, width]
for the *"nchw"* layout or [height, width, 1, options.groups] for the *"nhwc"* layout.
</div>
</div>
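<div class="note">
For illustration, a sketch of the depthwise variant described above, using the default *"nchw"* input layout and *"oihw"* filter layout; the shapes and filter values are hypothetical:
<pre highlight="js">
const input = builder.input('input', { type: 'float32', dimensions: [1, 4, 8, 8] });
// Depthwise: groups = input_channels = output_channels, so the filter
// shape is [options.groups, 1, height, width] for the "oihw" layout.
const filter = builder.constant(
    { type: 'float32', dimensions: [4, 1, 3, 3] },
    new Float32Array(4 * 3 * 3).fill(0.1));
// Per the formula above: output size = 1 + (8 - 3 + 1 + 1) / 1 = 8,
// so the output shape is [1, 4, 8, 8].
const output = builder.conv2d(input, filter, { groups: 4, padding: [1, 1, 1, 1] });
</pre>
</div>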
### element-wise binary operations ### {#api-mlgraphbuilder-binary}
Compute the element-wise binary addition, subtraction, multiplication, division,
maximum and minimum of the two input tensors.
<script type=idl>
partial interface MLGraphBuilder {
  MLOperand add(MLOperand a, MLOperand b);
  MLOperand sub(MLOperand a, MLOperand b);
  MLOperand mul(MLOperand a, MLOperand b);
  MLOperand div(MLOperand a, MLOperand b);
  MLOperand max(MLOperand a, MLOperand b);
  MLOperand min(MLOperand a, MLOperand b);
  MLOperand pow(MLOperand a, MLOperand b);
};
</script>
<div algorithm=binary>
**Arguments:**
- *a*: an {{MLOperand}}. The first input tensor.
- *b*: an {{MLOperand}}. The second input tensor.
**Returns:** an {{MLOperand}}. The output tensor that contains the result of
element-wise binary operation of the two input tensors.
The element-wise binary operation will be broadcasted according to
[[!numpy-broadcasting-rule]]. The rank of the output tensor is the maximum
rank of the input tensors. For each dimension of the output tensor, its size
is the maximum size along that dimension of the input tensors.
**Operation types:**
- *add*: Add the values of the two input tensors, element-wise.
- *sub*: Subtract the values of the second input tensor from the values of the first input tensor, element-wise.
- *mul*: Multiply the values of the two input tensors, element-wise.
- *div*: Divide the values of the first input tensor by the values of the second tensor, element-wise.
- *max*: Select the greater values of the two input tensors, element-wise.
- *min*: Select the lesser values of the two input tensors, element-wise.
- *pow*: Compute the values of the first input tensor raised to the power of the values of the second input tensor, element-wise.
</div>
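<div class="note">
For illustration, a sketch of element-wise addition with broadcasting per [[!numpy-broadcasting-rule]]; the shapes and values are hypothetical:
<pre highlight="js">
const a = builder.input('a', { type: 'float32', dimensions: [2, 3] });
// The [1, 3] operand is broadcast across the first dimension of a,
// so the output shape is [2, 3].
const b = builder.constant({ type: 'float32', dimensions: [1, 3] },
                           new Float32Array([1, 2, 3]));
const c = builder.add(a, b);
</pre>
</div>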
### element-wise unary operations ### {#api-mlgraphbuilder-unary}
Compute the element-wise unary operation for the input tensor.
<script type=idl>
partial interface MLGraphBuilder {
  MLOperand abs(MLOperand x);
  MLOperand ceil(MLOperand x);
  MLOperand cos(MLOperand x);
  MLOperand exp(MLOperand x);
  MLOperand floor(MLOperand x);
  MLOperand log(MLOperand x);
  MLOperand neg(MLOperand x);
  MLOperand relu(MLOperand x);
  MLOperand sigmoid(MLOperand x);
  MLOperand sin(MLOperand x);
  MLOperand tan(MLOperand x);
  MLOperand tanh(MLOperand x);
};
</script>
<div algorithm=unary>
**Arguments:**
- *x*: an {{MLOperand}}. The input tensor.
**Returns:** an {{MLOperand}}. The output tensor that contains the result of
element-wise unary operation of the input tensor. The shape of the output
tensor is the same as the shape of input tensor.
**Operation types:**
- *abs*: Compute the absolute value of the input tensor, element-wise.
- *ceil*: Compute the ceiling of the input tensor, element-wise.
- *cos*: Compute the cosine of the input tensor, element-wise.
- *exp*: Compute the exponential of the input tensor, element-wise.
- *floor*: Compute the floor of the input tensor, element-wise.
- *log*: Compute the natural logarithm of the input tensor, element-wise.
- *neg*: Compute the numerical negative value of the input tensor, element-wise.
- *relu*: Compute the <a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">rectified linear function</a> of the input tensor, element-wise.
<div class="note">
The behavior of this operation can be generically emulated from the usage of
other operations as follows. However, user agents typically have a more
efficient implementation for it, therefore its usage is encouraged from the
performance standpoint.
<pre highlight="js">
return builder.max(builder.constant(0), x);
</pre>
</div>
- *sigmoid*: Compute the sigmoid function of the input tensor, element-wise.
- *sin*: Compute the sine of the input tensor, element-wise.
- *tan*: Compute the tangent of the input tensor, element-wise.
- *tanh*: Compute the hyperbolic tangent of the input tensor, element-wise.
</div>
### gemm ### {#api-mlgraphbuilder-gemm}
Calculate the [general matrix multiplication of the Basic Linear Algebra Subprograms](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms#Level_3). The calculation follows the expression `alpha * A * B + beta * C`, where `A`, `B`, and `C` are matrices, and `A` and `B` may optionally be transposed prior to the calculation.
<script type=idl>
dictionary MLGemmOptions {
  MLOperand c;
  float alpha = 1.0;
  float beta = 1.0;
  boolean aTranspose = false;
  boolean bTranspose = false;
};

partial interface MLGraphBuilder {
  MLOperand gemm(MLOperand a, MLOperand b, optional MLGemmOptions options = {});
};
</script>
<div algorithm=gemm>
**Arguments:**
- *a*: an {{MLOperand}}. The first input 2-D tensor.
- *b*: an {{MLOperand}}. The second input 2-D tensor.
- *options*: an optional {{MLGemmOptions}}. The optional parameters of the operation.
- *c*: an {{MLOperand}}. The third input 2-D tensor.
- *alpha*: a {{float}} scalar multiplier for the first input. Defaults to 1.0.
- *beta*: a {{float}} scalar multiplier for the third input. Defaults to 1.0.
- *aTranspose*: a {{boolean}} indicating if the first input should be transposed prior to calculating the output. Defaults to false.
- *bTranspose*: a {{boolean}} indicating if the second input should be transposed prior to calculating the output. Defaults to false.
**Returns:** an {{MLOperand}}. The output 2-D tensor that contains the calculated product of all the inputs.
<div class="note">
The behavior of this operation can be generically emulated from the usage of other operations as follows. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
<pre highlight="js">
if (options.aTranspose)
  a = builder.transpose(a);
if (options.bTranspose)
  b = builder.transpose(b);
let ab = builder.matmul(builder.mul(builder.constant(options.alpha), a), b);
return (options.c ? builder.add(ab, builder.mul(builder.constant(options.beta), options.c)) : ab);
</pre>
</div>
</div>
### gru ### {#api-mlgraphbuilder-gru}
Gated Recurrent Unit [[GRU]] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of the network.
<script type=idl>
enum MLRecurrentNetworkWeightLayout {
"zrn", // update-reset-new gate ordering
"rzn" // reset-update-new gate ordering
};
enum MLRecurrentNetworkActivation {
"relu",
"sigmoid",
"tanh"
};
enum MLRecurrentNetworkDirection {
"forward",
"backward",
"both"
};
dictionary MLGruOptions {
MLOperand bias;
MLOperand recurrentBias;
MLOperand initialHiddenState;
boolean resetAfter = true;
boolean returnSequence = false;
MLRecurrentNetworkDirection direction = "forward";
MLRecurrentNetworkWeightLayout layout = "zrn";
sequence<MLRecurrentNetworkActivation> activations;
};
partial interface MLGraphBuilder {
sequence<MLOperand> gru(MLOperand input, MLOperand weight, MLOperand recurrentWeight,
long steps, long hiddenSize, optional MLGruOptions options = {});
};
</script>
<div algorithm=gru>
**Arguments:**
- *input*: an {{MLOperand}}. The input 3-D tensor of shape [steps, batch_size, input_size].
- *weight*: an {{MLOperand}}. The 3-D input weight tensor of shape [num_directions, 3 * hidden_size, input_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the *layout* argument.
- *recurrentWeight*: an {{MLOperand}}. The 3-D recurrent weight tensor of shape [num_directions, 3 * hidden_size, hidden_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the *layout* argument.
- *steps*: a {{long}} scalar. The number of time steps in the recurrent network. The value must be greater than 0.
- *hiddenSize*: a {{long}} scalar. The value of the third dimension of the cell output tensor shape. It indicates the number of features in the hidden state.
- *options*: an optional {{MLGruOptions}}. The optional parameters of the operation.
- *bias*: an {{MLOperand}}. The 2-D input bias tensor of shape [num_directions, 3 * hidden_size]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the *options.layout* argument.
- *recurrentBias*: an {{MLOperand}}. The 2-D recurrent bias tensor of shape [num_directions, 3 * hidden_size]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the *options.layout* argument.
- *initialHiddenState*: an {{MLOperand}}. The 3-D initial hidden state tensor of shape [num_directions, batch_size, hidden_size]. When not specified, it's assumed to be a tensor filled with zeros.
- *resetAfter*: a {{boolean}} indicating whether to apply the reset gate after or before matrix multiplication. Defaults to true.
- *returnSequence*: a {{boolean}} indicating whether to also return the entire sequence with every cell output from each time step in it, in addition to the cell output of the last time step. Defaults to false.
- *direction*: an {{MLRecurrentNetworkDirection}}. The processing direction of the input sequence. When set to *"both"*, the size of the first dimension of the weight and the bias tensor shapes must be 2, and the input is processed in both directions.
- *layout*: an {{MLRecurrentNetworkWeightLayout}}. The ordering of the weight and bias vectors for the internal gates of GRU, specifically the *update (z)*, *reset (r)*, and *new (n)* gate, as indicated in the second dimension of the weight and bias tensor shape. When not specified, the default layout is *"zrn"*.
- *activations*: a sequence of {{MLRecurrentNetworkActivation}}. A pair of activation functions, with the first function used for the update and reset gate and the second used for the new gate. When not specified, it's assumed to be the sigmoid (*"sigmoid"*) and the hyperbolic tangent (*"tanh"*) functions respectively.
**Returns:** a sequence of {{MLOperand}}. The first element of the sequence is a 3-D tensor of shape [num_directions, batch_size, hidden_size], the cell output from the last time step of the network. Additionally, if *returnSequence* is set to true, the second element is the 4-D output tensor of shape [steps, num_directions, batch_size, hidden_size] containing every cell output from each time step in the temporal sequence.
<div class="note">
The behavior of this operation can be generically emulated from the usage of other operations as follows. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
<pre highlight="js">
const numDirections = (options.direction == "both" ? 2 : 1);
let hiddenState = options.initialHiddenState;
if (!hiddenState) {
  const desc = { type: 'float32', dimensions: [numDirections, 1, hiddenSize] };
  const totalSize = numDirections * hiddenSize;
  hiddenState = builder.constant(desc, new Float32Array(totalSize).fill(0));
}
let sequence = null;
let cellWeight = [];
let cellRecurrentWeight = [];
let cellBias = [];
let cellRecurrentBias = [];
for (let slot = 0; slot < numDirections; ++slot) {
  cellWeight.push(builder.squeeze(builder.slice(weight, [slot, 0, 0], [1, -1, -1]), { axes: [0] }));
  cellRecurrentWeight.push(builder.squeeze(builder.slice(recurrentWeight, [slot, 0, 0], [1, -1, -1]), { axes: [0] }));
  cellBias.push(options.bias ? (builder.squeeze(builder.slice(options.bias, [slot, 0], [1, -1]), { axes: [0] })) : null);
  cellRecurrentBias.push(options.recurrentBias ?
      (builder.squeeze(builder.slice(options.recurrentBias, [slot, 0], [1, -1]), { axes: [0] })) : null);
}
for (let step = 0; step < steps; ++step) {
  let cellHidden = [];
  let cellOutput = null;
  for (let slot = 0; slot < numDirections; ++slot) {
    cellHidden.push(builder.squeeze(builder.slice(hiddenState, [slot, 0, 0], [1, -1, -1]), { axes: [0] }));
  }
  for (let slot = 0; slot < numDirections; ++slot) {
    let slice = (slot == 1 || options.direction == "backward" ? steps - step - 1 : step);
    let cellInput = builder.squeeze(builder.slice(input, [slice, 0, 0], [1, -1, -1]), { axes: [0] });
    let result = builder.reshape(
        builder.gruCell(
            cellInput, cellWeight[slot], cellRecurrentWeight[slot],
            cellHidden[slot], hiddenSize, { bias: cellBias[slot],
            recurrentBias: cellRecurrentBias[slot], resetAfter: options.resetAfter,
            layout: options.layout, activations: options.activations }),
        [1, -1, hiddenSize]);
    cellOutput = (cellOutput ? builder.concat([cellOutput, result], 0) : result);
  }
  hiddenState = cellOutput;
  if (options.returnSequence) {
    cellOutput = builder.reshape(cellOutput, [1, numDirections, -1, hiddenSize]);
    sequence = (sequence ? builder.concat([sequence, cellOutput], 0) : cellOutput);
  }
}
return (sequence ? [hiddenState, sequence] : [hiddenState]);
</pre>
</div>
</div>
### gruCell ### {#api-mlgraphbuilder-grucell}
A single time step of the Gated Recurrent Unit [[GRU]] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of a recurrent network.
<script type=idl>
dictionary MLGruCellOptions {
MLOperand bias;
MLOperand recurrentBias;
boolean resetAfter = true;
MLRecurrentNetworkWeightLayout layout = "zrn";
sequence<MLRecurrentNetworkActivation> activations;
};
partial interface MLGraphBuilder {
MLOperand gruCell(MLOperand input, MLOperand weight, MLOperand recurrentWeight,
MLOperand hiddenState, long hiddenSize, optional MLGruCellOptions options = {});
};
</script>
<div algorithm=grucell>
**Arguments:**
- *input*: an {{MLOperand}}. The input 2-D tensor of shape [batch_size, input_size].
- *weight*: an {{MLOperand}}. The 2-D input weight tensor of shape [3 * hidden_size, input_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the *layout* argument.
- *recurrentWeight*: an {{MLOperand}}. The 2-D recurrent weight tensor of shape [3 * hidden_size, hidden_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the *layout* argument.
- *hiddenState*: an {{MLOperand}}. The 2-D input hidden state tensor of shape [batch_size, hidden_size].
- *hiddenSize*: a {{long}} scalar. The value of the second dimension of the output tensor shape. It indicates the number of features in the hidden state.
- *options*: an optional {{MLGruCellOptions}}. The optional parameters of the operation.
- *bias*: an {{MLOperand}}. The 1-D input bias tensor of shape [3 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the *options.layout* argument.
- *recurrentBias*: an {{MLOperand}}. The 1-D recurrent bias tensor of shape [3 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the *options.layout* argument.
- *resetAfter*: a {{boolean}} indicating whether to apply the reset gate after or before matrix multiplication. Defaults to true.
- *layout*: an {{MLRecurrentNetworkWeightLayout}}. The ordering of the weight and bias vectors for the internal gates of GRU, specifically the *update (z)*, *reset (r)*, and *new (n)* gate, as indicated in the first dimension of the weight and bias tensor shapes. When not specified, the default layout is *"zrn"*.
- *activations*: a sequence of {{MLRecurrentNetworkActivation}}. A pair of activation functions, with the first function used for the update and reset gate and the second used for the new gate. When not specified, it defaults to the sigmoid (*"sigmoid"*) and the hyperbolic tangent (*"tanh"*) functions respectively.
**Returns:** an {{MLOperand}}. The 2-D tensor of shape [batch_size, hidden_size], the cell output hidden state of a single time step of the recurrent network.
<div class="note">
The behavior of this operation can be generically emulated from the usage of other operations as follows. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
<pre highlight="js">
const one = builder.constant(1);
const zero = builder.constant(0);

// update gate
let z = builder.sigmoid(
    builder.add(
        builder.add(
            (options.bias ? builder.slice(options.bias, [0], [hiddenSize]) : zero),
            (options.recurrentBias ? builder.slice(options.recurrentBias, [0], [hiddenSize]) : zero)
        ),
        builder.add(
            builder.matmul(
                input,
                builder.transpose(builder.slice(weight, [0, 0], [hiddenSize, -1]))
            ),
            builder.matmul(
                hiddenState,
                builder.transpose(builder.slice(recurrentWeight, [0, 0], [hiddenSize, -1]))
            )
        )
    )
);

// reset gate
let r = builder.sigmoid(
    builder.add(
        builder.add(
            (options.bias ? builder.slice(options.bias, [hiddenSize], [hiddenSize]) : zero),
            (options.recurrentBias ? builder.slice(options.recurrentBias, [hiddenSize], [hiddenSize]) : zero)
        ),
        builder.add(
            builder.matmul(
                input,
                builder.transpose(builder.slice(weight, [hiddenSize, 0], [hiddenSize, -1]))
            ),
            builder.matmul(
                hiddenState,
                builder.transpose(builder.slice(recurrentWeight, [hiddenSize, 0], [hiddenSize, -1]))
            )
        )
    )
);

// new gate
let n;
if (options.resetAfter) {
  n = builder.tanh(
      builder.add(
          (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
          builder.add(
              builder.matmul(
                  input,
                  builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1]))
              ),
              builder.mul(
                  r,
                  builder.add(
                      (options.recurrentBias ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero),
                      builder.matmul(
                          hiddenState,
                          builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, -1]))
                      )
                  )
              )
          )
      )
  );
} else {
  n = builder.tanh(
      builder.add(
          builder.add(
              (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
              (options.recurrentBias ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero)
          ),
          builder.add(
              builder.matmul(
                  input,
                  builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1]))
              ),
              builder.matmul(
                  builder.mul(r, hiddenState),
                  builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, -1]))
              )
          )
      )
  );
}

// compute the new hidden state
return builder.add(builder.mul(z, hiddenState), builder.mul(n, builder.sub(one, z)));
</pre>
</div>
</div>
### instanceNormalization ### {#api-mlgraphbuilder-instancenorm}
Normalize the input features using [[Instance-Normalization]]. Unlike [[#api-mlgraphbuilder-batchnorm]], where the mean and variance values used in the calculation are previously computed across the batch dimension during the model training phase, the mean and variance values used in the calculation of an instance normalization are computed internally on the fly per input feature.
<script type=idl>
dictionary MLInstanceNormalizationOptions {
MLOperand scale;
MLOperand bias;
float epsilon = 1e-5;
MLInputOperandLayout layout = "nchw";
};
partial interface MLGraphBuilder {
MLOperand instanceNormalization(MLOperand input,
optional MLInstanceNormalizationOptions options = {});
};
</script>
<div algorithm=instancenorm>
**Arguments:**
- *input*: an {{MLOperand}}. The input 4-D tensor.
- *options*: an optional {{MLInstanceNormalizationOptions}}. The optional parameters of the operation.
- *scale*: an {{MLOperand}}. The 1-D tensor of the scaling values whose length is equal to the size of the feature dimension of the input e.g. for the input tensor with *nchw* layout, the feature dimension is 1.
- *bias*: an {{MLOperand}}. The 1-D tensor of the bias values whose length is equal to the size of the feature dimension of the input e.g. for the input tensor with *nchw* layout, the feature dimension is 1.
- *epsilon*: a {{float}} scalar. A small value to prevent computational error due to divide-by-zero. The default value is 0.00001 when not specified.
- *layout*: an {{MLInputOperandLayout}}. This option specifies the layout format of the input. The default value is *"nchw"*.
**Returns:** an {{MLOperand}}. The instance-normalized 4-D tensor of the same shape as the input tensor.
<div class="note">
The behavior of this operation when the input tensor is 4-D of the *"nchw"* layout can be generically emulated from
the usage of other operations as follows. However, user agents typically have a more efficient implementation for it,
therefore its usage is encouraged from the performance standpoint.
<pre highlight="js">
// The mean reductions happen over the spatial dimensions of the input
// e.g. axis 2 and 3 of the input tensor.
const reduceOptions = { axes: [2,3], keepDimensions: true };
const mean = builder.reduceMean(input, reduceOptions);
const variance = builder.reduceMean(
    builder.pow(
        builder.sub(input, mean),
        builder.constant(2)),
    reduceOptions
);

// The scale and bias values are applied per input feature
// e.g. axis 1 of the input tensor.
const shape = [1,-1,1,1];
return builder.add(
    builder.mul(
        builder.reshape(options.scale, shape),
        builder.div(
            builder.sub(input, mean),
            builder.pow(
                builder.add(variance, builder.constant(options.epsilon)),
                builder.constant(0.5))
        )
    ),
    builder.reshape(options.bias, shape)
);
</pre>
</div>
</div>
### leakyRelu ### {#api-mlgraphbuilder-leakyrelu}
<script type=idl>
dictionary MLLeakyReluOptions {
float alpha = 0.01;
};
partial interface MLGraphBuilder {
MLOperand leakyRelu(MLOperand x, optional MLLeakyReluOptions options = {});
};
</script>
<div algorithm=leakyrelu>
**Arguments:**
- *x*: an {{MLOperand}}. The input tensor.
- *options*: an optional {{MLLeakyReluOptions}}. The optional parameters of the operation.
- *alpha*: a {{float}} scalar multiplier. Defaults to 0.01.
**Returns:** an {{MLOperand}}. The output tensor of the same shape as *x*.
Calculate the <a
href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)#Leaky_ReLU">
leaky version of the rectified linear function</a> on the input tensor
element-wise. The calculation follows the expression `max(0, x) + alpha *
min(0, x)`.
<div class="note">
The behavior of this operation can be generically emulated from the usage of
other operations as follows. However, user agents typically have a more
efficient implementation for it, therefore its usage is encouraged from the
performance standpoint.
<pre highlight="js">
return builder.add(builder.max(builder.constant(0), x),
                   builder.mul(builder.constant(options.alpha), builder.min(builder.constant(0), x)));
</pre>
</div>
</div>
### matmul ### {#api-mlgraphbuilder-matmul}
Compute the matrix product of two input tensors.
<script type=idl>
partial interface MLGraphBuilder {
MLOperand matmul(MLOperand a, MLOperand b);
};
</script>
<div algorithm=matmul>
**Arguments:**
- *a*: an {{MLOperand}}. The first input N-D tensor.
- *b*: an {{MLOperand}}. The second input N-D tensor.
**Returns:** an {{MLOperand}}. The output N-D tensor that contains the matrix
product of two input tensors.
Compute the matrix product of two input tensors. It behaves as following:
- If both *a* and *b* are 2-D, they are multiplied like conventional
matrices and produce a 2-D tensor as the output.
- If either *a* or *b* is N-D, N > 2, it is treated as a stack of
matrices with dimensions corresponding to the last two indices. The
matrix multiplication will be broadcasted accordingly by following
[[!numpy-broadcasting-rule]]. The output is an N-D tensor whose rank
is the maximum rank of the input tensors. For each dimension, except
the last two, of the output tensor, its size is the maximum size
along that dimension of the input tensors.