Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CoreML: Disable 1D ML Program matmul due to bug in coreml #21186

Merged
merged 5 commits into from
Jun 29, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
85 changes: 85 additions & 0 deletions onnxruntime/core/providers/coreml/DebugMLProgram.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
# Steps to debug an ML Program operator implementation
edgchen1 marked this conversation as resolved.
Show resolved Hide resolved

Basic debugging of everything, excluding model execution, (e.g. partitioning, checking if operator is supported,
adding CoreML operator input/outputs) can be done anywhere as the code is setup to build and be able to create the
protobuf based CoreML Model on all platforms.

To debug model execution issues you will need a macOS machine.

## Debugging invalid output

If there is a crash during execution or unexpected output, the best approach is to see what using coremltools directly
produces.

NOTE: that doesn't guarantee coremltools is correct as there could be a bug in their implementation. It does however
provide a data point on whether we are generating the same CoreML model as the coremltools python.

### Comparing to coremltools output

Create a small test script that replicates the inputs/outputs of the operator you are debugging.
This script should use the coremltools library to run the operator and print the output.
This can be used to compare the CoreML EP's output with the coremltools output.

https://apple.github.io/coremltools/docs-guides/source/model-intermediate-language.html#create-a-mil-program

Usage is reasonably intuitive. The below example defines a model with 2 inputs and a matmul operator.
The model is printed, and run with randomly generated inputs. The output from doing so is printed.

```python
import numpy as np
import coremltools as ct
from coremltools.converters.mil import Builder as mb

target = ct.target.iOS15

x_shape = (1, 4)
y_shape = (10, 4, 3)

@mb.program(input_specs=[mb.TensorSpec(shape=x_shape), mb.TensorSpec(shape=y_shape)],
opset_version=target)
def prog(x, y):
# For reference, a constant can be added using `mb.const` and specifying the data in the `val` parameter.
# c_shape = (3, )
# c_data = np.random.random_sample(c_shape)
# c = mb.const(val=c_data)

# call the operator you are debugging with the inputs/constants.
# See the spec for the operator names, input/outputs and supported data types.
# https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html
z = mb.matmul(x=x, y=y)

# can have additional function calls here if there are multiple operators involved.
# Contrived example that uses a constant and the output from a previous operator:
# z = mb.add(x=z, y=c)

return z

# Prints the MIL program in a reasonably concise manner.
print(prog)

# Convert to ML Program model
m = ct.convert(prog, minimum_deployment_target=target)

# If you want to dump the full protobuf of the model uncomment this.
# You can compare the values to what is being set by the ORT CoreML EP code if you suspect any issues there.
# spec = m.get_spec()
# print(spec)

# run the model to generate output for comparison with the CoreML EP output
x = np.random.rand(*x_shape)
y = np.random.rand(*y_shape)

print(m.predict({'x': x, 'y': y}))
```

## Dumping the ORT generated mlmodel

You can also dump the mlmodel generated by the ORT CoreML EP. This can be handy with larger models.

In a debug build, set the ORT_COREML_EP_MODEL_DIR environment variable to a directory where you want the ML Package
containing the mlmodel to be saved. The model will remain after the CoreML EP exits, unlike the default behavior
where we write it to a temporary directory that is automatically removed on application exit.

Script to dump: [dump_mlprogram_model.py](dump_mlprogram_model.py)

See [here](https://github.com/microsoft/onnxruntime/blob/3c0b407709fd3c71755ed046edd688b30a786d94/onnxruntime/core/providers/coreml/model/host_utils.h#L70-L75) for environment variable setup and [usage](https://github.com/search?q=repo%3Amicrosoft%2Fonnxruntime%20kOverrideModelOutputDirectoryEnvVar%20&type=code).
33 changes: 19 additions & 14 deletions onnxruntime/core/providers/coreml/builders/impl/gemm_op_builder.cc
Original file line number Diff line number Diff line change
Expand Up @@ -109,19 +109,11 @@ Status GemmOpBuilder::AddToModelBuilderImpl(ModelBuilder& model_builder, const N
ORT_IGNORE_RETURN_VALUE(GetShape(b, b_shape, logger));
int64_t b0 = -1, b1 = -1;

// ML Program MatMul supports N-D input
if (model_builder.CreateMLProgram() && is_matmul) {
if (b_shape.size() == 1) {
// B is treated as {b_shape[0], 1} according to the numpy rules.
b0 = b_shape[0];
b1 = 1;
} else {
// last 2 dims are used
b0 = b_shape[b_shape.size() - 2];
b1 = b_shape[b_shape.size() - 1];
}
// ML Program MatMul supports N-D input, however we don't use the 'K' or 'N' values calculated below for it
// so we don't need to update b0 or b1.
} else {
// we only support 2D input
// we only support 2D input for all other combination
skottmckay marked this conversation as resolved.
Show resolved Hide resolved
b0 = b_shape[0];
b1 = b_shape[1];
}
Expand Down Expand Up @@ -182,7 +174,6 @@ Status GemmOpBuilder::AddToModelBuilderImpl(ModelBuilder& model_builder, const N
model_builder.AddOperation(std::move(gemm_op));
} else {
// CoreML implementation is the same as ONNX MatMul.
// https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html#coremltools.converters.mil.mil.ops.defs.iOS15.linear.matmul
auto matmul_op = model_builder.CreateOperation(node, "matmul");
AddOperationInput(*matmul_op, "x", a.Name());
AddOperationInput(*matmul_op, "y", b.Name());
Expand Down Expand Up @@ -268,14 +259,28 @@ bool GemmOpBuilder::IsOpSupportedImpl(const Node& node, const OpBuilderInputPara
}

if (is_matmul) {
const auto a_rank = a_shape.size();
const auto b_rank = b_shape.size();

if (input_params.create_mlprogram) {
// ML Program matmul op has numpy semantics the same as the ONNX spec so we can use directly
// ML Program matmul op has numpy semantics the same as the ONNX spec, so we can use directly.
// https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html#coremltools.converters.mil.mil.ops.defs.iOS15.linear.matmul
//
// There does appear to be a bug in handling one of the inputs being 1D, so for now skip these.
// See https://github.com/apple/coremltools/issues/2263
//
// If required for perf we could manually do the shape alterations the spec documents (convert input to 2D,
// and remove extra dimension from output), as the 2D input is correctly handled by CoreML matmul.
if ((a_rank == 1 && b_rank > 1) || (a_rank > 1 && b_rank == 1)) {
LOGS(logger, VERBOSE) << "Skipping due to bug in CoreML ML Program when one of the inputs is 1D.";
return false;
}
} else {
// we could potentially support 1D and 3D if required. beyond 3D the dims that merge diverge.
// https://github.com/apple/coremltools/blob/1931758aae383c83daddfc56f11a24a9d2bf4b87/coremltools/converters/onnx/_operators.py#L1607
// https://github.com/apple/coremltools/blob/1931758aae383c83daddfc56f11a24a9d2bf4b87/coremltools/converters/mil/backend/nn/op_mapping.py#L1374
// https://apple.github.io/coremltools/mlmodel/Format/NeuralNetwork.html#innerproductlayerparams
if (a_shape.size() != 2 || b_shape.size() != 2) {
if (a_rank != 2 || b_rank != 2) {
LOGS(logger, VERBOSE) << "a and b inputs must be 2D. ";
return false;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -906,6 +906,7 @@ Status ModelBuilder::SaveModel() {

#if defined(COREML_ENABLE_MLPROGRAM)
if (create_ml_program_) {
// we need to jump through some hurdles to get the model path the ML Program load wants.
skottmckay marked this conversation as resolved.
Show resolved Hide resolved
std::string tmp_model_path = model_output_path_ + "/tmp/model.mlmodel";
CreateEmptyFile(tmp_model_path);

Expand Down
8 changes: 6 additions & 2 deletions onnxruntime/core/providers/coreml/dump_mlprogram_model.py
Original file line number Diff line number Diff line change
@@ -1,10 +1,12 @@
import sys

import coremltools as ct

if len(sys.argv) < 2:
print(f"Usage: {sys.argv[0]} <path to model.mlmodel in ML Package>")
print("If generated by onnxruntime this will be <ML Package root>/Data/com.microsoft.onnxruntime/model.mlmodel")
print("The ML Package created by the CoreML EP can saved to a specific directory in a debug build of onnxruntime "
"by setting the environment variable ORT_COREML_EP_MODEL_DIR to the desired directory.")

sys.exit(-1)

model_path = sys.argv[1]
Expand All @@ -13,7 +15,9 @@
spec = m.get_spec()
print(spec)

# Example code if you want to filter output or do more advanced things
# Example code if you want to filter output or do more advanced things.
# In the below example we print out the value of an attribute of one specific node from a larger model.
#
# main = spec.mlProgram.functions["main"]
# block = main.block_specializations[main.opset]
# print(f"{len(block.operations)} operators")
Expand Down
Loading