[Training] Onnxruntime OnDevice training : onnxruntime::training::api::Module::Module [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for Shape(19) node with name '/bert/Shape_1' #19351

Leaner23 · 2024-01-31T11:06:01Z

Describe the issue

I am building a sentiment analysis model using BERT for on-device training. Loading the generated artifacts of the model fails with the error below:

RuntimeError Traceback (most recent call last)
Input In [5], in <cell line: 9>()
5 from onnxruntime.capi import _pybind_state as C
7 checkpoint_state = orttraining.CheckpointState.load_checkpoint(
8 r'artifacts\checkpoint')
----> 9 model = orttraining.Module(
10 r"artifacts\training_model.onnx",
11 checkpoint_state,
12 r"artifacts\eval_model.onnx",
13 )
14 optimizer = orttraining.Optimizer(
15 r"artifacts\optimizer.onnx", model
16 )

File ~\AppData\Roaming\Python\Python39\site-packages\onnxruntime\training\api\module.py:54, in Module.__init__(self, train_model_uri, state, eval_model_uri, device)
47 device_id = 0 if len(options) < 2 else int(options[1])
49 self._device = C.OrtDevice(
50 get_ort_device_type(self._device_type, device_id),
51 C.OrtDevice.default_memory(),
52 device_id,
53 )
---> 54 self._model = C.Module(
55 os.fspath(train_model_uri),
56 state._state,
57 os.fspath(eval_model_uri) if eval_model_uri is not None else None,
58 self._device,
59 )
60 self._state = state

RuntimeError: C:\a_work\1\s\orttraining\orttraining\training_api\module.cc:175 onnxruntime::training::api::Module::Module [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for Shape(19) node with name '/bert/Shape_1'

To reproduce

Artifacts generation code:-
requires_grad = '''bert.pooler.dense.weight
bert.pooler.dense.bias
linear.weight
linear.bias
linear2.weight
linear2.bias
linear3.weight
linear3.bias'''.split()

frozen_params = [param.name
                 for param in onnx_model.graph.initializer
                 if param.name not in requires_grad]

artifacts.generate_artifacts(
    onnx_model,
    requires_grad=requires_grad,
    frozen_params=frozen_params,
    loss=artifacts.LossType.BCEWithLogitsLoss,
    optimizer=artifacts.OptimType.AdamW,
    artifact_directory="artifacts")
bertmodel.zip

import onnx
from onnxruntime.training import artifacts
import onnxruntime.training.api as orttraining
from onnxruntime import InferenceSession
from onnxruntime.capi import _pybind_state as C

checkpoint_state = orttraining.CheckpointState.load_checkpoint(
    r'artifacts\checkpoint')
model = orttraining.Module(
    r"artifacts\training_model.onnx",
    checkpoint_state,
    r"artifacts\eval_model.onnx",
)
optimizer = orttraining.Optimizer(
    r"artifacts\optimizer.onnx", model
)
artifacts_training.zip

Urgency

very urgent

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.15

PyTorch Version

2.2.0+cpu

Execution Provider

Default CPU

Execution Provider Library Version

No response

xadupre · 2024-01-31T11:15:30Z

This issue should be fixed by #18300, which will be part of the next release, coming soon. The ONNX model is created with an opset that onnxruntime does not support yet. In the new release, the loss is created with the same opset as the model.

Leaner23 · 2024-01-31T11:53:32Z

Does this mean that the current version does not support the BERT model for on-device training? If it is supported, can you please share some references related to this?

xadupre · 2024-01-31T12:15:18Z

It does support it. It should work if you install an older version of onnx: onnx.defs.onnx_opset_version() must return an opset version supported by your version of onnxruntime.
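To make the check above concrete, here is a minimal sketch that compares the opset a model declares for the default ONNX domain against the highest opset your runtime supports. The helper names and the value 18 in the demo are hypothetical; take the real limit for your onnxruntime version from the compatibility table. The helpers only read `opset_import`, so the demo uses a stand-in object instead of a real model.

```python
from types import SimpleNamespace


def default_domain_opset(model):
    """Return the opset the model declares for the default ONNX domain, or None."""
    for imp in model.opset_import:
        # The default domain is stored as "" (sometimes written "ai.onnx").
        if imp.domain in ("", "ai.onnx"):
            return imp.version
    return None


def exceeds_supported(model, max_supported_opset):
    """True if the model's default-domain opset is newer than the given limit."""
    version = default_domain_opset(model)
    return version is not None and version > max_supported_opset


def report(model_path, max_supported_opset):
    """Load a model from disk and print the comparison (requires the onnx package)."""
    import onnx  # imported lazily so the helpers above work without onnx installed

    model = onnx.load(model_path)
    print("onnx package default opset:", onnx.defs.onnx_opset_version())
    print("model default-domain opset:", default_domain_opset(model))
    print("newer than supported:", exceeds_supported(model, max_supported_opset))


# Stand-in for a loaded model; 18 is an assumed limit used only for illustration.
fake_model = SimpleNamespace(opset_import=[SimpleNamespace(domain="", version=19)])
print(exceeds_supported(fake_model, 18))  # True
```

With onnx installed, something like `report("artifacts/training_model.onnx", max_supported_opset)` would print the same comparison for the generated training model.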

Leaner23 · 2024-01-31T12:18:59Z

This returns 20, and the onnx version that I am using is '1.15.0'. Which onnx version and opset version should I use?

xadupre · 2024-01-31T12:58:26Z

This table should give you this information: https://onnxruntime.ai/docs/reference/compatibility.html#onnx-opset-support.

Leaner23 · 2024-01-31T16:58:30Z

I tried changing the version, but the above solution did not work. Can you please tell me which node/layer I should remove so that the above error gets resolved? These are the parameters that I am training:
bert.pooler.dense.weight
bert.pooler.dense.bias
linear.weight
linear.bias
linear2.weight
linear2.bias
linear3.weight
linear3.bias
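For locating the node named in the error, a small sketch like the following lists the names of all top-level nodes of a given operator type. The helper name is hypothetical and it does not descend into subgraphs. It is only a way to inspect the graph; the replies in this thread point to the opset version, not to any particular trainable layer, as the cause.

```python
from types import SimpleNamespace


def nodes_of_type(model, op_type):
    """Return the names of top-level graph nodes whose operator type matches."""
    return [node.name for node in model.graph.node if node.op_type == op_type]


# Stand-in for a loaded model, built only to show the helper's behaviour.
fake_model = SimpleNamespace(
    graph=SimpleNamespace(
        node=[
            SimpleNamespace(name="/bert/Shape_1", op_type="Shape"),
            SimpleNamespace(name="/linear/Gemm", op_type="Gemm"),
        ]
    )
)
print(nodes_of_type(fake_model, "Shape"))  # ['/bert/Shape_1']
```

On a real model you would pass the object returned by `onnx.load(...)` instead of the stand-in.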

baijumeswani · 2024-01-31T21:13:40Z

@Leaner23 could you try using a newer version of onnxruntime-training-cpu? I tried your code on my end and I don't see any error:

import onnx
from onnxruntime.training import artifacts

onnx_model = onnx.load("artifacts/bertmodel.onnx")

requires_grad = [
    "bert.pooler.dense.weight",
    "bert.pooler.dense.bias",
    "linear.weight",
    "linear.bias",
    "linear2.weight",
    "linear2.bias",
    "linear3.weight",
    "linear3.bias",
]

frozen_params = [
    param.name
    for param in onnx_model.graph.initializer
    if param.name not in requires_grad
]

artifacts.generate_artifacts(
    onnx_model,
    requires_grad=requires_grad,
    frozen_params=frozen_params,
    loss=artifacts.LossType.BCEWithLogitsLoss,
    optimizer=artifacts.OptimType.AdamW,
    artifact_directory="artifacts",
)

import onnxruntime.training.api as orttraining

checkpoint = orttraining.CheckpointState.load_checkpoint("artifacts/checkpoint")
model = orttraining.Module("artifacts/training_model.onnx", checkpoint, "artifacts/eval_model.onnx")

optimize = orttraining.Optimizer("artifacts/optimizer_model.onnx", model)

To install onnxruntime-training-cpu, uninstall any previous version of onnxruntime-training or onnxruntime and then install onnxruntime-training-cpu using:

python -m pip install cerberus flatbuffers h5py "numpy>=1.16.6" onnx packaging protobuf sympy "setuptools>=41.4.0"
pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT/pypi/simple/ onnxruntime-training-cpu
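After reinstalling, it can help to confirm which of these packages are actually present, since leftover onnxruntime installs can shadow the new one. This is a minimal sketch using only the standard library; the helper name is hypothetical and the package list simply repeats the names mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


packages = ["onnx", "onnxruntime", "onnxruntime-training", "onnxruntime-training-cpu"]
for name, ver in installed_versions(packages).items():
    print(f"{name}: {ver or 'not installed'}")
```

Ideally only one of the onnxruntime variants shows a version after the cleanup described above.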

baijumeswani · 2024-02-01T17:22:19Z

I am closing this as I cannot reproduce this error on my end. Please re-open in case this issue still persists.

Leaner23 added the training issues related to ONNX Runtime training; typically submitted using template label Jan 31, 2024

github-actions bot added the model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. label Jan 31, 2024

baijumeswani closed this as completed Feb 1, 2024
