[Training] Onnxruntime OnDevice training : onnxruntime::training::api::Module::Module [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for Shape(19) node with name '/bert/Shape_1' #19351

Closed
Leaner23 opened this issue Jan 31, 2024 · 8 comments
Labels
model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. training issues related to ONNX Runtime training; typically submitted using template

@Leaner23

Describe the issue

I am building a sentiment analysis model using BERT for on-device training. I get the error below while loading the generated training artifacts:

RuntimeError                              Traceback (most recent call last)
Input In [5], in <cell line: 9>()
      5 from onnxruntime.capi import _pybind_state as C
      7 checkpoint_state = orttraining.CheckpointState.load_checkpoint(
      8     r'artifacts\checkpoint')
----> 9 model = orttraining.Module(
     10     r"artifacts\training_model.onnx",
     11     checkpoint_state,
     12     r"artifacts\eval_model.onnx",
     13 )
     14 optimizer = orttraining.Optimizer(
     15     r"artifacts\optimizer.onnx", model
     16 )

File ~\AppData\Roaming\Python\Python39\site-packages\onnxruntime\training\api\module.py:54, in Module.__init__(self, train_model_uri, state, eval_model_uri, device)
     47 device_id = 0 if len(options) < 2 else int(options[1])
     49 self._device = C.OrtDevice(
     50     get_ort_device_type(self._device_type, device_id),
     51     C.OrtDevice.default_memory(),
     52     device_id,
     53 )
---> 54 self._model = C.Module(
     55     os.fspath(train_model_uri),
     56     state._state,
     57     os.fspath(eval_model_uri) if eval_model_uri is not None else None,
     58     self._device,
     59 )
     60 self._state = state

RuntimeError: C:\a_work\1\s\orttraining\orttraining\training_api\module.cc:175 onnxruntime::training::api::Module::Module [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for Shape(19) node with name '/bert/Shape_1'

To reproduce

Artifacts generation code:

requires_grad = '''bert.pooler.dense.weight
bert.pooler.dense.bias
linear.weight
linear.bias
linear2.weight
linear2.bias
linear3.weight
linear3.bias'''.split()

frozen_params = [param.name
                 for param in onnx_model.graph.initializer
                 if param.name not in requires_grad]

artifacts.generate_artifacts(
    onnx_model,
    requires_grad=requires_grad,
    frozen_params=frozen_params,
    loss=artifacts.LossType.BCEWithLogitsLoss,
    optimizer=artifacts.OptimType.AdamW,
    artifact_directory="artifacts")

bertmodel.zip

import onnx
from onnxruntime.training import artifacts
import onnxruntime.training.api as orttraining
from onnxruntime import InferenceSession
from onnxruntime.capi import _pybind_state as C

checkpoint_state = orttraining.CheckpointState.load_checkpoint(
    r'artifacts\checkpoint')
model = orttraining.Module(
    r"artifacts\training_model.onnx",
    checkpoint_state,
    r"artifacts\eval_model.onnx",
)
optimizer = orttraining.Optimizer(
    r"artifacts\optimizer.onnx", model
)

artifacts_training.zip

Urgency

very urgent

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.15

PyTorch Version

2.2.0+cpu

Execution Provider

Default CPU

Execution Provider Library Version

No response

@Leaner23 Leaner23 added the training issues related to ONNX Runtime training; typically submitted using template label Jan 31, 2024
@github-actions github-actions bot added the model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. label Jan 31, 2024
@xadupre
Member

xadupre commented Jan 31, 2024

This issue should be fixed by #18300, which will be part of the upcoming release. The ONNX model is created with an opset that onnxruntime does not support yet. In the new release, the loss is created with the same opset as the model.

@Leaner23
Author

Leaner23 commented Jan 31, 2024

Does this mean that the current version does not support BERT models for on-device training? If the answer is no, can you please share some references related to this?

@xadupre
Member

xadupre commented Jan 31, 2024

It does. It should work if you install an older version of onnx. onnx.defs.onnx_opset_version() must return an opset version supported by your version of onnxruntime.
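A minimal sketch of that check, for illustration only. `onnx.defs.onnx_opset_version()` is the call mentioned above; the helper names and the `max_supported_opset` parameter are hypothetical, and the value to pass for it is an assumption you would take from the onnxruntime compatibility documentation for your installed release.

```python
import importlib


def installed_onnx_opset():
    """Return onnx.defs.onnx_opset_version(), or None when onnx is not installed."""
    try:
        onnx_defs = importlib.import_module("onnx.defs")
    except ImportError:
        return None
    return onnx_defs.onnx_opset_version()


def opset_is_supported(opset, max_supported_opset):
    """True when `opset` does not exceed the highest opset the runtime supports.

    `max_supported_opset` is a value you look up yourself; it is not queried
    from onnxruntime here.
    """
    return opset <= max_supported_opset


if __name__ == "__main__":
    opset = installed_onnx_opset()
    print("default opset of the installed onnx package:", opset)
```

If the printed opset is higher than what your onnxruntime release supports, installing an older onnx package is the workaround suggested in this thread.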

@Leaner23
Author

This returns 20, and the onnx version I am using is '1.15.0'. Which onnx version and which opset version should I use?

@xadupre
Member

xadupre commented Jan 31, 2024

This table should give you this information: https://onnxruntime.ai/docs/reference/compatibility.html#onnx-opset-support.
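To compare a model against that table, it can help to see which opsets the model file itself declares. The sketch below is a hedged illustration: it reads the `opset_import` entries (each with a `domain` and a `version`) that an ONNX `ModelProto` carries, and the helper name `declared_opsets` is hypothetical. A stand-in object is used so the example runs without a model file.

```python
from types import SimpleNamespace


def declared_opsets(model):
    """Map each operator-set domain to the version the model imports.

    An empty domain string denotes the default ONNX domain, shown here as "ai.onnx".
    """
    return {(entry.domain or "ai.onnx"): entry.version for entry in model.opset_import}


# Stand-in with the same attribute layout as a loaded model (illustrative values only).
fake_model = SimpleNamespace(
    opset_import=[
        SimpleNamespace(domain="", version=20),
        SimpleNamespace(domain="com.microsoft", version=1),
    ]
)

print(declared_opsets(fake_model))
```

With onnx installed, the same helper could be applied to `onnx.load("artifacts/training_model.onnx")` to inspect the generated training graph.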

@Leaner23
Author

Leaner23 commented Jan 31, 2024

I tried changing the version, but the above solution did not work. Could you please advise which node/layer I should remove so that the error is resolved? These are the nodes/layers that I am using for training:
bert.pooler.dense.weight
bert.pooler.dense.bias
linear.weight
linear.bias
linear2.weight
linear2.bias
linear3.weight
linear3.bias

@baijumeswani
Contributor

@Leaner23 could you try using a newer version of onnxruntime-training-cpu? I tried your code on my end and I don't see any error:

import onnx
from onnxruntime.training import artifacts

onnx_model = onnx.load("artifacts/bertmodel.onnx")

requires_grad = [
    "bert.pooler.dense.weight",
    "bert.pooler.dense.bias",
    "linear.weight",
    "linear.bias",
    "linear2.weight",
    "linear2.bias",
    "linear3.weight",
    "linear3.bias",
]

frozen_params = [
    param.name
    for param in onnx_model.graph.initializer
    if param.name not in requires_grad
]

artifacts.generate_artifacts(
    onnx_model,
    requires_grad=requires_grad,
    frozen_params=frozen_params,
    loss=artifacts.LossType.BCEWithLogitsLoss,
    optimizer=artifacts.OptimType.AdamW,
    artifact_directory="artifacts",
)

import onnxruntime.training.api as orttraining

# Load the generated checkpoint and build the training module from the artifacts.
checkpoint = orttraining.CheckpointState.load_checkpoint("artifacts/checkpoint")
model = orttraining.Module("artifacts/training_model.onnx", checkpoint, "artifacts/eval_model.onnx")

optimizer = orttraining.Optimizer("artifacts/optimizer_model.onnx", model)

To install onnxruntime-training-cpu, uninstall any previous version of onnxruntime-training or onnxruntime and then install onnxruntime-training-cpu using:

python -m pip install cerberus flatbuffers h5py "numpy>=1.16.6" onnx packaging protobuf sympy "setuptools>=41.4.0"
pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT/pypi/simple/ onnxruntime-training-cpu
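Before and after running these commands, it may be useful to confirm which onnxruntime distributions are actually present in the environment. The following is a small sketch using only the standard library; the helper name `installed_onnxruntime_packages` is hypothetical, and matching on the `onnxruntime` name prefix is an assumption made for illustration.

```python
from importlib import metadata


def installed_onnxruntime_packages():
    """Return {distribution name: version} for every installed onnxruntime* package."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and name.lower().startswith("onnxruntime"):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    packages = installed_onnxruntime_packages()
    if not packages:
        print("no onnxruntime distribution found")
    for name, version in sorted(packages.items()):
        print(f"{name}=={version}")
```

More than one entry in the output suggests that an earlier onnxruntime or onnxruntime-training package was not uninstalled first.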

@baijumeswani
Contributor

I am closing this as I cannot reproduce this error on my end. Please re-open in case this issue still persists.

3 participants