
[Training] How to train all the layers of the onnx training model #19186

Open · Leaner23 opened this issue Jan 17, 2024 · 4 comments
Labels: training (issues related to ONNX Runtime training; typically submitted using template), stale (issues that have not been addressed in a while; categorized by a bot)

@Leaner23 commented Jan 17, 2024

Describe the issue

I am trying to train all the layers of the ONNX model, but I get the error message below.

RuntimeError: C:\a_work\1\s\orttraining\orttraining\core\framework\gradient_graph_builder.cc:100 onnxruntime::training::GradientGraphBuilder::GradientGraphBuilder const_fold_opt__2874 couldn't find the consumer node.

I am using a Transformer encoder ONNX model for training. Is it possible to train all the layers of the ONNX model? Please share a reference for training all layers.

To reproduce

import onnx
from onnxruntime.training import artifacts

model_name = 'ContentModel'

# Load the onnx model.
onnx_model = onnx.load(f"{model_name}.onnx")

requires_grad = [param.name for param in onnx_model.graph.initializer]
frozen_params = []

# Generate the training artifacts.
artifacts.generate_artifacts(
    onnx_model,
    requires_grad=requires_grad,
    frozen_params=frozen_params,
    loss=artifacts.LossType.BCEWithLogitsLoss,
    optimizer=artifacts.OptimType.AdamW,
    artifact_directory="training_demo",
)

Urgency

It is very urgent.

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.16

PyTorch Version

2.1.2+cpu

Execution Provider

Default CPU

Execution Provider Library Version

No response

Leaner23 added the training label on Jan 17, 2024
@xadupre (Member) commented Jan 17, 2024

There may be two issues.

The first is related to the way you created your model. The output name in the error message suggests it is a folded constant. Constant folding optimizes the model by pre-computing every node that takes only constants as inputs. This optimization needs to be disabled for training, so that the original weights are preserved and can be trained.

The second issue is due to the fact that there are two kinds of constants in an ONNX model: initializers and the outputs of "Constant" operators. The latter must be included in requires_grad as well.
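
A minimal sketch of the second point (not from the thread; the model path is assumed from the issue): collect the outputs of "Constant" nodes alongside the initializers so that both kinds of constants are passed to requires_grad. For the first point, constant folding has to be turned off at export time; torch.onnx.export exposes do_constant_folding=False for this, while a tf2onnx export would need the equivalent adjustment to its converter optimizations.

# Sketch: gather both initializers and Constant-node outputs for requires_grad.
import onnx

onnx_model = onnx.load("ContentModel.onnx")  # path assumed from the issue

requires_grad = [init.name for init in onnx_model.graph.initializer]
requires_grad += [
    output
    for node in onnx_model.graph.node
    if node.op_type == "Constant"
    for output in node.output
]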

Leaner23 reopened this on Jan 18, 2024
@Leaner23 (Author) commented

Here I have included all the graph initializers' parameters in requires_grad. What else do you want me to add to requires_grad?

@Leaner23 (Author) commented

Here I am sharing the code for the model generation:
import tf2onnx
import onnx
import tensorflow as tf
from tensorflow import keras
import keras_nlp

NUM_LAYERS = 2
EMBD_DIM = 128
FF_DIM = 128
NUM_HEADS = 8
DROPOUT = 0.1
NORM_EPSILON = 1e-9
# MAX_SEQ_LENGTH and VOCAB_SIZE are used below but were not defined in the
# original snippet; the data shared later suggests sequences of length 128,
# and the vocabulary size must exceed the largest token id (~9667). Assumed:
MAX_SEQ_LENGTH = 128
VOCAB_SIZE = 10000

encoder_input = keras.Input(shape=(MAX_SEQ_LENGTH,), dtype=tf.float32, name='encoder_input')
encoder_embedding_layer = keras_nlp.layers.TokenAndPositionEmbedding(
    vocabulary_size=VOCAB_SIZE,
    sequence_length=MAX_SEQ_LENGTH,
    embedding_dim=EMBD_DIM,
    mask_zero=True,
)
encoder_output = encoder_embedding_layer(encoder_input)
encoder_output = keras.layers.LayerNormalization(epsilon=NORM_EPSILON)(encoder_output)
encoder_output = keras.layers.Dropout(rate=DROPOUT)(encoder_output)
for i in range(NUM_LAYERS):
    encoder_output = keras_nlp.layers.TransformerEncoder(
        intermediate_dim=FF_DIM,
        num_heads=NUM_HEADS,
        activation=keras.activations.gelu,
    )(encoder_output)
outputs = keras.layers.GlobalAveragePooling1D()(encoder_output)
outputs = keras.layers.Dense(128, activation="relu")(outputs)
outputs = keras.layers.Dense(1, activation='sigmoid', name='output')(outputs)

transformer = keras.Model(inputs=encoder_input, outputs=outputs)

learning_rate = 3e-5
optimizer = tf.keras.optimizers.experimental.AdamW(learning_rate=learning_rate)
loss = tf.keras.losses.BinaryCrossentropy()
metrics = tf.keras.metrics.BinaryAccuracy()

transformer.compile(loss=loss, metrics=metrics, optimizer=optimizer)

onnx_model, _ = tf2onnx.convert.from_keras(transformer)
onnx.save(onnx_model, 'transformer_Jan_16_3.onnx')
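
Before generating the training artifacts, a quick sanity check on the exported file can show what the converter folded (a sketch, using the file name from the snippet above):

# Sketch: inspect the exported model for initializers and Constant nodes.
import onnx

m = onnx.load('transformer_Jan_16_3.onnx')
onnx.checker.check_model(m)
print("initializers:", len(m.graph.initializer))
print("Constant nodes:", sum(1 for n in m.graph.node if n.op_type == "Constant"))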

You can use these data points (the numpy print output of a (5, 128) array of token ids):

data = [[1.000e+00 7.780e+02 1.280e+02 7.400e+01 1.200e+01 6.300e+02 1.630e+02
1.500e+01 4.000e+00 1.766e+03 7.982e+03 1.051e+03 2.000e+00 3.200e+01
8.500e+01 1.560e+02 4.500e+01 4.000e+01 1.480e+02 1.390e+02 1.210e+02
6.640e+02 6.650e+02 1.000e+01 1.000e+01 1.361e+03 1.730e+02 4.000e+00
7.490e+02 2.000e+00 1.600e+01 3.804e+03 8.000e+00 4.000e+00 2.260e+02
6.500e+01 1.200e+01 4.300e+01 1.270e+02 2.400e+01 2.000e+00 1.000e+01
1.000e+01 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00]
[1.000e+00 6.740e+03 3.650e+02 1.234e+03 5.000e+00 1.156e+03 3.540e+02
1.100e+01 1.400e+01 5.327e+03 6.638e+03 7.000e+00 1.016e+03 2.000e+00
5.940e+03 3.560e+02 4.400e+01 4.000e+00 1.349e+03 5.000e+02 7.460e+02
5.000e+00 2.000e+02 4.000e+00 4.132e+03 1.100e+01 2.000e+00 9.363e+03
1.117e+03 1.831e+03 7.485e+03 5.000e+00 4.831e+03 2.600e+01 6.000e+00
2.000e+00 4.183e+03 1.700e+01 3.690e+02 3.700e+01 2.150e+02 1.345e+03
1.430e+02 2.000e+00 5.000e+00 1.838e+03 8.000e+00 1.974e+03 1.500e+01
3.600e+01 1.190e+02 2.570e+02 8.500e+01 5.200e+01 4.860e+02 9.000e+00
6.000e+00 2.000e+00 8.564e+03 6.300e+01 2.710e+02 6.000e+00 1.960e+02
9.600e+01 9.490e+02 4.121e+03 4.000e+00 2.000e+00 7.000e+00 4.000e+00
2.212e+03 2.436e+03 8.190e+02 6.300e+01 4.700e+01 7.700e+01 7.175e+03
1.800e+02 6.000e+00 2.270e+02 1.100e+01 9.400e+01 2.494e+03 2.000e+00
1.300e+01 4.230e+02 4.000e+00 1.680e+02 7.000e+00 4.000e+00 2.200e+01
5.000e+00 8.900e+01 6.650e+02 7.100e+01 2.700e+02 5.600e+01 5.000e+00
1.300e+01 1.970e+02 1.200e+01 1.610e+02 5.390e+03 9.900e+01 7.600e+01
2.300e+01 2.000e+00 7.000e+00 4.190e+02 6.650e+02 4.000e+01 9.100e+01
8.500e+01 1.080e+02 7.000e+00 4.000e+00 2.084e+03 5.000e+00 4.773e+03
8.100e+01 5.500e+01 5.200e+01 1.901e+03 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00]
[1.000e+00 5.400e+01 1.300e+01 1.610e+03 1.400e+01 2.000e+01 1.300e+01
6.900e+01 5.500e+01 3.640e+02 1.398e+03 2.100e+01 5.400e+01 1.300e+01
2.190e+02 1.200e+01 1.300e+01 1.706e+03 1.500e+01 4.000e+00 2.000e+01
1.600e+01 3.290e+02 6.000e+00 1.760e+02 3.290e+02 7.400e+01 5.100e+01
1.300e+01 8.730e+02 4.000e+00 1.560e+02 7.100e+01 7.800e+01 4.000e+00
7.412e+03 3.220e+02 1.600e+01 3.100e+01 7.000e+00 4.000e+00 2.490e+02
4.000e+00 6.500e+01 1.600e+01 3.800e+01 3.790e+02 1.200e+01 1.000e+02
1.570e+02 1.800e+01 6.000e+00 9.100e+02 2.000e+01 5.490e+02 1.800e+01
4.000e+00 1.496e+03 2.100e+01 1.400e+01 3.100e+01 9.000e+00 2.400e+01
6.000e+00 2.120e+02 1.200e+01 9.000e+00 6.000e+00 1.322e+03 9.910e+02
7.000e+00 3.002e+03 4.000e+00 4.250e+02 9.000e+00 7.300e+01 2.218e+03
5.490e+02 1.800e+01 3.100e+01 1.550e+02 3.600e+01 1.000e+02 7.630e+02
3.790e+02 2.000e+01 1.030e+02 3.510e+02 5.308e+03 1.300e+01 2.020e+02
1.200e+01 2.241e+03 5.000e+00 6.000e+00 3.200e+02 4.600e+01 7.000e+00
4.570e+02 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00]
[1.000e+00 1.300e+01 1.190e+02 9.540e+02 1.890e+02 1.554e+03 1.300e+01
9.200e+01 4.590e+02 4.800e+01 4.000e+00 1.160e+02 9.000e+00 1.492e+03
2.291e+03 4.200e+01 7.260e+02 4.000e+00 1.939e+03 1.680e+02 2.031e+03
1.300e+01 4.230e+02 1.400e+01 2.000e+01 5.490e+02 1.800e+01 4.000e+00
2.000e+00 5.470e+02 3.200e+01 4.000e+00 9.600e+01 3.900e+01 4.000e+00
4.540e+02 7.000e+00 4.000e+00 2.200e+01 8.000e+00 4.000e+00 5.500e+01
1.300e+02 1.680e+02 1.300e+01 9.200e+01 3.590e+02 6.000e+00 1.580e+02
1.511e+03 2.000e+00 4.200e+01 6.000e+00 1.913e+03 1.900e+01 1.940e+02
4.455e+03 4.121e+03 6.000e+00 1.140e+02 8.000e+00 7.200e+01 2.100e+01
4.650e+02 9.667e+03 3.040e+02 4.000e+00 5.100e+01 9.000e+00 1.400e+01
2.000e+01 4.400e+01 1.550e+02 8.000e+00 6.000e+00 2.260e+02 1.620e+02
6.160e+02 6.510e+02 5.100e+01 9.000e+00 1.400e+01 2.000e+01 4.400e+01
1.000e+01 1.000e+01 1.400e+01 2.180e+02 4.843e+03 6.290e+02 4.200e+01
3.017e+03 2.100e+01 4.800e+01 2.500e+01 2.800e+01 3.500e+01 5.340e+02
5.000e+00 6.000e+00 3.200e+02 8.000e+00 5.160e+02 5.000e+00 4.200e+01
2.500e+01 1.810e+02 8.000e+00 1.300e+02 5.600e+01 5.470e+02 3.571e+03
5.000e+00 1.471e+03 8.510e+02 1.400e+01 2.286e+03 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00]
[1.000e+00 5.030e+02 2.000e+01 3.300e+01 1.180e+02 4.810e+02 3.020e+02
2.600e+01 1.840e+02 5.200e+01 8.350e+02 1.120e+03 5.420e+02 2.603e+03
1.300e+01 1.408e+03 4.500e+01 6.000e+00 2.364e+03 1.000e+01 1.000e+01
2.500e+01 2.760e+02 4.900e+01 2.000e+00 3.239e+03 1.100e+01 1.290e+02
1.642e+03 8.000e+00 6.070e+02 2.500e+01 3.900e+01 8.520e+02 5.226e+03
2.000e+00 2.500e+01 6.050e+02 8.520e+02 3.925e+03 5.000e+00 2.777e+03
4.600e+01 8.520e+02 2.000e+00 2.500e+01 2.146e+03 3.000e+01 6.080e+02
4.044e+03 1.000e+01 1.000e+01 2.500e+01 7.890e+02 3.400e+01 4.000e+00
2.000e+00 5.400e+01 1.544e+03 2.173e+03 2.018e+03 2.500e+01 7.900e+01
7.200e+01 2.020e+02 7.200e+01 6.000e+00 9.680e+02 2.000e+00 1.000e+01
1.000e+01 2.872e+03 7.500e+01 3.590e+02 2.872e+03 6.214e+03 4.000e+00
2.000e+00 3.200e+01 7.500e+01 2.800e+01 9.000e+00 1.400e+01 2.000e+00
1.000e+01 1.000e+01 8.840e+02 1.866e+03 9.000e+00 4.000e+00 4.017e+03
2.809e+03 1.000e+01 1.000e+01 7.190e+02 2.000e+00 7.000e+01 2.885e+03
4.000e+00 2.552e+03 2.000e+00 4.430e+03 1.750e+02 6.640e+03 1.100e+01
4.000e+00 2.000e+00 5.430e+02 1.609e+03 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00]]

label : [0., 1., 0., 0., 0.]
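
For reference, a minimal training-step sketch that consumes the generated artifacts (assumes the onnxruntime-training package and the default file names written by generate_artifacts into "training_demo"):

# Sketch: one training step with ONNX Runtime's on-device training API.
import numpy as np
from onnxruntime.training.api import CheckpointState, Module, Optimizer

state = CheckpointState.load_checkpoint("training_demo/checkpoint")
module = Module("training_demo/training_model.onnx", state,
                "training_demo/eval_model.onnx")
optimizer = Optimizer("training_demo/optimizer_model.onnx", module)

# Placeholder inputs: the (5, 128) token array above and its labels,
# reshaped to match the model's (N, 1) output.
data = np.zeros((5, 128), dtype=np.float32)
labels = np.array([0., 1., 0., 0., 0.], dtype=np.float32).reshape(-1, 1)

module.train()
loss = module(data, labels)   # forward + backward
optimizer.step()              # apply the AdamW update
module.lazy_reset_grad()      # clear gradients for the next step
print("loss:", loss)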

@github-actions bot (Contributor) commented

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label on Feb 17, 2024