
[Training] How to train all the layers of the onnx training model #19186

Open · Leaner23 opened this issue Jan 17, 2024 · 4 comments
Labels: training (issues related to ONNX Runtime training; typically submitted using template), stale (issues that have not been addressed in a while; categorized by a bot)

@Leaner23 commented Jan 17, 2024

Describe the issue

I am trying to train all the layers of the ONNX model, but I get the error message below.

RuntimeError: C:\a_work\1\s\orttraining\orttraining\core\framework\gradient_graph_builder.cc:100 onnxruntime::training::GradientGraphBuilder::GradientGraphBuilder const_fold_opt__2874 couldn't find the consumer node.

I am using a Transformer encoder ONNX model for training. Is it possible to train all the layers of the ONNX model? Please share a reference for training all layers.

To reproduce

import onnx
from onnxruntime.training import artifacts

model_name = 'ContentModel'

# Load the onnx model.
onnx_model = onnx.load(f"{model_name}.onnx")

requires_grad = [param.name for param in onnx_model.graph.initializer]
frozen_params = []

# Generate the training artifacts.
artifacts.generate_artifacts(
    onnx_model,
    requires_grad=requires_grad,
    frozen_params=frozen_params,
    loss=artifacts.LossType.BCEWithLogitsLoss,
    optimizer=artifacts.OptimType.AdamW,
    artifact_directory="training_demo",
)

Urgency

It is very urgent.

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.16

PyTorch Version

2.1.2+cpu

Execution Provider

Default CPU

Execution Provider Library Version

No response

Leaner23 added the training label on Jan 17, 2024
@xadupre (Member) commented Jan 17, 2024

There may be two issues.

The first is related to the way you created your model. The output name in the error message suggests it is a folded constant. Constant folding optimizes the model by pre-computing every node that takes only constants as inputs. This optimization needs to be disabled for training, so that the original weights are preserved and can be trained.

The second issue is due to the fact that there are two kinds of constants in an ONNX model: initializers and the outputs of "Constant" operators. The latter must be included in requires_grad as well.
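
A minimal sketch of the second point (not from the thread; the model path is assumed from the issue): collect the outputs of "Constant" nodes alongside the initializers so that both kinds of constants are passed to requires_grad. For the first point, constant folding has to be turned off at export time; torch.onnx.export exposes do_constant_folding=False for this, while a tf2onnx export would need the equivalent adjustment to its converter optimizations.

# Sketch: gather both initializers and Constant-node outputs for requires_grad.
import onnx

onnx_model = onnx.load("ContentModel.onnx")  # path assumed from the issue

requires_grad = [init.name for init in onnx_model.graph.initializer]
requires_grad += [
    output
    for node in onnx_model.graph.node
    if node.op_type == "Constant"
    for output in node.output
]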

Leaner23 reopened this on Jan 18, 2024
@Leaner23 (Author) commented

Here I have included all the graph initializers' parameters in requires_grad. What else do you want me to add to requires_grad?

@Leaner23 (Author) commented

Here I am sharing the code for the model generation:
import tf2onnx
import onnx
import tensorflow as tf
from tensorflow import keras
import keras_nlp

NUM_LAYERS = 2
EMBD_DIM = 128
FF_DIM = 128
NUM_HEADS = 8
DROPOUT = 0.1
NORM_EPSILON = 1e-9
# MAX_SEQ_LENGTH and VOCAB_SIZE are used below but were not defined in the
# original snippet; the data shared later suggests sequences of length 128,
# and the vocabulary size must exceed the largest token id (~9667). Assumed:
MAX_SEQ_LENGTH = 128
VOCAB_SIZE = 10000

encoder_input = keras.Input(shape=(MAX_SEQ_LENGTH,), dtype=tf.float32, name='encoder_input')
encoder_embedding_layer = keras_nlp.layers.TokenAndPositionEmbedding(
    vocabulary_size=VOCAB_SIZE,
    sequence_length=MAX_SEQ_LENGTH,
    embedding_dim=EMBD_DIM,
    mask_zero=True,
)
encoder_output = encoder_embedding_layer(encoder_input)
encoder_output = keras.layers.LayerNormalization(epsilon=NORM_EPSILON)(encoder_output)
encoder_output = keras.layers.Dropout(rate=DROPOUT)(encoder_output)
for i in range(NUM_LAYERS):
    encoder_output = keras_nlp.layers.TransformerEncoder(
        intermediate_dim=FF_DIM,
        num_heads=NUM_HEADS,
        activation=keras.activations.gelu,
    )(encoder_output)
outputs = keras.layers.GlobalAveragePooling1D()(encoder_output)
outputs = keras.layers.Dense(128, activation="relu")(outputs)
outputs = keras.layers.Dense(1, activation='sigmoid', name='output')(outputs)

transformer = keras.Model(inputs=encoder_input, outputs=outputs)

learning_rate = 3e-5
optimizer = tf.keras.optimizers.experimental.AdamW(learning_rate=learning_rate)
loss = tf.keras.losses.BinaryCrossentropy()
metrics = tf.keras.metrics.BinaryAccuracy()

transformer.compile(loss=loss, metrics=metrics, optimizer=optimizer)

onnx_model, _ = tf2onnx.convert.from_keras(transformer)
onnx.save(onnx_model, 'transformer_Jan_16_3.onnx')
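
Before generating the training artifacts, a quick sanity check on the exported file can show what the converter folded (a sketch, using the file name from the snippet above):

# Sketch: inspect the exported model for initializers and Constant nodes.
import onnx

m = onnx.load('transformer_Jan_16_3.onnx')
onnx.checker.check_model(m)
print("initializers:", len(m.graph.initializer))
print("Constant nodes:", sum(1 for n in m.graph.node if n.op_type == "Constant"))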

You can use these data points (the numpy print output of a (5, 128) array of token ids):

data = [[1.000e+00 7.780e+02 1.280e+02 7.400e+01 1.200e+01 6.300e+02 1.630e+02
1.500e+01 4.000e+00 1.766e+03 7.982e+03 1.051e+03 2.000e+00 3.200e+01
8.500e+01 1.560e+02 4.500e+01 4.000e+01 1.480e+02 1.390e+02 1.210e+02
6.640e+02 6.650e+02 1.000e+01 1.000e+01 1.361e+03 1.730e+02 4.000e+00
7.490e+02 2.000e+00 1.600e+01 3.804e+03 8.000e+00 4.000e+00 2.260e+02
6.500e+01 1.200e+01 4.300e+01 1.270e+02 2.400e+01 2.000e+00 1.000e+01
1.000e+01 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00]
[1.000e+00 6.740e+03 3.650e+02 1.234e+03 5.000e+00 1.156e+03 3.540e+02
1.100e+01 1.400e+01 5.327e+03 6.638e+03 7.000e+00 1.016e+03 2.000e+00
5.940e+03 3.560e+02 4.400e+01 4.000e+00 1.349e+03 5.000e+02 7.460e+02
5.000e+00 2.000e+02 4.000e+00 4.132e+03 1.100e+01 2.000e+00 9.363e+03
1.117e+03 1.831e+03 7.485e+03 5.000e+00 4.831e+03 2.600e+01 6.000e+00
2.000e+00 4.183e+03 1.700e+01 3.690e+02 3.700e+01 2.150e+02 1.345e+03
1.430e+02 2.000e+00 5.000e+00 1.838e+03 8.000e+00 1.974e+03 1.500e+01
3.600e+01 1.190e+02 2.570e+02 8.500e+01 5.200e+01 4.860e+02 9.000e+00
6.000e+00 2.000e+00 8.564e+03 6.300e+01 2.710e+02 6.000e+00 1.960e+02
9.600e+01 9.490e+02 4.121e+03 4.000e+00 2.000e+00 7.000e+00 4.000e+00
2.212e+03 2.436e+03 8.190e+02 6.300e+01 4.700e+01 7.700e+01 7.175e+03
1.800e+02 6.000e+00 2.270e+02 1.100e+01 9.400e+01 2.494e+03 2.000e+00
1.300e+01 4.230e+02 4.000e+00 1.680e+02 7.000e+00 4.000e+00 2.200e+01
5.000e+00 8.900e+01 6.650e+02 7.100e+01 2.700e+02 5.600e+01 5.000e+00
1.300e+01 1.970e+02 1.200e+01 1.610e+02 5.390e+03 9.900e+01 7.600e+01
2.300e+01 2.000e+00 7.000e+00 4.190e+02 6.650e+02 4.000e+01 9.100e+01
8.500e+01 1.080e+02 7.000e+00 4.000e+00 2.084e+03 5.000e+00 4.773e+03
8.100e+01 5.500e+01 5.200e+01 1.901e+03 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00]
[1.000e+00 5.400e+01 1.300e+01 1.610e+03 1.400e+01 2.000e+01 1.300e+01
6.900e+01 5.500e+01 3.640e+02 1.398e+03 2.100e+01 5.400e+01 1.300e+01
2.190e+02 1.200e+01 1.300e+01 1.706e+03 1.500e+01 4.000e+00 2.000e+01
1.600e+01 3.290e+02 6.000e+00 1.760e+02 3.290e+02 7.400e+01 5.100e+01
1.300e+01 8.730e+02 4.000e+00 1.560e+02 7.100e+01 7.800e+01 4.000e+00
7.412e+03 3.220e+02 1.600e+01 3.100e+01 7.000e+00 4.000e+00 2.490e+02
4.000e+00 6.500e+01 1.600e+01 3.800e+01 3.790e+02 1.200e+01 1.000e+02
1.570e+02 1.800e+01 6.000e+00 9.100e+02 2.000e+01 5.490e+02 1.800e+01
4.000e+00 1.496e+03 2.100e+01 1.400e+01 3.100e+01 9.000e+00 2.400e+01
6.000e+00 2.120e+02 1.200e+01 9.000e+00 6.000e+00 1.322e+03 9.910e+02
7.000e+00 3.002e+03 4.000e+00 4.250e+02 9.000e+00 7.300e+01 2.218e+03
5.490e+02 1.800e+01 3.100e+01 1.550e+02 3.600e+01 1.000e+02 7.630e+02
3.790e+02 2.000e+01 1.030e+02 3.510e+02 5.308e+03 1.300e+01 2.020e+02
1.200e+01 2.241e+03 5.000e+00 6.000e+00 3.200e+02 4.600e+01 7.000e+00
4.570e+02 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00]
[1.000e+00 1.300e+01 1.190e+02 9.540e+02 1.890e+02 1.554e+03 1.300e+01
9.200e+01 4.590e+02 4.800e+01 4.000e+00 1.160e+02 9.000e+00 1.492e+03
2.291e+03 4.200e+01 7.260e+02 4.000e+00 1.939e+03 1.680e+02 2.031e+03
1.300e+01 4.230e+02 1.400e+01 2.000e+01 5.490e+02 1.800e+01 4.000e+00
2.000e+00 5.470e+02 3.200e+01 4.000e+00 9.600e+01 3.900e+01 4.000e+00
4.540e+02 7.000e+00 4.000e+00 2.200e+01 8.000e+00 4.000e+00 5.500e+01
1.300e+02 1.680e+02 1.300e+01 9.200e+01 3.590e+02 6.000e+00 1.580e+02
1.511e+03 2.000e+00 4.200e+01 6.000e+00 1.913e+03 1.900e+01 1.940e+02
4.455e+03 4.121e+03 6.000e+00 1.140e+02 8.000e+00 7.200e+01 2.100e+01
4.650e+02 9.667e+03 3.040e+02 4.000e+00 5.100e+01 9.000e+00 1.400e+01
2.000e+01 4.400e+01 1.550e+02 8.000e+00 6.000e+00 2.260e+02 1.620e+02
6.160e+02 6.510e+02 5.100e+01 9.000e+00 1.400e+01 2.000e+01 4.400e+01
1.000e+01 1.000e+01 1.400e+01 2.180e+02 4.843e+03 6.290e+02 4.200e+01
3.017e+03 2.100e+01 4.800e+01 2.500e+01 2.800e+01 3.500e+01 5.340e+02
5.000e+00 6.000e+00 3.200e+02 8.000e+00 5.160e+02 5.000e+00 4.200e+01
2.500e+01 1.810e+02 8.000e+00 1.300e+02 5.600e+01 5.470e+02 3.571e+03
5.000e+00 1.471e+03 8.510e+02 1.400e+01 2.286e+03 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00]
[1.000e+00 5.030e+02 2.000e+01 3.300e+01 1.180e+02 4.810e+02 3.020e+02
2.600e+01 1.840e+02 5.200e+01 8.350e+02 1.120e+03 5.420e+02 2.603e+03
1.300e+01 1.408e+03 4.500e+01 6.000e+00 2.364e+03 1.000e+01 1.000e+01
2.500e+01 2.760e+02 4.900e+01 2.000e+00 3.239e+03 1.100e+01 1.290e+02
1.642e+03 8.000e+00 6.070e+02 2.500e+01 3.900e+01 8.520e+02 5.226e+03
2.000e+00 2.500e+01 6.050e+02 8.520e+02 3.925e+03 5.000e+00 2.777e+03
4.600e+01 8.520e+02 2.000e+00 2.500e+01 2.146e+03 3.000e+01 6.080e+02
4.044e+03 1.000e+01 1.000e+01 2.500e+01 7.890e+02 3.400e+01 4.000e+00
2.000e+00 5.400e+01 1.544e+03 2.173e+03 2.018e+03 2.500e+01 7.900e+01
7.200e+01 2.020e+02 7.200e+01 6.000e+00 9.680e+02 2.000e+00 1.000e+01
1.000e+01 2.872e+03 7.500e+01 3.590e+02 2.872e+03 6.214e+03 4.000e+00
2.000e+00 3.200e+01 7.500e+01 2.800e+01 9.000e+00 1.400e+01 2.000e+00
1.000e+01 1.000e+01 8.840e+02 1.866e+03 9.000e+00 4.000e+00 4.017e+03
2.809e+03 1.000e+01 1.000e+01 7.190e+02 2.000e+00 7.000e+01 2.885e+03
4.000e+00 2.552e+03 2.000e+00 4.430e+03 1.750e+02 6.640e+03 1.100e+01
4.000e+00 2.000e+00 5.430e+02 1.609e+03 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00]]

label : [0., 1., 0., 0., 0.]
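
For reference, a minimal training-step sketch that consumes the generated artifacts (assumes the onnxruntime-training package and the default file names written by generate_artifacts into "training_demo"):

# Sketch: one training step with ONNX Runtime's on-device training API.
import numpy as np
from onnxruntime.training.api import CheckpointState, Module, Optimizer

state = CheckpointState.load_checkpoint("training_demo/checkpoint")
module = Module("training_demo/training_model.onnx", state,
                "training_demo/eval_model.onnx")
optimizer = Optimizer("training_demo/optimizer_model.onnx", module)

# Placeholder inputs: the (5, 128) token array above and its labels,
# reshaped to match the model's (N, 1) output.
data = np.zeros((5, 128), dtype=np.float32)
labels = np.array([0., 1., 0., 0., 0.], dtype=np.float32).reshape(-1, 1)

module.train()
loss = module(data, labels)   # forward + backward
optimizer.step()              # apply the AdamW update
module.lazy_reset_grad()      # clear gradients for the next step
print("loss:", loss)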

@github-actions bot (Contributor) commented

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label on Feb 17, 2024