Performance Issue: ONNX Runtime training loss decreases very little each epoch and saturates after 10 epochs #19185
Comments
I assume you trained with the same list of weights and the same data in both onnxruntime and Keras. + @baijumeswani
Leaner23 Could you please share your model and data to reproduce the behavior you're seeing?
Here, I am sharing the code for the model generation:

```python
import tf2onnx
import onnx
import tensorflow as tf
from tensorflow import keras
import keras_nlp

NUM_LAYERS = 2
EMBD_DIM = 128
FF_DIM = 128
NUM_HEADS = 8
DROPOUT = 0.1
NORM_EPSILON = 1e-9

# Assumed values; MAX_SEQ_LENGTH and VOCAB_SIZE were not defined in the original snippet.
MAX_SEQ_LENGTH = 128
VOCAB_SIZE = 20000

# Token-and-position embedding over the (float32) token-id input.
encoder_input = keras.Input(shape=(MAX_SEQ_LENGTH,), dtype=tf.float32, name='encoder_input')
encoder_embedding_layer = keras_nlp.layers.TokenAndPositionEmbedding(
    vocabulary_size=VOCAB_SIZE,
    sequence_length=MAX_SEQ_LENGTH,
    embedding_dim=EMBD_DIM,
    mask_zero=True,
)
encoder_output = encoder_embedding_layer(encoder_input)
encoder_output = keras.layers.LayerNormalization(epsilon=NORM_EPSILON)(encoder_output)
encoder_output = keras.layers.Dropout(rate=DROPOUT)(encoder_output)

# Stack of transformer encoder blocks.
for i in range(NUM_LAYERS):
    encoder_output = keras_nlp.layers.TransformerEncoder(
        intermediate_dim=FF_DIM,
        num_heads=NUM_HEADS,
        activation=keras.activations.gelu,
    )(encoder_output)

# Pool over the sequence dimension and classify.
outputs = keras.layers.GlobalAveragePooling1D()(encoder_output)
outputs = keras.layers.Dense(128, activation="relu")(outputs)
outputs = keras.layers.Dense(1, activation='sigmoid', name='output')(outputs)
transformer = keras.Model(inputs=encoder_input, outputs=outputs)

learning_rate = 3e-5
optimizer = tf.keras.optimizers.experimental.AdamW(learning_rate=learning_rate)
loss = tf.keras.losses.BinaryCrossentropy()
metrics = tf.keras.metrics.BinaryAccuracy()
transformer.compile(loss=loss, metrics=metrics, optimizer=optimizer)

# Export the compiled Keras model to ONNX.
onnx_model, _ = tf2onnx.convert.from_keras(transformer)
onnx.save(onnx_model, 'transformer_Jan_16_3.onnx')
```

You can use these datapoints:
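(The datapoints themselves were not included above. As a purely hypothetical stand-in for local reproduction, a random binary-classification batch could be generated as follows; the shapes follow the assumed MAX_SEQ_LENGTH and VOCAB_SIZE values from the model code.)

```python
import numpy as np

# Hypothetical stand-in data; the original datapoints were not shared.
MAX_SEQ_LENGTH = 128  # assumed, matching the model code above
VOCAB_SIZE = 20000    # assumed

rng = np.random.default_rng(seed=0)
# Token ids cast to float32 because encoder_input is declared float32.
x_train = rng.integers(1, VOCAB_SIZE, size=(256, MAX_SEQ_LENGTH)).astype(np.float32)
y_train = rng.integers(0, 2, size=(256, 1)).astype(np.float32)
```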
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
I have noticed similar behavior. However, when I also return the predictions and compute the loss in NumPy, the NumPy loss decreases. Very odd...
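For reference, a minimal sketch of that kind of cross-check, assuming the training step returns sigmoid probabilities; the function name and clipping epsilon here are my own:

```python
import numpy as np

def bce_loss_numpy(probs, labels, eps=1e-7):
    """Binary cross-entropy computed outside the ORT graph, for comparison
    against the loss value the training step reports."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return -np.mean(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))
```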
Describe the issue
I am trying to train an ONNX model on device. The loss decreases only very slightly in each epoch. I tried different batch sizes, but the problem remains the same.
However, when I trained the same model in Keras, the training loss decreased in every epoch.
Example of training loss with ONNX Runtime:
Epoch 1 Loss [[26.37851]]
Epoch 2 Loss [[24.919254]]
Epoch 3 Loss [[24.84161]]
Epoch 4 Loss [[24.851688]]
Epoch 5 Loss [[24.845762]]
Epoch 6 Loss [[24.842438]]
Epoch 7 Loss [[24.838167]]
Epoch 8 Loss [[24.836271]]
Epoch 9 Loss [[24.83929]]
Epoch 10 Loss [[24.839489]]
Epoch 11 Loss [[24.850527]]
Epoch 12 Loss [[24.865587]]
Epoch 13 Loss [[24.867554]]
Epoch 14 Loss [[24.873014]]
Epoch 15 Loss [[24.880104]]
Epoch 16 Loss [[24.879396]]
Epoch 17 Loss [[24.882072]]
Epoch 18 Loss [[24.835163]]
Epoch 19 Loss [[24.87151]]
Epoch 20 Loss [[24.835596]]
To reproduce
Code for generating the training artifacts:
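The artifact-generation code was not attached to the issue. Below is a minimal sketch of how training artifacts are typically produced with onnxruntime-training's artifacts API, assuming the exported model above; note that LossType.BCEWithLogitsLoss applies its own sigmoid and expects raw logits, while the Keras model above already ends in a sigmoid.

```python
import onnx
from onnxruntime.training import artifacts

onnx_model = onnx.load("transformer_Jan_16_3.onnx")

# Train all initializers; a real setup might freeze some of them instead.
requires_grad = [param.name for param in onnx_model.graph.initializer]

artifacts.generate_artifacts(
    onnx_model,
    requires_grad=requires_grad,
    frozen_params=[],
    loss=artifacts.LossType.BCEWithLogitsLoss,  # expects logits, not probabilities
    optimizer=artifacts.OptimType.AdamW,
    artifact_directory="training_artifacts",
)
```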
Code for training the model and running inference:
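That code was also not attached; below is a minimal sketch of an on-device training loop with onnxruntime.training.api, assuming the artifacts were generated into training_artifacts/ as in the sketch above and using placeholder data:

```python
import numpy as np
from onnxruntime.training.api import CheckpointState, Module, Optimizer

# Load the artifacts produced by generate_artifacts (paths are assumptions).
state = CheckpointState.load_checkpoint("training_artifacts/checkpoint")
module = Module(
    "training_artifacts/training_model.onnx",
    state,
    "training_artifacts/eval_model.onnx",
)
optimizer = Optimizer("training_artifacts/optimizer_model.onnx", module)

# Placeholder batch; shapes follow the model's (batch, MAX_SEQ_LENGTH) input.
inputs = np.random.randint(1, 20000, size=(32, 128)).astype(np.float32)
labels = np.random.randint(0, 2, size=(32, 1)).astype(np.float32)

module.train()
for epoch in range(20):
    loss = module(inputs, labels)  # one train step: forward + backward
    optimizer.step()
    module.lazy_reset_grad()       # clear accumulated gradients
    print(f"Epoch {epoch + 1} Loss {loss}")
```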
Urgency
It is very urgent.
Platform
Linux
OS Version
Ubuntu 20.2
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.16
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No