
LSTM training is super slow on GPU #34

Open
phgilde opened this issue Aug 6, 2020 · 7 comments

Comments


phgilde commented Aug 6, 2020

This training loop takes more than a second per epoch with tensorflow-directml, but only a fraction of a second with standard tensorflow.
It actually doesn't work at all (the error is NaN after a couple of iterations), but I already opened another issue for that.

Code:

import tensorflow as tf
import numpy as np
from tensorflow import keras
import matplotlib.pyplot as plt
import time
from datetime import timedelta

# Target function: a sine wave sampled over [0, 50].
def fn(x):
    return tf.sin(x)

seq_length = 200
x = tf.linspace(tf.constant(0, dtype=tf.float32), 50, seq_length)
y = fn(x)

# A single LSTM layer used directly as the model; it maps a zero input of
# shape (1, seq_length, 1) to an output of shape (1, seq_length, n_outputs).
n_outputs = 50
model = keras.layers.LSTM(n_outputs, return_sequences=True)
optimizer = keras.optimizers.Adam(learning_rate=1e-3)
loss_fn = keras.losses.MSE

# Eager training loop: one forward/backward pass per epoch, timed overall.
loss_history = []
epochs = 2_000
out_epochs = 10
start = time.time()
for epoch in range(epochs):
    with tf.GradientTape() as tape:
        y_pred = model(tf.zeros(shape=(1, seq_length, 1)))
        y_pred_data = y_pred[0, :, 0]
        loss = loss_fn(y, y_pred_data)
    loss_history.append(loss.numpy())
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    if epoch % out_epochs == 0:
        print(f"Epoch {epoch}: Loss = {loss} ({timedelta(seconds=time.time()-start)})")

System: Intel i5-7200U with Intel HD graphics 620
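(Editor's note: as a sketch only, the same gradient step wrapped in tf.function, in case per-step eager dispatch overhead contributes to the gap; this reuses the objects defined in the script above and has not been timed on tensorflow-directml.)

@tf.function
def train_step(y_true):
    # Same forward/backward pass as the loop above, traced into a graph once
    # so each epoch avoids re-dispatching every op eagerly.
    with tf.GradientTape() as tape:
        y_pred = model(tf.zeros(shape=(1, seq_length, 1)))
        loss = loss_fn(y_true, y_pred[0, :, 0])
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Drop-in replacement for the body of the loop above:
# loss = train_step(y)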

@PatriceVignola (Contributor)

Thank you for reporting this, @phgilde. Are you running this script on Windows or WSL?


phgilde commented Aug 7, 2020

@PatriceVignola I'm running this on Windows.

jstoecker transferred this issue from microsoft/DirectML on Sep 17, 2020
@jstoecker (Contributor)

We've implemented the single-step/block-based LSTM/GRU/RNN ops, but these are really better suited to CPU architectures. Models typically use the multi-step cuDNN ops when executing on a GPU device. It's not surprising that there's some more work to do here to make DML perform better with recurrent networks.
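(Editor's note: for context, in stock TF 2.x the Keras LSTM layer only takes the fused multi-step cuDNN path when it is constructed with the documented defaults; a minimal sketch of those conditions, which is standard TensorFlow behavior and not DirectML-specific:)

import tensorflow as tf
from tensorflow import keras

# With these (default) arguments, stock TF 2.x can dispatch the layer to the
# fused multi-step cuDNN kernel on an NVIDIA GPU; changing activation,
# recurrent_activation, recurrent_dropout, unroll, or use_bias forces the
# generic step-by-step implementation instead.
lstm = keras.layers.LSTM(
    50,
    activation="tanh",
    recurrent_activation="sigmoid",
    recurrent_dropout=0.0,
    unroll=False,
    use_bias=True,
    return_sequences=True,
)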

@wchao1115

@phgilde What GPU are you running this with? You mentioned standard tensorflow, and that your config is with Intel HD graphics. Is this training script running on the CPU?

@ghostlypi

I've had the same issue on an RX 560. In Task Manager, neither the GPU nor the CPU seems to take on any load.
[screenshot: Task Manager showing GPU and CPU utilization]

@onurberkay

I have the same problem with a 4750U AMD APU; the GPU load doesn't even reach 1-2%.

@PatriceVignola (Contributor)

@onurberkay What does tf.config.list_physical_devices() give you?
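(Editor's note: for anyone else checking this, a quick way to run that call; the exact device names reported by tensorflow-directml may differ from stock TF.)

import tensorflow as tf

# List every physical device TensorFlow can see; a working tensorflow-directml
# install should report a non-CPU device here in addition to the CPU.
print(tf.config.list_physical_devices())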
