LSTM 8x slow on gpu #367

Open
onurberkay opened this issue Apr 25, 2022 · 5 comments
Comments

@onurberkay

[screenshot]
Train on 36090 samples
2022-04-25 21:56:12.505195: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library C:\Users\onurb\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python/directml.24bfac66e4ee42ec393a5fb471412d0177bc7bcf.dll
2022-04-25 21:56:12.506028: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library dxgi.dll
2022-04-25 21:56:12.509302: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library d3d12.dll
2022-04-25 21:56:12.961954: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:250] DirectML device enumeration: found 1 compatible adapters.
2022-04-25 21:56:12.962441: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2022-04-25 21:56:12.966749: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:186] DirectML: creating device on adapter 0 (AMD Radeon(TM) Graphics)
2022-04-25 21:56:13.055907: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library Kernel32.dll
36090/36090 - 232s - loss: 0.0014 - acc: 0.0396
Train on 36090 samples

When using only the CPU, the same training takes 30-40 s, so there is a huge difference. It also looks like the GPU is not taking any load.
I am using an AMD Ryzen 4750U APU.
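Taking the reported numbers at face value (232 s for one pass over the 36090 samples with DirectML, versus 30-40 s on the CPU alone), the slowdown comes out at roughly 6x to 8x. A quick Python check of that arithmetic; the constant and function names are just labels for the figures quoted above:

```python
# Epoch times reported in the issue, in seconds.
GPU_EPOCH_S = 232.0          # DirectML run from the log above
CPU_EPOCH_S = (30.0, 40.0)   # CPU-only range quoted by the reporter


def slowdown_range(gpu_s: float, cpu_range: tuple[float, float]) -> tuple[float, float]:
    """Return (min, max) slowdown of the GPU run relative to the CPU-only run."""
    fast_cpu, slow_cpu = cpu_range
    # The slower CPU time gives the smaller ratio, so it bounds the low end.
    return gpu_s / slow_cpu, gpu_s / fast_cpu


low, high = slowdown_range(GPU_EPOCH_S, CPU_EPOCH_S)
print(f"DirectML is {low:.1f}x to {high:.1f}x slower than CPU-only")
# prints: DirectML is 5.8x to 7.7x slower than CPU-only
```

So the "8x" in the title is the upper end of the range.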

@PatriceVignola
Contributor

Is there a repro script that you would be able to provide us? Otherwise, it would help if you could send us the device placement logs. Call tf.debugging.set_log_device_placement(True) at the start of your script, then redirect its output to a file.
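A minimal sketch of capturing that output from Python, assuming the training code lives in a standalone script that itself calls tf.debugging.set_log_device_placement(True) before building the model. The helper name run_and_capture is hypothetical. It runs the script in a child interpreter and writes both stdout and stderr to one file, since TensorFlow's native "I tensorflow/..." log lines are typically written to stderr rather than stdout.

```python
import subprocess
import sys


def run_and_capture(script: str, log_path: str) -> int:
    """Run `script` with the current Python interpreter and save stdout+stderr to `log_path`.

    Assumption: the script enables placement logging on its own, e.g. with
    tf.debugging.set_log_device_placement(True) as its first TensorFlow call.
    Returns the child process exit code.
    """
    with open(log_path, "w", encoding="utf-8") as log:
        result = subprocess.run(
            [sys.executable, script],
            stdout=log,
            stderr=subprocess.STDOUT,  # merge stderr, where native TF logging usually goes
            check=False,               # keep the partial log even if training crashes
        )
    return result.returncode
```

Usage (file names hypothetical): `run_and_capture("train_lstm.py", "out.txt")`, then attach `out.txt` to the issue. Running in a subprocess means the file also catches output from TensorFlow's C++ runtime, which a Python-level `sys.stdout` redirect would miss.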

@onurberkay
Author

@PatriceVignola Here is the log: out.txt
It's just a simple script. It needs a few extra libraries to run: pip install yfinance scikit-learn matplotlib
https://www.online-python.com/2Pa6iM1QZ3
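To see at a glance where ops ended up in a placement log like out.txt, a small parser can help. This is a sketch under the assumption that placement lines follow one of the formats TensorFlow commonly prints (shown in the comments); the function names are hypothetical, and lines in any other format are simply skipped.

```python
import re
from collections import Counter

# Assumed placement line formats:
#   graph mode: "lstm/while/MatMul: (MatMul): /job:localhost/replica:0/task:0/device:DML:0"
#               (some versions omit the ": " after the op type)
#   eager mode: "Executing op MatMul in device /job:localhost/replica:0/task:0/device:DML:0"
GRAPH_RE = re.compile(r"\((\w+)\):?\s*\S*/device:(\w+:\d+)")
EAGER_RE = re.compile(r"Executing op (\w+) in device \S*/device:(\w+:\d+)")


def placements(log_text: str) -> list[tuple[str, str]]:
    """Return (op_type, device) pairs, e.g. ("MatMul", "DML:0"), found in the log."""
    pairs = []
    for line in log_text.splitlines():
        match = GRAPH_RE.search(line) or EAGER_RE.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def summarize(log_text: str) -> tuple[Counter, Counter]:
    """Count placed ops per device, and count the op types that landed on a CPU."""
    pairs = placements(log_text)
    per_device = Counter(device for _, device in pairs)
    on_cpu = Counter(op for op, device in pairs if device.startswith("CPU"))
    return per_device, on_cpu
```

Usage: `per_device, on_cpu = summarize(open("out.txt", encoding="utf-8").read())`. The `on_cpu` counter is the interesting part: it lists which op types fell back to the CPU.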

@PatriceVignola
Contributor

There are 2 issues that I noticed at a cursory glance:

  1. The model uses a Qr operator internally, which isn't supported on DML. (It isn't supported on CUDA either, but TensorFlow "fake" registers it for CUDA so that it runs on the CPU, which enables device colocation.) We can do what CUDA does here and register it the same way for DML, and we might see some marginal performance improvements.
  2. The fact that you only have 1% load on the GPU is worrying. On my desktop, I see at least 40% load throughout the whole training process when running the script that you linked. We haven't really tested tensorflow-directml on AMD APUs yet, but our past experience with many integrated GPUs is that it's simply faster to run everything on the CPU. For integrated graphics to help, they have to be powerful enough to make transferring data between the CPU and the GPU worthwhile. I'll see if I can get my hands on a 4750 and investigate further.
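The trade-off in point 2 can be written as a back-of-the-envelope cost model: offloading to the GPU pays off only when the compute time it saves exceeds the time spent moving data. The sketch below assumes a simple linear model (transfer time plus compute time), and every throughput and bandwidth figure in it is a placeholder, not a measurement of the 4750U or of DirectML.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical sustained throughput; plug in measured numbers for real hardware."""
    name: str
    flops_per_s: float


def cpu_time(flops: float, cpu: Device) -> float:
    """Seconds to run the work entirely on the CPU (no copies needed)."""
    return flops / cpu.flops_per_s


def offload_time(flops: float, bytes_moved: float, gpu: Device, link_bytes_per_s: float) -> float:
    """Seconds to copy data between CPU and GPU plus run the work on the GPU."""
    return bytes_moved / link_bytes_per_s + flops / gpu.flops_per_s


def offload_pays_off(flops: float, bytes_moved: float, cpu: Device, gpu: Device,
                     link_bytes_per_s: float) -> bool:
    """True when the linear model predicts the GPU path finishes first."""
    return offload_time(flops, bytes_moved, gpu, link_bytes_per_s) < cpu_time(flops, cpu)


# Placeholder figures (assumptions, not benchmarks).
cpu = Device("cpu", 100e9)    # 100 GFLOP/s
igpu = Device("igpu", 1e12)   # 1 TFLOP/s
LINK = 10e9                   # 10 GB/s effective copy bandwidth

# Small op: little compute per byte copied, so the copy dominates.
small = offload_pays_off(1e6, 1e6, cpu, igpu, LINK)
# Large float32 matmul (n x n times n x n): lots of compute per byte copied.
n = 2000
large = offload_pays_off(2 * n**3, 3 * n * n * 4, cpu, igpu, LINK)
print(small, large)  # prints: False True
```

The same hardware can lose on small ops and win on large ones; what matters is how much compute each copied byte buys.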

@onurberkay
Author

I tried a heavy model with Dense layers on the GPU, and it is faster than on the CPU. The GPU usage stats are low again, but I think there must be a problem with the stats themselves. When will the first change (registering Qr for DML) be added, or will it be added at all? I can test it any time. Thanks for the answers.
model.add(Dense(2000,kernel_regularizer=regularizers.l2(0.00000000001)))
model.add(Dense(2000,kernel_regularizer=regularizers.l2(0.00000000001)))
model.add(Dense(2000,kernel_regularizer=regularizers.l2(0.00000000001)))
model.add(Dense(2000,kernel_regularizer=regularizers.l2(0.00000000001)))
model.add(Dense(2000,kernel_regularizer=regularizers.l2(0.00000000001)))
model.add(Dense(2000,kernel_regularizer=regularizers.l2(0.00000000001)))
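Each Dense(2000) layer above is one large matrix multiplication, the kind of work a GPU handles well. A rough sketch of how much work that stack represents, assuming a hypothetical input width of 60 features and Keras' default batch size of 32 (the actual input shape and batch size in the reporter's script are not shown). Only the six layers listed are counted; any output layer is left out.

```python
def dense_stack_cost(input_dim: int, units: int = 2000, layers: int = 6,
                     batch: int = 32) -> tuple[int, int]:
    """Return (trainable parameters, forward-pass FLOPs per batch) for stacked Dense layers.

    Each Dense(units) layer holds an (in_dim x units) weight matrix plus `units` biases.
    Its forward pass is one (batch x in_dim) @ (in_dim x units) matmul, counted as
    2 * batch * in_dim * units FLOPs (multiply + add); bias adds and activations are ignored.
    """
    params = 0
    flops = 0
    in_dim = input_dim
    for _ in range(layers):
        params += in_dim * units + units
        flops += 2 * batch * in_dim * units
        in_dim = units  # the next layer consumes this layer's output
    return params, flops


params, flops = dense_stack_cost(input_dim=60)  # 60 is a hypothetical input width
print(f"{params:,} parameters, {flops / 1e9:.2f} GFLOP per batch")
# prints: 20,132,000 parameters, 1.29 GFLOP per batch
```

About 1.3 GFLOP of forward work per batch, in a handful of big matmuls, gives the GPU much more to chew on per kernel than a small model does.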

@RichardErkhov

I might be too late, but I think the 89 °C temperature is the problem. Try to cool it down; it might just be a throttling issue.
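If throttling is the cause, it would likely show up in the per-epoch times Keras prints (such as the 232s above): epochs would get slower as the chip heats up. A rough heuristic sketch for checking a list of those times; the function name, the one-epoch warm-up and the 15% tolerance are arbitrary assumptions, not established thresholds.

```python
from statistics import mean


def looks_throttled(epoch_seconds: list[float], warmup: int = 1,
                    tolerance: float = 0.15) -> bool:
    """Heuristic: True if later epochs run noticeably slower than earlier ones.

    The first `warmup` epochs are skipped (they often include one-time setup cost).
    The remaining times are split into an early and a late half, and the late
    average is compared against the early average plus `tolerance`.
    """
    times = epoch_seconds[warmup:]
    if len(times) < 2:
        raise ValueError("need at least two epochs after warm-up")
    half = len(times) // 2
    early = mean(times[:half])
    late = mean(times[half:])
    return late > early * (1.0 + tolerance)
```

Usage: collect the epoch durations from a long run, e.g. `looks_throttled([50, 30, 31, 30, 42, 45, 44])` (made-up times), and compare with a run where the machine is kept cool.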


3 participants