-
Hello! I am new to TensorFlow and I am currently learning about machine learning with Python. As training models on the CPU is painfully slow, I thought I'd look up how to use the GPU for training instead. I use a Surface Book 2 with a GTX 1050, and after attempting to run TF on the GPU I noticed that TF was not detecting it. After a bit of research, I found out that the current Microsoft drivers for the Surface Book 2's GPU do not support CUDA, so I would have to install the NVIDIA drivers. However, I read about some issues arising from not using the Microsoft drivers and decided against that.

Fortunately, I came across tensorflow-directml, which from my understanding does not need CUDA and is GPU agnostic. I checked that tensorflow-directml detects both the integrated graphics and the GPU (DML:0); however, the following piece of code, which I was hoping could be GPU accelerated, is not. I notice in Task Manager that although GPU utilization stays at 0 while the code runs, for some reason almost all of the VRAM gets used. I also do not notice any speed improvement.
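For reference, a device check along these lines is what shows the two adapters (just a sketch, not my exact script; tensorflow-directml is based on TF 1.15, so the standard device_lib API applies):

from tensorflow.python.client import device_lib

# Prints every device TensorFlow can see; with tensorflow-directml the
# DirectML adapters show up as /device:DML:0, /device:DML:1, and so on.
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)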
Maybe I am missing something as I am new to TensorFlow, but I do not understand why this happens. The only explanation I can think of is that DirectML requires CUDA to run after all. This is the output from the console:
I would really appreciate any help on this!
-
Hi, looking at the console output you provided, it appears that DirectML is correctly detecting your GTX 1050 Ti (and even the integrated Intel GPU!), and I'm not seeing anything that would indicate that it isn't using the GPU.

One thing to keep in mind is that Task Manager can be misleading when monitoring GPU usage, because the default Task Manager GPU graph tracks 3D workloads, which are different from compute workloads like tensorflow-directml. When looking at Task Manager, GPU usage from tensorflow-directml may show up under the "Compute_0" or "Graphics_1" graphs or something similar, depending on your GPU. See this comment for more information: #134 (comment)

I suspect that tensorflow-directml is indeed using your GPU, but the model is small enough that it isn't benefitting much from the hardware acceleration. The 3 dense layers in the model are quite small, and so are well suited to running on the CPU. You would see similar behavior even with CUDA: the layers aren't large enough to effectively utilize the massively parallel GPU. To test this out, you could try using more and larger layers. For example:
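(The snippet below is just a sketch; the input shape and exact layer sizes are arbitrary, the point is that each Dense layer is much wider.)

import tensorflow as tf
from tensorflow.keras import layers

# Same overall structure as a small "hello world" model, but with much wider
# hidden layers so the matrix multiplies are big enough to benefit from a GPU.
# The input_shape here assumes 28x28 images; adjust it to your data.
model = tf.keras.models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(4096, activation='relu'),
    layers.Dense(4096, activation='relu'),
    layers.Dense(4096, activation='relu'),
    layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train it on whatever data you were using before; the per-epoch time is what to compare.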
I suspect this will run slowly on the CPU, but it should be much faster when accelerated by your GPU through tensorflow-directml. For very small models, unfortunately, sometimes the best option is to train them on the CPU.
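If you do want to pin a run to the CPU for comparison, hiding the DirectML devices before any tensors or ops are created is enough (a minimal sketch):

import tensorflow.compat.v1 as tf

# Hide all DirectML adapters so TensorFlow falls back to the CPU.
# This has to run before the model or any tensors are built.
tf.config.experimental.set_visible_devices([], "DML")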
-
DirectML doesn't depend on CUDA, and it will work on your device. I happen to have a Surface Book 2 handy and ran some tests on a similar sequential model (the canonical MNIST sample); however, I made the hidden layers much larger. As Adrian mentioned, GPUs start to shine when you give them lots of work. Most "hello world" models aren't complicated enough to show the benefit and may even be slower.

import tensorflow.compat.v1 as tf
from tensorflow.keras import layers
import numpy as np

# uncomment to force CPU
# tf.config.experimental.set_visible_devices([], "DML")

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0

model = tf.keras.models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(4096, activation='relu'),
    layers.Dense(4096, activation='relu'),
    layers.Dense(10, activation='softmax')
])
model.summary()

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(np.expand_dims(x_train, 3), y_train, epochs=2, batch_size=1024)

CPU:

GTX 1050:
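If you want to reproduce the comparison on your own machine, wrapping model.fit from the snippet above in a simple timer is enough (a sketch; Keras also prints per-epoch times on its own):

import time

start = time.time()
model.fit(np.expand_dims(x_train, 3), y_train, epochs=2, batch_size=1024)
print("training took %.1f seconds" % (time.time() - start))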
Also, fair warning: we've only recently started to really dive into performance work.
-
Thanks so much for your quick replies, they've been really insightful! GPU usage was indeed showing up under Graphics_1.

Edit: The sample code I provided does run faster using the CPU only.
-
Thanks for that, I can now confirm it works on my end xD. I thought it was a bluff and that it wouldn't use my GPU.

WARNING:tensorflow:From D:\PyConda\envs\directml\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
_________________________________________________________________
dense (Dense) (None, 4096) 3215360
_________________________________________________________________
dense_1 (Dense) (None, 4096) 16781312
_________________________________________________________________
dense_2 (Dense) (None, 10) 40970
=================================================================
Total params: 20,037,642
Trainable params: 20,037,642
Non-trainable params: 0
_________________________________________________________________
Train on 60000 samples
2021-11-04 22:20:29.359215: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library D:\PyConda\envs\directml\lib\site-packages\tensorflow_core\python/directml.24bfac66e4ee42ec393a5fb471412d0177bc7bcf.dll
2021-11-04 22:20:29.363257: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library dxgi.dll
2021-11-04 22:20:29.367242: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library d3d12.dll
2021-11-04 22:20:29.498640: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:250] DirectML device enumeration: found 1 compatible adapters.
2021-11-04 22:20:29.501590: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2021-11-04 22:20:29.505457: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:186] DirectML: creating device on adapter 0 (AMD Radeon RX 5700)
2021-11-04 22:20:29.578354: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library Kernel32.dll
Epoch 1/2
60000/60000 [==============================] - 2s 35us/sample - loss: 0.3634 - acc: 0.8881
Epoch 2/2
60000/60000 [==============================] - 2s 30us/sample - loss: 0.0794 - acc: 0.9760