-
Hello! I am new to TensorFlow and I am currently learning about machine learning with Python. As training models on the CPU is painfully slow, I thought I'd look up how to use the GPU for training instead. I use a Surface Book 2 with a GTX 1050, and after attempting to run TF on the GPU I noticed that TF was not detecting it. After a bit of research, I found out that the current Microsoft drivers for the Surface Book 2's GPU do not support CUDA, so I would have to install the NVIDIA drivers. However, I read about some issues arising from not using the Microsoft drivers and decided against that.

Fortunately, I came across tensorflow-directml, which from my understanding does not need CUDA and is GPU agnostic. I checked that tensorflow-directml detects both the integrated graphics and the GPU (DML:0); however, the following piece of code, which I was hoping could be GPU accelerated, is not. I notice in Task Manager that although GPU utilization stays at 0 while the code runs, for some reason almost all of the VRAM gets used. I also do not notice any speed improvement.
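For reference, a device check along these lines is what shows the two adapters (just a sketch, not my exact script; tensorflow-directml is based on TF 1.15, so the standard device_lib API applies):

from tensorflow.python.client import device_lib

# Prints every device TensorFlow can see; with tensorflow-directml the
# DirectML adapters show up as /device:DML:0, /device:DML:1, and so on.
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)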
Maybe I am missing something as I am new to TensorFlow, but I do not understand why this happens. The only explanation I can think of is that DirectML requires CUDA to run after all. This is the output from the console:
I would really appreciate any help on this!
-
Hi, looking at the console output you provided, it appears that DirectML is correctly detecting your GTX 1050 Ti (and even the integrated Intel GPU!), and I'm not seeing anything that would indicate that it isn't using the GPU.

One thing to keep in mind is that Task Manager can be misleading when monitoring GPU usage, because the default Task Manager GPU graph tracks 3D workloads, which are different from compute workloads like tensorflow-directml. When looking at Task Manager, GPU usage from tensorflow-directml may show up under the "Compute_0" or "Graphics_1" graphs or something similar, depending on your GPU. See this comment for more information: #134 (comment)

I suspect that tensorflow-directml is indeed using your GPU, but the model is small enough that it isn't benefitting much from the hardware acceleration. The 3 dense layers in the model are quite small, and so are well suited to running on the CPU. You would see similar behavior even with CUDA: the layers aren't large enough to effectively utilize the massively parallel GPU. To test this out, you could try using more and larger layers. For example:
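(The snippet below is just a sketch; the input shape and exact layer sizes are arbitrary, the point is that each Dense layer is much wider.)

import tensorflow as tf
from tensorflow.keras import layers

# Same overall structure as a small "hello world" model, but with much wider
# hidden layers so the matrix multiplies are big enough to benefit from a GPU.
# The input_shape here assumes 28x28 images; adjust it to your data.
model = tf.keras.models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(4096, activation='relu'),
    layers.Dense(4096, activation='relu'),
    layers.Dense(4096, activation='relu'),
    layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train it on whatever data you were using before; the per-epoch time is what to compare.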
I suspect this will run slowly on the CPU, but it should be much faster when accelerated by your GPU through tensorflow-directml. For very small models, unfortunately, sometimes the best option is to train them on the CPU.
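If you do want to pin a run to the CPU for comparison, hiding the DirectML devices before any tensors or ops are created is enough (a minimal sketch):

import tensorflow.compat.v1 as tf

# Hide all DirectML adapters so TensorFlow falls back to the CPU.
# This has to run before the model or any tensors are built.
tf.config.experimental.set_visible_devices([], "DML")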
-
DirectML doesn't depend on CUDA, and it will work on your device. I happen to have a Surface Book 2 handy and ran some tests on a similar sequential model (the canonical MNIST sample); however, I made the hidden layers much larger. As Adrian mentioned, GPUs start to shine when you give them lots of work. Most "hello world" models aren't complicated enough to show the benefit and may even be slower.

import tensorflow.compat.v1 as tf
from tensorflow.keras import layers
import numpy as np

# uncomment to force CPU
# tf.config.experimental.set_visible_devices([], "DML")

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0

model = tf.keras.models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(4096, activation='relu'),
    layers.Dense(4096, activation='relu'),
    layers.Dense(10, activation='softmax')
])
model.summary()

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(np.expand_dims(x_train, 3), y_train, epochs=2, batch_size=1024)

CPU:

GTX 1050:
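If you want to reproduce the comparison on your own machine, wrapping model.fit from the snippet above in a simple timer is enough (a sketch; Keras also prints per-epoch times on its own):

import time

start = time.time()
model.fit(np.expand_dims(x_train, 3), y_train, epochs=2, batch_size=1024)
print("training took %.1f seconds" % (time.time() - start))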
Also, fair warning: we've only recently started to really dive into performance work.
-
Thanks so much for your quick replies, they've been really insightful! GPU usage was indeed showing up under Graphics_1.

Edit: The sample code I provided does run faster using the CPU only.
-
Thanks for that, I can now confirm it works on my end xD. I thought it was a bluff and that it wouldn't use my GPU.

WARNING:tensorflow:From D:\PyConda\envs\directml\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
_________________________________________________________________
dense (Dense) (None, 4096) 3215360
_________________________________________________________________
dense_1 (Dense) (None, 4096) 16781312
_________________________________________________________________
dense_2 (Dense) (None, 10) 40970
=================================================================
Total params: 20,037,642
Trainable params: 20,037,642
Non-trainable params: 0
_________________________________________________________________
Train on 60000 samples
2021-11-04 22:20:29.359215: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library D:\PyConda\envs\directml\lib\site-packages\tensorflow_core\python/directml.24bfac66e4ee42ec393a5fb471412d0177bc7bcf.dll
2021-11-04 22:20:29.363257: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library dxgi.dll
2021-11-04 22:20:29.367242: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library d3d12.dll
2021-11-04 22:20:29.498640: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:250] DirectML device enumeration: found 1 compatible adapters.
2021-11-04 22:20:29.501590: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2021-11-04 22:20:29.505457: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:186] DirectML: creating device on adapter 0 (AMD Radeon RX 5700)
2021-11-04 22:20:29.578354: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library Kernel32.dll
Epoch 1/2
60000/60000 [==============================] - 2s 35us/sample - loss: 0.3634 - acc: 0.8881
Epoch 2/2
60000/60000 [==============================] - 2s 30us/sample - loss: 0.0794 - acc: 0.9760