Hi,
I am training on GPU and the loss suddenly shoots up at a certain step and then never decreases. However, if I train on CPU only, training is perfectly fine with nicely decreasing loss values. I have repeated this several times on both GPU and CPU, and surprisingly the loss always shoots up at the same step, ~800.
My hardware specs are:
Windows 10 (64-bit)
GPU: Nvidia GTX 1050 - 4GB; Also tried on Nvidia GTX 1060 with max-Q design - 6GB
RAM: 16 GB
Processor: i7 7th Generation
Tensorflow: tensorflow-gpu==1.8.0
Following are the loss graphs:
With GPU
With CPU
I have tried:
Trained 5 times, letting it run for 24 hours on GPU each time. The loss never decreases after step 800; it keeps oscillating around 3.4.
Trained the same model on CPU for 3 days (around 64k steps) and got good results. It is somehow just the GPU that doesn't work.
Normalized the training batches by reshuffling the data.
Reinstalled TensorFlow.
All other models, like object detection (RCNN ResNet, MobileNet, etc.), classification (Inception-v3, etc.), and many more run perfectly on GPU. I am facing this problem with attention-ocr only.
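One thing I have not tried yet is gradient clipping, which I understand can help when the loss suddenly spikes on one bad batch and never recovers. As a reference for what I would add, here is a minimal numpy sketch of clipping by global norm (the rule `tf.clip_by_global_norm` applies); the function name and example values are just illustrative:

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    # Global L2 norm computed across all gradient arrays together.
    global_norm = np.sqrt(sum(np.sum(np.square(g)) for g in grads))
    # Scale every gradient by clip_norm / max(global_norm, clip_norm):
    # gradients are left untouched when global_norm <= clip_norm,
    # and uniformly shrunk so the global norm equals clip_norm otherwise.
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

# Illustrative gradients with global norm sqrt(9 + 16 + 144) = 13.
grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]
clipped, norm = clip_by_global_norm(grads, 5.0)
print(norm)  # 13.0
print(np.sqrt(sum(np.sum(np.square(g)) for g in clipped)))  # 5.0
```

In TF 1.x this would go between computing gradients and applying them in the optimizer step.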
Is it something to do with the OS? Is this code compatible with Linux only as far as GPU is concerned?
Thanks!