2023-10-30 23:26:05.514405: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:1284] failed to query device memory info: HIP_ERROR_InvalidValue #2289
Comments
feel free to reach me directly/internally ... thank you
I observed the same behaviour and thought of an incompatibility between ROCm 5.6 and TF 2.13. But that was just a wild guess.
My home setup with the new tensorflow:latest Docker image does the same (different GPU, a Radeon VII).
The default ROCm version seems to be 5.7, but HIP is 5.6?
Any takers for this issue?
Is there anyone?
echo ... echo ... echo
Shoot an email to paolod AT amd.com
Same here. Is there any update?
@gzitzlsb-it4i no updates on my side
keeping the comments alive ...
I see this issue with both rocm5.7-tf2.12-dev and rocm5.7-tf2.13-dev. Maybe this is related to the change from ROCm 5.6 to 5.7?
Thanksgiving ... take your time.
I'm observing the same problem with ROCm 5.7 and both TF 2.12 and TF 2.13.
Can anyone redirect me to a person I can talk to?
I guess we will wait for ROCm 6.
I tried to pull again; there is no new version.
Is there a TensorFlow Docker image for ROCm 6?
Keeping this alive because the last pull did not fix this.
Any update?
I thought the latest drop would address this ....
I also confirm that ROCm 6.0 and TensorFlow 2.14 still do not work on MI250X; the same error pops up:
ROCm + TensorFlow is becoming badly out of date and unusable on large HPC systems that made the mistake of buying AMD MI250X.
@paolodalberto @jpata what AMDGPU driver version are you trying to run the container on? On our HPC system we have a rather old one; I believe that no matter which container version you use, the issue is the driver on the host system.
@dipietrantonio excellent point, thanks a lot! I confirm that the LUMI HPC system where I'm experiencing this issue uses
I used my home system (Vega VII with an upgraded Ubuntu) and two more advanced ones with MI100, upgraded recently. PyTorch works.
Dear @paolodalberto @jpata, we have installed a newer version of the ROCm driver (6.0.5) on a bunch of nodes for testing, and now my container with ROCm 5.7 and TF 2.13 works on the code posted in the description of this issue. The error is gone :) So it is a driver issue, as I expected.
@dipietrantonio
When you run a container you rely on the host kernel, not the one installed in your container. The driver is a kernel module, so you need to update the driver on the system you are running the container on (at least when you use the Singularity container engine, but I think it is the same for Docker). For me, the ROCm 5.2 driver was the problem; I was not expecting that even the ROCm 5.7 driver could have this issue. But as I said, driver version 6.0.5 solved it for me.
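A quick way to check for this kind of mismatch is to compare the ROCm userspace version shipped in the container with the amdgpu driver (kernel module) version coming from the host. A minimal sketch, assuming the typical file locations on ROCm installs (/opt/rocm/.info/version in the container, /sys/module/amdgpu/version on the host); both paths are assumptions and may differ on your system:

```python
# Minimal sketch (assumed paths): compare the ROCm userspace version inside the
# container with the amdgpu kernel-module (driver) version provided by the host.
from pathlib import Path

def read_version(path: str) -> str:
    p = Path(path)
    return p.read_text().strip() if p.exists() else "not found"

# Userspace ROCm version: comes from the container image.
print("ROCm userspace:", read_version("/opt/rocm/.info/version"))

# Kernel driver version: comes from the host kernel, even inside Docker/Singularity.
print("amdgpu driver :", read_version("/sys/module/amdgpu/version"))
```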
hmm ...
Still no new Docker image with ROCm 6.
The new Docker image arrived.
This is with my system at home; I will check on Monday on the real machine.
#2289 (comment)
At least it works for one GPU.
Let me check what I can do on my large machine ...
The large machine now kicks me out during evaluation.
Yep, multiple GPUs do not work (a single GPU works).
In practice, multiple GPUs fail so badly that the Docker application stalls the machine and breaks the Docker daemon, which I have to restart manually. This is on a system above ... The funny part is that this was working on 5.7, six months ago, for TensorFlow and PyTorch ...
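For what it's worth, since a single GPU works, one possible workaround (my own sketch, not something confirmed in this thread) is to hide all but one GPU from TensorFlow before any device is initialized:

```python
# Workaround sketch (not confirmed in this thread): restrict TensorFlow to a single
# GPU so the failing multi-GPU path is never exercised. Setting HIP_VISIBLE_DEVICES=0
# in the environment before launching should have a similar effect.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Make only the first GPU visible to this process.
    tf.config.set_visible_devices(gpus[0], "GPU")
    print("Visible GPUs:", tf.config.get_visible_devices("GPU"))
```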
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
No
Source
binary
TensorFlow version
v2.13.0-4108-g619eb25934e 2.13.0
Custom code
No
OS platform and distribution
Linux xsjfislx32 5.15.0-83-generic #92-Ubuntu SMP Mon Aug 14 09:30:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Mobile device
No response
Python version
Python 3.9.18
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
This is the smallest piece of code from a tutorial that reproduces my problem.
Standalone code to reproduce the issue
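The original snippet is not preserved here. As a stand-in, a minimal sketch of the kind of tutorial code that triggers GPU initialization (and with it the failing device-memory query); the model and data are placeholders, not the exact code from the report:

```python
# Minimal sketch (placeholder, not the exact snippet from the report): any small Keras
# training run that touches the GPU triggers device initialization on ROCm.
import numpy as np
import tensorflow as tf

print("GPUs:", tf.config.list_physical_devices("GPU"))

x = np.random.rand(1024, 32).astype("float32")
y = np.random.rand(1024, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=1, batch_size=64)
```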
Relevant log output