
2023-10-30 23:26:05.514405: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:1284] failed to query device memory info: HIP_ERROR_InvalidValue #2289

Open
paolodalberto opened this issue Oct 31, 2023 · 47 comments

@paolodalberto

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

binary

TensorFlow version

v2.13.0-4108-g619eb25934e 2.13.0

Custom code

No

OS platform and distribution

Linux xsjfislx32 5.15.0-83-generic #92-Ubuntu SMP Mon Aug 14 09:30:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Mobile device

No response

Python version

Python 3.9.18

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

This is the smallest piece of code, taken from a tutorial, that reproduces my problem.

root@xsjfislx32:/dockerx# python
Python 3.9.18 (main, Aug 25 2023, 13:20:04) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2023-10-30 23:25:50.998575: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> gpus = tf.config.list_physical_devices('GPU')
>>> if gpus:
...     print(len(gpus), "Physical GPUs")
...     try:
...         # Currently, memory growth needs to be the same across GPUs
...         for gpu in gpus:
...             print(gpu)
...             tf.config.experimental.set_memory_growth(gpu, True)
...         logical_gpus = tf.config.list_logical_devices('GPU')
...         print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
...     except RuntimeError as e:
...         # Memory growth must be set before GPUs have been initialized
...         print(e)
... 
3 Physical GPUs
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')
PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU')
2023-10-30 23:26:05.514405: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:1284] failed to query device memory info: HIP_ERROR_InvalidValue
Traceback (most recent call last):
  File "<stdin>", line 8, in <module>
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/config.py", line 480, in list_logical_devices
    return context.context().list_logical_devices(device_type=device_type)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 1666, in list_logical_devices
    self.ensure_initialized()
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 596, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.UnknownError: Failed to query available memory for GPU 0
>>> type(gpus[0])
<class 'tensorflow.python.eager.context.PhysicalDevice'>
>>>  logical_gpus = tf.config.list_logical_devices('GPU')
  File "<stdin>", line 1
    logical_gpus = tf.config.list_logical_devices('GPU')
IndentationError: unexpected indent
>>> logical_gpus = tf.config.list_logical_devices('GPU')
2023-10-30 23:30:11.398855: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:1284] failed to query device memory info: HIP_ERROR_InvalidValue
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/config.py", line 480, in list_logical_devices
    return context.context().list_logical_devices(device_type=device_type)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 1666, in list_logical_devices
    self.ensure_initialized()
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 596, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.UnknownError: Failed to query available memory for GPU 0

Standalone code to reproduce the issue

The main problem comes from reading training data (using multiple GPUs); at first I thought it was the batch size:


print("reading training set", data_dir + "/train/")
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir + "/train/",
    # subset="training",
    seed=123,
    label_mode='int',
    image_size=(x, y),
    # batch_size=128
    batch_size=16
)
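Since the error is thrown while TensorFlow queries device memory, it can help to first check whether the driver reports VRAM at all, independently of TensorFlow. The helper below is a hypothetical diagnostic (not part of the original report); it shells out to `rocm-smi --showmeminfo vram` and returns `None` when the tool is missing or fails, the same condition that appears to trip `TFE_NewContext`.

```python
import shutil
import subprocess

def vram_report():
    """Best-effort VRAM report via rocm-smi (hypothetical diagnostic).

    Returns the raw rocm-smi output as a string, or None when the tool
    is not installed or cannot query the devices.
    """
    if shutil.which("rocm-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["rocm-smi", "--showmeminfo", "vram"],
            capture_output=True, text=True, timeout=10, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    return result.stdout

# None here means the same memory query TensorFlow relies on is unavailable.
print(vram_report())
```

If this prints `None` on a machine where `rocminfo` sees the GPUs, the problem is between the HIP runtime and the driver rather than in the Keras data pipeline.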

Relevant log output

hipconfig
HIP version  : 5.6.31061-8c743ae5d

== hipconfig
HIP_PATH     : /scratch/rocm-5.6.0
ROCM_PATH    : /scratch/rocm-5.6.0
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME  : rocclr
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/scratch/rocm-5.6.0/include -I/scratch/rocm-5.6.0/llvm/bin/../lib/clang/16.0.0 

== hip-clang
HIP_CLANG_PATH   : /scratch/rocm-5.6.0/llvm/bin
AMD clang version 16.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.6.0 23243 be997b2f3651a41597d7a41441fff8ade4ac59ac)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /scratch/rocm-5.6.0/llvm/bin
AMD LLVM version 16.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver2

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :  -isystem "/scratch/rocm-5.6.0/include" -O3
hip-clang-ldflags  :  -O3 --hip-link --rtlib=compiler-rt -unwindlib=libgcc

=== Environment Variables
PATH=/wrk/hdstaff/paolod/perforce/RDI_paolod_Dev_work/temp/anaconda2/condabin:/wrk/hdstaff/paolod/perforce/RDI_paolod_Dev_work/temp/anaconda2/bin:/home/paolod/bin:/usr/local/bin:/mis/TREE/bin:/usr/bin:/bin:/usr/ucb
LD_LIBRARY_PATH=/usr/local/lib:/usr/lib

== Linux Kernel
Hostname     : xsjfislx31
Linux xsjfislx31 5.15.0-83-generic #92-Ubuntu SMP Mon Aug 14 09:30:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
LSB Version:	core-11.1.0ubuntu4-noarch:printing-11.1.0ubuntu4-noarch:security-11.1.0ubuntu4-noarch
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.3 LTS
Release:	22.04
Codename:	jammy

root@xsjfislx31:/root# rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          NO

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD EPYC 7F52 16-Core Processor    
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD EPYC 7F52 16-Core Processor    
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   0                                  
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    263707140(0xfb7da04) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    263707140(0xfb7da04) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    263707140(0xfb7da04) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    AMD EPYC 7F52 16-Core Processor    
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD EPYC 7F52 16-Core Processor    
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   0                                  
  BDFID:                   0                                  
  Internal Node ID:        1                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    264225344(0xfbfc240) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    264225344(0xfbfc240) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    264225344(0xfbfc240) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 3                  
*******                  
  Name:                    gfx908                             
  Uuid:                    GPU-20b160b85ec60c80               
  Marketing Name:                                             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    2                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      8192(0x2000) KB                    
  Chip ID:                 29580(0x738c)                      
  ASIC Revision:           2(0x2)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1502                               
  BDFID:                   9984                               
  Internal Node ID:        2                                  
  Compute Unit:            120                                
  SIMDs per CU:            4                                  
  Shader Engines:          8                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 60                                 
  SDMA engine uCode::      18                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS:                     
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 4                  
*******                  
  Name:                    gfx908                             
  Uuid:                    GPU-973b4b0056b6285e               
  Marketing Name:                                             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    3                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      8192(0x2000) KB                    
  Chip ID:                 29580(0x738c)                      
  ASIC Revision:           2(0x2)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1502                               
  BDFID:                   33536                              
  Internal Node ID:        3                                  
  Compute Unit:            120                                
  SIMDs per CU:            4                                  
  Shader Engines:          8                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 60                                 
  SDMA engine uCode::      18                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS:                     
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 5                  
*******                  
  Name:                    gfx908                             
  Uuid:                    GPU-16bf154e2fe0adac               
  Marketing Name:                                             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    4                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      8192(0x2000) KB                    
  Chip ID:                 29580(0x738c)                      
  ASIC Revision:           2(0x2)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1502                               
  BDFID:                   58368                              
  Internal Node ID:        4                                  
  Compute Unit:            120                                
  SIMDs per CU:            4                                  
  Shader Engines:          8                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 60                                 
  SDMA engine uCode::      18                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS:                     
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***
@paolodalberto (Author)

Feel free to reach me directly/internally ... thank you,
Paolo

@dipietrantonio

I observed the same behaviour and thought of an incompatibility between ROCm 5.6 and TF 2.13. But that was just a wild guess.

@paolodalberto (Author)

My home setup with the new tensorflow:latest Docker image does the same (different GPUs, Radeon VII).
This is a show-stopper ... any attention will be appreciated!

@paolodalberto (Author)

ls /etc/alternatives/roc -lrt 
roc-obj                roc-obj-ls             rocm/                  rocm_agent_enumerator  rocprof
roc-obj-extract        rocgdb                 rocm-smi               rocminfo               rocprofv2
:/root# ls /etc/alternatives/rocm -lrt 
lrwxrwxrwx 1 root root 15 Sep 16 23:54 /etc/alternatives/rocm -> /opt/rocm-5.7.0

@paolodalberto (Author)

drwxr-xr-x 1 root root 4096 Sep 16 23:54 rocm-5.7.0
lrwxrwxrwx 1 root root   22 Sep 16 23:54 rocm -> /etc/alternatives/rocm


@paolodalberto (Author)

The default ROCm version seems to be 5.7, but HIP is 5.6?
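One way to check that suspicion from the shell: compare which tools are actually on PATH against the ROCm tree the alternatives symlink points at. This is a sketch, with paths assumed from the hipconfig output above; adjust for your install.

```shell
# Sketch: locate the HIP/ROCm tools and resolve the alternatives symlink.
# Tool names and the /etc/alternatives/rocm path are assumptions taken
# from the logs above, not guaranteed on every install.
for tool in hipconfig rocminfo rocm-smi; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool -> $(command -v "$tool")"
    else
        echo "$tool: not on PATH"
    fi
done
readlink -f /etc/alternatives/rocm 2>/dev/null || echo "/etc/alternatives/rocm: not present here"
```

If the resolved tool paths and the symlink target point at different ROCm trees (e.g. /scratch/rocm-5.6.0 vs /opt/rocm-5.7.0), the runtime and the installed ROCm can indeed disagree.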

@paolodalberto (Author)

Any takers for this issue?

@paolodalberto (Author)

Is there anyone?

@paolodalberto (Author)

echo ... echo ... echo

@paolodalberto (Author)

shoot an email paolod AT amd.com

@gzitzlsb-it4i

Same here. Is there any update?

@paolodalberto (Author)

@gzitzlsb-it4i no updates on my side

@paolodalberto (Author)

keeping the comments alive ...

@gzitzlsb-it4i

I see this issue with both rocm5.7-tf2.12-dev and rocm5.7-tf2.13-dev.
I have now reverted to rocm5.6-tf2.12-dev, which works well.

Maybe this is related to the ROCm 5.6 -> 5.7 change?

@paolodalberto (Author)

Thanksgiving ... take your time.
@gzitzlsb-it4i, I tested it from a Docker image (tensorflow:latest); should it be addressed there?
Who knows ... one day

@jpata

jpata commented Nov 23, 2023

I'm observing the same problem with rocm 5.7 and both tf 2.12 and tf 2.13.
It does not appear with rocm 5.6 and tf 2.12.

@paolodalberto (Author)

Can anyone redirect me to a person I can talk to?

@paolodalberto (Author)

I guess we will wait for ROCm 6.

@paolodalberto (Author)

I tried to pull again; there is no new version.
Is there anything I can do?

@paolodalberto (Author)

Is there a TensorFlow Docker image for ROCm 6?
I removed it and pulled again, and it is still 5.7.

@paolodalberto (Author)

Keeping this alive because the last pull did not fix it.
Thank you and Happy Holidays!

@paolodalberto

any update ?

@paolodalberto

REPOSITORY                         TAG       IMAGE ID       CREATED        SIZE
rocm/tensorflow                    latest    a169c415feb2   2 weeks ago    37.2GB
<none>                             <none>    36781c65cb73   2 months ago   45.5GB
containers.xilinx.com/acdc/build   2.0       b66986b55092   2 months ago   6.71GB
rocm/tensorflow                    <none>    0db6c42705bf   3 months ago   31.9GB
rocm/pytorch                       latest    1cd3cad3f90f   3 months ago   52.1GB

@paolodalberto

PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU')
2024-01-09 00:06:37.372844: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:1294] failed to query device memory info: HIP_ERROR_InvalidValue
Traceback (most recent call last):
  File "/dockerx/test_user.py", line 212, in <module>
    gpus = tf.config.list_physical_devices('GPU')
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/config.py", line 491, in list_logical_devices
    return context.context().list_logical_devices(device_type=device_type)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 1688, in list_logical_devices
    self.ensure_initialized()
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 598, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.UnknownError: Failed to query available memory for GPU 0

@paolodalberto

(Pdb) l
215  	    try:
216  	        # Currently, memory growth needs to be the same across GPUs
217  	        for gpu in gpus:
218  	            print(gpu)
219  	            tf.config.experimental.set_memory_growth(gpu, True)
220  ->	        logical_gpus = tf.config.list_logical_devices('GPU')
221  	        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
222  	    except RuntimeError as e:
223  	        # Memory growth must be set before GPUs have been initialized
224  	        print(e)
225  	
(Pdb) n
2024-01-09 00:09:06.679489: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:1294] failed to query device memory info: HIP_ERROR_InvalidValue
tensorflow.python.framework.errors_impl.UnknownError: Failed to query available memory for GPU 0
> /dockerx/test_user.py(220)<module>()

@paolodalberto

I thought the latest drop would address this ....
but how can you address it if you do not acknowledge ...
the suspense.

@jpata

jpata commented Jan 10, 2024

I also confirm that ROCM 6.0 and tensorflow 2.14 still do not work on MI250X, the same error pops up:

2024-01-10 20:25:42.726550: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:1294] failed to query device memory info: HIP_ERROR_InvalidValue

ROCM+tensorflow is becoming badly out of date and unusable on large HPC systems that made the mistake of buying AMD MI250X.

@paolodalberto

Someday I'll wish upon a star
Wake up where the clouds are far behind me
Where trouble melts like lemon drops

@dipietrantonio

dipietrantonio commented Jan 25, 2024

@paolodalberto @jpata what AMDGPU driver version are you trying to run the container on? On our HPC system we have a rather old one, Driver version: 5.16.9.22.20 due to an outdated ROCm 5.2.3 version present in the Cray environment. @jpata I assume you use LUMI, which should have a similar issue.

I believe no matter the container version you use, the issue is the driver on the host system.

@jpata

jpata commented Jan 25, 2024

@dipietrantonio excellent point, thanks a lot! I confirm that LUMI HPC where I'm experiencing this issue uses 5.16.9.22.20.

@paolodalberto

paolodalberto commented Jan 31, 2024

I used my home system (VEGA VII with upgraded Ubuntu) and two more advanced ones with MI100, upgraded recently. PyTorch works.

@paolodalberto

tf-docker / > bash /dockerx/test.sh 
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.18) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
> /dockerx/test_user.py(212)<module>()
-> gpus = tf.config.list_physical_devices('GPU')
(Pdb) c
3 Physical GPUs
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')
PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU')
2024-02-06 22:30:38.546437: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:1294] failed to query device memory info: HIP_ERROR_InvalidValue
Traceback (most recent call last):
  File "/dockerx/test_user.py", line 212, in <module>
    gpus = tf.config.list_physical_devices('GPU')
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/config.py", line 491, in list_logical_devices
    return context.context().list_logical_devices(device_type=device_type)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 1688, in list_logical_devices
    self.ensure_initialized()
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 598, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.UnknownError: Failed to query available memory for GPU 0

@dipietrantonio

Dear @paolodalberto @jpata ,

We have installed a newer version of the ROCm driver (6.0.5) on a bunch of nodes for testing and now my container with ROCm 5.7 and TF 2.13 works on the code posted in the description of this issue. The error is gone :) So it is a driver issue as I expected.

$ export CIMAGE=$MYSOFTWARE/tensorflow-2.23-rocm5.7.sif
$ singularity exec $CIMAGE python3 tf_test.py 
2024-02-07 14:35:41.241861: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
1 Physical GPUs
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
2024-02-07 14:35:46.934224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 63938 MB memory:  -> device: 0, name: AMD Instinct MI250X, pci bus id: 0000:d1:00.0
1 Physical GPUs, 1 Logical GPUs
$ cat tf_test.py 
import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print(len(gpus), "Physical GPUs")
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            print(gpu)
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

@paolodalberto

@dipietrantonio
Excuse me for my thickness. The driver does not come with the Docker image?
You are saying that the ROCm 5.7 driver is the problem ...

@dipietrantonio

dipietrantonio commented Feb 7, 2024

When you run a container you rely on the host kernel, not the one installed in your container. The driver is a kernel module. You need to update the driver on the system you are running the container on (at least when you use the Singularity container engine, but I think it is the same for Docker).

For me, the ROCm 5.2 driver was the issue. I was not expecting that even the ROCm 5.7 driver could have this problem. But as I said, driver version 6.0.5 solved it.
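To see the host/container split concretely: the kernel driver version comes from the host, the ROCm userspace from the image. A read-only check (a sketch; the paths are the standard locations, and the fallbacks keep it harmless on machines without AMD hardware):

```shell
# Host side: the amdgpu kernel module version (what the container actually runs on).
if [ -r /sys/module/amdgpu/version ]; then
    echo "host amdgpu driver: $(cat /sys/module/amdgpu/version)"
else
    echo "amdgpu module not loaded (or not readable from here)"
fi

# Container side: the ROCm userspace version shipped in the image.
if [ -r /opt/rocm/.info/version ]; then
    echo "container ROCm userspace: $(cat /opt/rocm/.info/version)"
else
    echo "no ROCm userspace found at /opt/rocm"
fi
```

Run the first part on the host (or inside the container, since `/sys` is shared) and the second inside the container: a new image cannot fix a mismatch in the first line.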

@paolodalberto

hmm ...

@paolodalberto

Still no new docker with rocm 6

@paolodalberto

The new Docker image arrived.

/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.18) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.9/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/data/util/structure.py", line 105, in normalize_element
    spec = type_spec_from_value(t, use_fallback=False)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/data/util/structure.py", line 514, in type_spec_from_value
    raise TypeError("Could not build a `TypeSpec` for {} with type {}".format(
TypeError: Could not build a `TypeSpec` for ['/imagenet/train/n02102177/n02102177_9088.JPEG', '/imagenet/train/n01796340/n01796340_3887.JPEG', '/imagenet/train/n02363005/n02363005_6465.JPEG', '/imagenet/train/n02965783/n02965783_1876.JPEG', '/imagenet/train/n01734418/n01734\
418_12680.JPEG', '/imagenet/train/n02422699/n02422699_28690.JPEG',

@paolodalberto

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/dockerx/test_user.py", line 268, in <module>
    train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  File "/usr/local/lib/python3.9/dist-packages/keras/src/utils/image_dataset.py", line 308, in image_dataset_from_directory
    dataset = paths_and_labels_to_dataset(
  File "/usr/local/lib/python3.9/dist-packages/keras/src/utils/image_dataset.py", line 350, in paths_and_labels_to_dataset
    path_ds = tf.data.Dataset.from_tensor_slices(image_paths)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 825, in from_tensor_slices
    return from_tensor_slices_op._from_tensor_slices(tensors, name)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/data/ops/from_tensor_slices_op.py", line 25, in _from_tensor_slices
    return _TensorSliceDataset(tensors, name=name)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/data/ops/from_tensor_slices_op.py", line 33, in __init__
    element = structure.normalize_element(element)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/data/util/structure.py", line 110, in normalize_element
    ops.convert_to_tensor(t, name="component_%d" % i))
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/profiler/trace.py", line 183, in wrapped
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/ops.py", line 696, in convert_to_tensor
    return tensor_conversion_registry.convert(
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/tensor_conversion_registry.py", line 234, in convert
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/constant_op.py", line 335, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/ops/weak_tensor_ops.py", line 142, in wrapper
    return op(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/constant_op.py", line 271, in constant
    return _constant_impl(value, dtype, shape, name, verify_shape=False,
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/constant_op.py", line 284, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/constant_op.py", line 296, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/constant_op.py", line 102, in convert_to_eager_tensor
    ctx.ensure_initialized()
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 603, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.UnknownError: Failed to query available memory for GPU 0

@paolodalberto

This is with my system at home; I will check on Monday on the real machine.

@paolodalberto

#2289 (comment)
How do you upgrade the driver?

@paolodalberto

I could:

sudo apt update
wget https://repo.radeon.com/amdgpu-install/6.1/ubuntu/jammy/amdgpu-install_6.1.60100-1_all.deb
sudo apt install ./amdgpu-install_6.1.60100-1_all.deb
sudo amdgpu-install --list-usecase
sudo amdgpu-install --usecase=dkms,rocm,graphics,hiplibsdk,workstation,asan
sudo amdgpu-install --usecase=dkms,rocm,graphics,hiplibsdk,hip
sudo amdgpu-install --usecase=dkms,rocm,rocmdev,opencl,graphics,hiplibsdk,hip
sudo amdgpu-install --usecase=dkms
sudo amdgpu-install --usecase=dkms,rocm,rocmdev,rocmdevtools
sudo amdgpu-install --usecase=dkms,rocm
sudo amdgpu-install --usecase=dkms,rocmdev, rocm
sudo amdgpu-install --usecase=dkms
sudo reboot

At least it works for one GPU

@paolodalberto

Let me check what I can do on my large machine ...

@paolodalberto

The large machine now kicks me out during evaluation,
but I can briefly see the GPUs.

@paolodalberto

Yep, multiple GPUs do not work (a single GPU works).
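Until the multi-GPU path is sorted out, masking the runtime down to one device is a possible stopgap (a sketch; `HIP_VISIBLE_DEVICES` and `ROCR_VISIBLE_DEVICES` are the ROCm device-masking variables, and `test_user.py` is the script from earlier in the thread — the run line is commented out):

```shell
# Expose only GPU 0 to the ROCm runtime before starting TensorFlow.
export HIP_VISIBLE_DEVICES=0
export ROCR_VISIBLE_DEVICES=0
echo "restricting to device(s): $HIP_VISIBLE_DEVICES"
# python3 /dockerx/test_user.py   # run as before, now seeing a single GPU
```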

@paolodalberto

In practice, multiple GPUs fail so badly that the Docker application stalls the machine and breaks the Docker daemon, which I have to restart manually. This is on one of the systems above. The funny part: this was working on 5.7, six months ago, for TensorFlow and PyTorch.
Let me know if you would like to connect ...

@paolodalberto

[image]
Good times
