Using:
OMNITRACE_CONFIG_FILE =
OMNITRACE_USE_PERFETTO = true
OMNITRACE_USE_TIMEMORY = false
OMNITRACE_USE_SAMPLING = false
OMNITRACE_USE_PROCESS_SAMPLING = false
OMNITRACE_USE_ROCTRACER = true
OMNITRACE_USE_ROCM_SMI = true
OMNITRACE_USE_KOKKOSP = false
OMNITRACE_USE_PID = true
OMNITRACE_USE_RCCLP = false
OMNITRACE_USE_ROCPROFILER = true
OMNITRACE_USE_ROCTX = false
OMNITRACE_OUTPUT_PATH = omnitrace-%tag%-output
OMNITRACE_OUTPUT_PREFIX =
OMNITRACE_CRITICAL_TRACE = false
OMNITRACE_PAPI_EVENTS = PAPI_TOT_CYC
OMNITRACE_PERFETTO_BACKEND = inprocess
OMNITRACE_PERFETTO_BUFFER_SIZE_KB = 1024000
OMNITRACE_PERFETTO_FILL_POLICY = discard
OMNITRACE_PROCESS_SAMPLING_DURATION = -1
OMNITRACE_PROCESS_SAMPLING_FREQ = 0
OMNITRACE_ROCM_EVENTS = GRBM_GUI_ACTIVE
OMNITRACE_SAMPLING_CPUS = all
OMNITRACE_SAMPLING_DELAY = 0.5
OMNITRACE_SAMPLING_DURATION = 0
OMNITRACE_SAMPLING_FREQ = 200
OMNITRACE_SAMPLING_GPUS = 0,1
OMNITRACE_TIME_OUTPUT = true
OMNITRACE_TIMEMORY_COMPONENTS = wall_clock
OMNITRACE_VERBOSE = 0
OMNITRACE_ENABLED = true
OMNITRACE_SUPPRESS_CONFIG = false
OMNITRACE_SUPPRESS_PARSING = false
The run hangs on the first kernel call:
$ AMD_LOG_LEVEL=3 /home/nicurtis/lammps_benchmarking/install/tpl/openmpi/bin/mpirun --mca pml ucx --mca btl ^vader,tcp,openib,uct -np 1 ./lmp -k on g 1 -sf kk -pk kokkos cuda/aware on neigh half neigh/qeq full newton on -v x 6 -v y 6 -v z 8 -v steps 25 -in in.reaxc.hns -nocite -log TheraC63/reaxff//log.lammps
[omnitrace][omnitrace_init_tooling] Instrumentation mode: Trace
[066.998] perfetto.cc:55910 Configured tracing session 1, #sources:1, duration:0 ms, #buffers:1, total buffer size:1024000 KB, total sessions:1, uid:0 session name: ""
[omnitrace][pid=30219] MPI rank: 0 (0), MPI size: 1 (1)
LAMMPS (23 Jun 2022 - Update 1)
KOKKOS mode is enabled (src/KOKKOS/kokkos.cpp:105)
  will use up to 1 GPU(s) per node
:3:rocdevice.cpp :416 : 81067696131 us: 30219: [tid:0x7f68d9031280] Initializing HSA stack.
:3:comgrctx.cpp :33 : 81067696207 us: 30219: [tid:0x7f68d9031280] Loading COMGR library.
:3:rocdevice.cpp :207 : 81067696378 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5df3880
:3:rocdevice.cpp :1611: 81067696802 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:rocdevice.cpp :207 : 81067697588 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5e30cb0
:3:rocdevice.cpp :1611: 81067697802 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:rocdevice.cpp :207 : 81067698438 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5e6e3d0
:3:rocdevice.cpp :1611: 81067698628 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:rocdevice.cpp :207 : 81067699255 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5eabad0
:3:rocdevice.cpp :1611: 81067699441 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:rocdevice.cpp :207 : 81067700248 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5ee91e0
:3:rocdevice.cpp :1611: 81067700432 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:rocdevice.cpp :207 : 81067701884 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5f26930
:3:rocdevice.cpp :1611: 81067702074 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:rocdevice.cpp :207 : 81067703320 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5f64010
:3:rocdevice.cpp :1611: 81067703500 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:rocdevice.cpp :207 : 81067704752 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5fa1710
:3:rocdevice.cpp :1611: 81067704929 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:hip_context.cpp :50 : 81067706380 us: 30219: [tid:0x7f68d9031280] Direct Dispatch: 1
:3:hip_device_runtime.cpp :517 : 81067708010 us: 30219: [tid:0x7f68d9031280] hipGetDeviceCount: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708019 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5c2e0, 0 )
:3:hip_device.cpp :348 : 81067708219 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708237 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5c5f8, 1 )
:3:hip_device.cpp :348 : 81067708254 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708258 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5c910, 2 )
:3:hip_device.cpp :348 : 81067708286 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708298 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5cc28, 3 )
:3:hip_device.cpp :348 : 81067708312 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708316 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5cf40, 4 )
:3:hip_device.cpp :348 : 81067708329 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708333 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5d258, 5 )
:3:hip_device.cpp :348 : 81067708356 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708367 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5d570, 6 )
:3:hip_device.cpp :348 : 81067708380 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708385 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5d888, 7 )
:3:hip_device.cpp :348 : 81067708395 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device_runtime.cpp :530 : 81067708403 us: 30219: [tid:0x7f68d9031280] hipSetDevice ( 0 )
:3:hip_device_runtime.cpp :535 : 81067708424 us: 30219: [tid:0x7f68d9031280] hipSetDevice: Returned hipSuccess :
:3:hip_memory.cpp :493 : 81067708445 us: 30219: [tid:0x7f68d9031280] hipMalloc ( 0x7fff288c3f20, 8448 )
:3:rocdevice.cpp :2093: 81067708474 us: 30219: [tid:0x7f68d9031280] device=0x653dda0, freeMem_ = 0xfeffdf00
:3:hip_memory.cpp :495 : 81067708478 us: 30219: [tid:0x7f68d9031280] hipMalloc: Returned hipSuccess : 0x7f6051b00000: duration: 33 us
:3:hip_memory.cpp :1225: 81067708487 us: 30219: [tid:0x7f68d9031280] hipMemcpyAsync ( 0x7f6051b00000, 0x7fff288c40c0, 256, hipMemcpyDefault, stream:<null> )
:3:rocdevice.cpp :2686: 81067708503 us: 30219: [tid:0x7f68d9031280] number of allocated hardware queues with low priority: 0, with normal priority: 0, with high priority: 0, maximum per priority is: 4
:3:rocdevice.cpp :2757: 81067721343 us: 30219: [tid:0x7f68d9031280] created hardware queue 0x7f68680ca000 with size 4096 with priority 1, cooperative: 0
:3:devprogram.cpp :2675: 81067924077 us: 30219: [tid:0x7f68d9031280] Using Code Object V4.
:3:devprogram.cpp :2978: 81067925217 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_fillImage
:3:devprogram.cpp :2978: 81067925223 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_fillBufferAligned2D
:3:devprogram.cpp :2978: 81067925225 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_fillBufferAligned
:3:devprogram.cpp :2978: 81067925227 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyImage1DA
:3:devprogram.cpp :2978: 81067925228 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferAligned
:3:devprogram.cpp :2978: 81067925229 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_streamOpsWait
:3:devprogram.cpp :2978: 81067925230 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBuffer
:3:devprogram.cpp :2978: 81067925232 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_streamOpsWrite
:3:devprogram.cpp :2978: 81067925233 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferRectAligned
:3:devprogram.cpp :2978: 81067925234 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_gwsInit
:3:devprogram.cpp :2978: 81067925236 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferRect
:3:devprogram.cpp :2978: 81067925237 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyImageToBuffer
:3:devprogram.cpp :2978: 81067925238 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferToImage
:3:devprogram.cpp :2978: 81067925239 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyImage
:3:rocvirtual.hpp :62 : 81067925542 us: 30219: [tid:0x7f68d9031280] Host active wait for Signal = (0x7f686811d180) for 100000 ns
:3:rocvirtual.cpp :143 : 81067925558 us: 30219: [tid:0x7f68d9031280] Signal = (0x7f686811d180), start = 81067925545769, end = 81067925547369
:3:hip_memory.cpp :1226: 81067925567 us: 30219: [tid:0x7f68d9031280] hipMemcpyAsync: Returned hipSuccess : : duration: 217080 us
:3:hip_stream.cpp :450 : 81067925582 us: 30219: [tid:0x7f68d9031280] hipStreamSynchronize ( stream:<null> )
:3:rocdevice.cpp :2636: 81067925599 us: 30219: [tid:0x7f68d9031280] No HW event
:3:hip_stream.cpp :451 : 81067925601 us: 30219: [tid:0x7f68d9031280] hipStreamSynchronize: Returned hipSuccess :
:3:hip_memory.cpp :2461: 81067925613 us: 30219: [tid:0x7f68d9031280] hipMemset ( 0x7f6051b00100, 0, 8192 )
:3:rocvirtual.cpp :679 : 81067925626 us: 30219: [tid:0x7f68d9031280] Arg3: ulong* bufULong = ptr:0x7f6051b00000 obj:[0x7f6051b00000-0x7f6051b02100]
:3:rocvirtual.cpp :679 : 81067925628 us: 30219: [tid:0x7f68d9031280] Arg4: uchar* pattern = ptr:0x7f686807c080 obj:[0x7f686807c000-0x7f686807d000]
:3:rocvirtual.cpp :753 : 81067925630 us: 30219: [tid:0x7f68d9031280] Arg5: uint patternSize = val:1
:3:rocvirtual.cpp :753 : 81067925631 us: 30219: [tid:0x7f68d9031280] Arg6: ulong offset = val:32
:3:rocvirtual.cpp :753 : 81067925633 us: 30219: [tid:0x7f68d9031280] Arg7: ulong size = val:1024
:3:rocvirtual.cpp :2723: 81067925634 us: 30219: [tid:0x7f68d9031280] ShaderName : __amd_rocclr_fillBufferAligned
:3:rocvirtual.hpp :62 : 81067935725 us: 30219: [tid:0x7f68d9031280] Host active wait for Signal = (0x7f686811d080) for -1 ns # hangs here forever
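With AMD_LOG_LEVEL=3 the interesting part is usually the last few entries: the final HIP API call, the kernel it dispatched (`ShaderName`), and a host wait with a `-1 ns` timeout that never returns. The sketch below pulls those out of a log like the one above so they can be quoted in a report. It assumes the entry header format seen in this log (`:LEVEL:file :line: TIMESTAMP us: PID: [tid:0x...]`) and that a `for -1 ns` wait means an unbounded wait; `split_entries` and `find_blocking_dispatch` are hypothetical helpers, not part of HIP or omnitrace.

```python
import re
from typing import List, NamedTuple, Optional

# Header of one AMD_LOG_LEVEL=3 entry, e.g.
# ":3:rocvirtual.cpp :2723: 81067925634 us: 30219: [tid:0x7f68d9031280] "
# Groups: level, source file, source line, timestamp (us), pid, thread id.
HEADER = re.compile(
    r":(\d):(\S+)\s+:\s*(\d+)\s*:\s*(\d+) us:\s*(\d+):\s*\[tid:(0x[0-9a-fA-F]+)\]\s*"
)
# Assumption: a wait logged with a -1 ns timeout has no timeout at all.
INFINITE_WAIT = re.compile(r"Host active wait for Signal = \((0x[0-9a-fA-F]+)\) for -1 ns")


class Entry(NamedTuple):
    source: str        # "file:line" inside the HIP/ROCclr runtime
    timestamp_us: int
    tid: str
    message: str


def split_entries(log: str) -> List[Entry]:
    """Split a log into entries, whether or not entries are newline-separated."""
    headers = list(HEADER.finditer(log))
    entries = []
    for i, h in enumerate(headers):
        # A message runs until the next header (or the end of the log).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log)
        entries.append(Entry(f"{h.group(2)}:{h.group(3)}", int(h.group(4)),
                             h.group(6), log[h.end():end].strip()))
    return entries


def find_blocking_dispatch(log: str) -> Optional[dict]:
    """Describe the last unbounded signal wait and what was dispatched before it."""
    entries = split_entries(log)
    last_call = last_kernel = None
    result = None
    for idx, e in enumerate(entries):
        if re.match(r"hip\w+\s*\(", e.message):  # an API call such as "hipMemset ( ... )"
            last_call = e.message
        kernel = re.match(r"ShaderName\s*:\s*(\S+)", e.message)
        if kernel:
            last_kernel = kernel.group(1)
        wait = INFINITE_WAIT.search(e.message)
        if wait:
            result = {
                "signal": wait.group(1),
                "kernel": last_kernel,
                "last_hip_call": last_call,
                "timestamp_us": e.timestamp_us,
                # Nothing logged after the wait is consistent with a hang.
                "is_last_entry": idx == len(entries) - 1,
            }
    return result
```

Fed the log in this report, it should single out the `hipMemset` call and the `__amd_rocclr_fillBufferAligned` blit kernel as the dispatch whose completion signal is never observed. Regex parsing keeps the sketch dependency-free but will need adjusting if the runtime changes its log format.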
On 472e96a
No hang with OMNITRACE_PAPI_EVENTS, but it doesn't show up in the trace either.
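Since whether the run hangs appears to depend on which settings are active, one way to narrow it down is to diff the configuration that omnitrace prints at startup for a hanging run and a non-hanging run. A minimal sketch, assuming the dump has the `KEY = VALUE` shape shown under "Using:" (flattened or one per line) and that no value itself contains an `OMNITRACE_...=` token; `parse_omnitrace_config` and `diff_configs` are hypothetical helpers, not part of omnitrace.

```python
import re
from typing import Dict, Optional, Tuple

# Assumption: every setting starts with an upper-case "OMNITRACE_" key followed by "=".
KEY = re.compile(r"(OMNITRACE_[A-Z0-9_]+)\s*=")


def parse_omnitrace_config(dump: str) -> Dict[str, str]:
    """Parse a 'KEY = VALUE KEY = VALUE ...' dump; empty values map to ''."""
    keys = list(KEY.finditer(dump))
    config = {}
    for i, m in enumerate(keys):
        # The value is whatever lies between this key's '=' and the next key.
        end = keys[i + 1].start() if i + 1 < len(keys) else len(dump)
        config[m.group(1)] = dump[m.end():end].strip()
    return config


def diff_configs(a: Dict[str, str],
                 b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}
```

Keys missing from one run show up as `None` on that side, so a setting that was only printed by one of the two runs is not silently dropped from the diff.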
Hi @skyreflectedinmirrors, is this ticket still relevant? Thanks!