retinanet run harness fails 'executionContext.cpp::setOptimizationProfileInternal::1328] Error Code 3:' #1866

Open
stbailey001 opened this issue Oct 2, 2024 · 2 comments

@stbailey001

Trying to run the RetinaNet Offline benchmark in a container with one NVIDIA GPU:
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev --model=retinanet --implementation=nvidia --framework=tensorrt --category=datacenter --scenario=Offline --execution_mode=test --device=cuda --gpu_name=l4 --docker_cache=no --quiet --test_query_count=500

Execution fails with:
[E] [TRT] 3: [executionContext.cpp::setOptimizationProfileInternal::1328] Error Code 3: Internal Error (Profile 0 has been chosen by another IExecutionContext. Use another profileIndex or destroy the IExecutionContext that use this profile.)
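For context, the message points at a TensorRT API rule: an optimization profile can be selected by at most one IExecutionContext at a time, so once the first context owns profile 0, a second context cannot pick it unless the engine was built with more profiles. Below is a minimal sketch of that rule using the TensorRT Python API with pycuda; it is only an illustration under those assumptions, not the NVIDIA LWIS harness code (which hits the same check from lwis.cpp in C++).

```python
# Hypothetical illustration of the constraint behind the error above; not the
# harness code. Assumes `engine` is an already-deserialized trt.ICudaEngine
# and that the tensorrt and pycuda Python packages are available.
import pycuda.autoinit  # noqa: F401  (initializes a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt


def claim_profile_zero_twice(engine: trt.ICudaEngine) -> None:
    """Reproduce 'Profile 0 has been chosen by another IExecutionContext'."""
    stream = cuda.Stream()

    # First context: selects optimization profile 0 and now owns it.
    ctx0 = engine.create_execution_context()
    ctx0.set_optimization_profile_async(0, stream.handle)

    # Second context: on a single-profile engine there is no other profile to
    # pick, and asking for profile 0 again is rejected with Error Code 3.
    ctx1 = engine.create_execution_context()
    ok = ctx1.set_optimization_profile_async(0, stream.handle)
    print("second context could claim profile 0:", ok)  # expected: False
```

The config dump in the full log below shows num_profiles : 1 while the TensorRT output records two IExecutionContext creations, which looks like exactly this mismatch.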

Full error:
CMD: make run_harness RUN_ARGS=' --benchmarks=retinanet --scenarios=offline --test_mode=PerformanceOnly --offline_expected_qps=1 --user_conf_path=/root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf --mlperf_conf_path=/root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf --gpu_batch_size=2 --no_audit_verify ' 2>&1 ; echo $? > exitstatus | tee '/root/CM/repos/local/cache/aba6a14ff6834703/test_results/e1ef67ab5fc2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1/console.out'

INFO:root: ! cd /root/CM/repos/local/cache/722613bcce9a4b2f
INFO:root: ! call /root/CM/repos/mlcommons@cm4mlops/script/benchmark-program/run-ubuntu.sh from tmp-run.sh

make run_harness RUN_ARGS=' --benchmarks=retinanet --scenarios=offline --test_mode=PerformanceOnly --offline_expected_qps=1 --user_conf_path=/root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf --mlperf_conf_path=/root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf --gpu_batch_size=2 --no_audit_verify ' 2>&1 ; echo $? > exitstatus | tee '/root/CM/repos/local/cache/aba6a14ff6834703/test_results/e1ef67ab5fc2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1/console.out'
[2024-10-02 15:10:05,960 main.py:229 INFO] Detected system ID: KnownSystem.e1ef67ab5fc2
[2024-10-02 15:10:06,139 harness.py:249 INFO] The harness will load 2 plugins: ['build/plugins/NMSOptPlugin/libnmsoptplugin.so', 'build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so']
[2024-10-02 15:10:06,139 generate_conf_files.py:107 INFO] Generated measurements/ entries for e1ef67ab5fc2_TRT/retinanet/Offline
[2024-10-02 15:10:06,140 __init__.py:46 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/root/CM/repos/local/cache/aba6a14ff6834703/test_results/e1ef67ab5fc2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="PerformanceOnly" --gpu_batch_size=2 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=false --user_conf_path="/root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf" --gpu_engines="./build/engines/e1ef67ab5fc2/retinanet/Offline/retinanet-Offline-gpu-b2-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario Offline --model retinanet --response_postprocess openimageeffnms
[2024-10-02 15:10:06,140 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.Retinanet
buffer_manager_thread_count : 0
data_dir : /root/CM/repos/local/cache/b92e7a28ac454f52/data
gpu_batch_size : 2
input_dtype : int8
input_format : linear
log_dir : /root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/build/logs/2024.10.02-15.10.04
map_path : data_maps/open-images-v6-mlperf/val_map.txt
mlperf_conf_path : /root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf
offline_expected_qps : 1.0
precision : int8
preprocessed_data_dir : /root/CM/repos/local/cache/b92e7a28ac454f52/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9J14 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=1, threads_per_core=1): 64}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=292.215448, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=292215448000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA L4', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=22.494140625, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=24152899584), max_power_limit=72.0, pci_id='0x27B810DE', compute_sm=89): 1})), numa_conf=None, system_id='e1ef67ab5fc2')
tensor_path : build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear
test_mode : PerformanceOnly
use_graphs : False
user_conf_path : /root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf
system_id : e1ef67ab5fc2
config_name : e1ef67ab5fc2_retinanet_Offline
workload_setting : WorkloadSetting(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
num_profiles : 1
config_ver : lwis_k_99_MaxP
accuracy_level : 99%
inference_server : lwis
skip_file_checks : False
power_limit : None
cpu_freq : None
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: /root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf
[I] user.conf path: /root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf
Creating QSL.
Finished Creating QSL.
Setting up SUT.
[I] [TRT] Loaded engine size: 73 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 126, GPU 473 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 128, GPU 483 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +68, now: CPU 0, GPU 68 (MiB)
[I] Device:0.GPU: [0] ./build/engines/e1ef67ab5fc2/retinanet/Offline/retinanet-Offline-gpu-b2-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[E] [TRT] 3: [runtime.cpp::~Runtime::401] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::401, condition: mEngineCounter.use_count() == 1 Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 55, GPU 485 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 55, GPU 493 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1528, now: CPU 0, GPU 1596 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 56, GPU 2029 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 56, GPU 2039 (MiB)
[I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +1528, now: CPU 1, GPU 3124 (MiB)
[E] [TRT] 3: [executionContext.cpp::setOptimizationProfileInternal::1328] Error Code 3: Internal Error (Profile 0 has been chosen by another IExecutionContext. Use another profileIndex or destroy the IExecutionContext that use this profile.)
F1002 15:10:07.041591 180493 lwis.cpp:245] Check failed: context->setOptimizationProfile(profileIdx) == true (0 vs. 1)
*** Check failure stack trace: ***
@ 0x7fe94f4401c3 google::LogMessage::Fail()
@ 0x7fe94f44525b google::LogMessage::SendToLog()
@ 0x7fe94f43febf google::LogMessage::Flush()
@ 0x7fe94f4406ef google::LogMessageFatal::~LogMessageFatal()
@ 0x55918ac33adc lwis::Device::Setup()
@ 0x55918ac35cab lwis::Server::Setup()
@ 0x55918ab91a00 doInference()
@ 0x55918ab8f2b0 main
@ 0x7fe93d00e083 __libc_start_main
@ 0x55918ab8f83e _start
Aborted (core dumped)
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/main.py", line 231, in
main(main_args, DETECTED_SYSTEM)
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/main.py", line 144, in main
dispatch_action(main_args, config_dict, workload_setting)
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/main.py", line 202, in dispatch_action
handler.run()
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/actionhandler/base.py", line 82, in run
self.handle_failure()
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/actionhandler/run_harness.py", line 193, in handle_failure
raise RuntimeError("Run harness failed!")
RuntimeError: Run harness failed!
Traceback (most recent call last):
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/actionhandler/run_harness.py", line 161, in handle
result_data = self.harness.run_harness(flag_dict=self.harness_flag_dict, skip_generate_measurements=True)
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/common/harness.py", line 352, in run_harness
output = run_command(self.construct_terminal_command(argstr), get_output=True, custom_env=self.env_vars)
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/common/init.py", line 67, in run_command
raise subprocess.CalledProcessError(ret, cmd)
subprocess.CalledProcessError: Command './build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/root/CM/repos/local/cache/aba6a14ff6834703/test_results/e1ef67ab5fc2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="PerformanceOnly" --gpu_batch_size=2 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=false --user_conf_path="/root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf" --gpu_engines="./build/engines/e1ef67ab5fc2/retinanet/Offline/retinanet-Offline-gpu-b2-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario Offline --model retinanet --response_postprocess openimageeffnms' returned non-zero exit status 134.
make: *** [Makefile:45: run_harness] Error 1
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/benchmark-program/customize.py
INFO:root: * cm run script "save mlperf inference state"
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/save-mlperf-inference-implementation-state/customize.py
INFO:root: ! cd /root/CM/repos/local/cache/722613bcce9a4b2f
INFO:root: ! call /root/CM/repos/mlcommons@cm4mlops/script/app-mlperf-inference/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/app-mlperf-inference/customize.py
INFO:root:* cm run script "get mlperf sut description"
INFO:root: * cm run script "detect os"
INFO:root: ! cd /root/CM/repos/local/cache/722613bcce9a4b2f
INFO:root: ! call /root/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: * cm run script "detect cpu"
INFO:root: * cm run script "detect os"
INFO:root: ! cd /root/CM/repos/local/cache/722613bcce9a4b2f
INFO:root: ! call /root/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: ! cd /root/CM/repos/local/cache/722613bcce9a4b2f
INFO:root: ! call /root/CM/repos/mlcommons@cm4mlops/script/detect-cpu/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/detect-cpu/customize.py
INFO:root: * cm run script "get python3"
INFO:root: ! load /root/CM/repos/local/cache/7ead820172a540e6/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: * cm run script "get compiler"
INFO:root: ! load /root/CM/repos/local/cache/30d4c7085bc24d5c/cm-cached-state.json
INFO:root: * cm run script "get cuda-devices _with-pycuda"
INFO:root: * cm run script "get cuda _toolkit"
INFO:root: ! load /root/CM/repos/local/cache/137abe42c97c44f6/cm-cached-state.json
INFO:root:ENV[CM_CUDA_PATH_LIB_CUDNN_EXISTS]: no
INFO:root:ENV[CM_CUDA_VERSION]: 12.2
INFO:root:ENV[CM_CUDA_VERSION_STRING]: cu122
INFO:root:ENV[CM_NVCC_BIN_WITH_PATH]: /usr/local/cuda/bin/nvcc
INFO:root:ENV[CUDA_HOME]: /usr/local/cuda
INFO:root: * cm run script "get python3"
INFO:root: ! load /root/CM/repos/local/cache/7ead820172a540e6/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: * cm run script "get generic-python-lib _package.pycuda"
INFO:root: ! load /root/CM/repos/local/cache/457a72dc0cd941fc/cm-cached-state.json
INFO:root: ! cd /root/CM/repos/local/cache/722613bcce9a4b2f
INFO:root: ! call /root/CM/repos/mlcommons@cm4mlops/script/get-cuda-devices/detect.sh from tmp-run.sh
GPU 0:
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/get-cuda-devices/customize.py
INFO:root: * cm run script "get generic-python-lib _package.dmiparser"
INFO:root: ! load /root/CM/repos/local/cache/525f77d4ad5a4f72/cm-cached-state.json
INFO:root: * cm run script "get cache dir _name.mlperf-inference-sut-descriptions"
INFO:root: ! load /root/CM/repos/local/cache/3d93e38d01d7494d/cm-cached-state.json
Generating SUT description file for e1ef67ab5fc2-tensorrt
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/get-mlperf-inference-sut-description/customize.py
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/run-mlperf-inference-app/customize.py

@arjunsuresh
Contributor

Hi @stbailey001, does a retry with --docker_cache=no help here? This is an L4 GPU, right?

@stbailey001
Author

stbailey001 commented Oct 4, 2024

Reran with --docker_cache=no; it still fails with the same core dump. Yes, this is an L4.
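
In case it helps narrow this down, here is a hedged diagnostic sketch (not part of the CM workflow; it would have to run inside the same container, and the plan/plugin paths are copied from the log above) that deserializes the generated engine and reports how many optimization profiles it carries:

```python
# Hypothetical diagnostic, not from the CM/NVIDIA tooling: report how many
# optimization profiles the generated RetinaNet plan exposes.
import ctypes

import tensorrt as trt

# The engine uses custom plugins, so load them before deserializing
# (paths copied from the --plugins argument in the log above).
for lib in ("build/plugins/NMSOptPlugin/libnmsoptplugin.so",
            "build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so"):
    ctypes.CDLL(lib)

logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, "")

# Plan path copied from the failing harness command in the log above.
PLAN = "./build/engines/e1ef67ab5fc2/retinanet/Offline/retinanet-Offline-gpu-b2-int8.lwis_k_99_MaxP.plan"

# Keep the runtime alive while the engine exists (the log above also warns
# about destroying the runtime before its deserialized engines).
runtime = trt.Runtime(logger)
with open(PLAN, "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

print("num_optimization_profiles:", engine.num_optimization_profiles)
```

If it reports 1, that would be consistent with the second execution context in the log failing to claim profile 0.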
