TritonModelAnalyzerException: [StatusCode.UNAVAILABLE] failed to connect to all addresses; Failed to connect to remote host: Timeout occurred: FD Shutdown #950

Open
C-Dhayananthan opened this issue Nov 26, 2024 · 0 comments

Comments

@C-Dhayananthan
Copy link

C-Dhayananthan commented Nov 26, 2024

I was trying to run Model Analyzer with triton_launch_mode: local (the default).
The command below runs the container (model-analyzer is the image name):

docker run -it --rm --gpus all -v $(pwd):/workspace --net=host model-analyzer

My sweep.yaml is given below:

model_repository: /workspace/model_repositories
triton_launch_mode: local
profile_models:
        - minilm
perf_analyzer_flags:
        input-data: "random"
triton_server_flags:
  log_verbose: True
  exit_timeout_secs: 120
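
Not part of the original report, but a quick pre-flight check like the following (a hedged sketch; the repository path and model name are taken from the sweep.yaml above) can rule out model-repository layout problems before launching Model Analyzer:

```python
from pathlib import Path


def check_model_repository(repo: str, models: list[str]) -> list[str]:
    """Return a list of problems found in a Triton model repository layout.

    Each profiled model is expected to live in its own subdirectory of the
    repository (the standard Triton layout).
    """
    problems = []
    repo_path = Path(repo)
    if not repo_path.is_dir():
        problems.append(f"model_repository does not exist: {repo}")
        return problems
    for model in models:
        if not (repo_path / model).is_dir():
            problems.append(f"model directory missing: {repo}/{model}")
    return problems


# Example (paths are illustrative):
# check_model_repository("/workspace/model_repositories", ["minilm"])
```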

I use this command to run model-analyzer inside the container:
model-analyzer profile -f sweep.yaml

ISSUE


 [Model Analyzer] Initializing GPUDevice handles
[Model Analyzer] Using GPU 0 NVIDIA RTX A4000 with UUID GPU-3fca6544-2e5c-de67-d283-a37b68e716bb
[Model Analyzer] Using GPU 1 NVIDIA RTX A4000 with UUID GPU-6dced96e-d063-1bf2-dcb8-f5d94e67f6a9
Traceback (most recent call last):
 File "/workspace/model_analyzer/entrypoint.py", line 198, in create_output_model_repository
   os.mkdir(config.output_model_repository_path)
FileExistsError: [Errno 17] File exists: '/workspace/output_model_repository'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
 File "/usr/local/bin/model-analyzer", line 8, in <module>
   sys.exit(main())
 File "/workspace/model_analyzer/entrypoint.py", line 266, in main
   create_output_model_repository(config)
 File "/workspace/model_analyzer/entrypoint.py", line 201, in create_output_model_repository
   raise TritonModelAnalyzerException(
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: Path "/workspace/output_model_repository" already exists. Please set or modify "--output-model-repository-path" flag or remove this directory. You can also allow overriding of the output directory using the "--override-output-model-repository" flag.
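
This first failure is straightforward: the output directory is left over from a previous run, and the error text itself names the fixes (remove the directory, change "--output-model-repository-path", or pass "--override-output-model-repository"). A minimal sketch of the equivalent logic (not the actual Model Analyzer source, just the behavior the traceback describes):

```python
import os
import shutil


def prepare_output_repository(path: str, override: bool = False) -> None:
    """Create the output model repository; fail if it already exists,
    unless overriding is allowed (mirrors the traceback's behavior)."""
    try:
        os.mkdir(path)
    except FileExistsError:
        if not override:
            raise RuntimeError(
                f'Path "{path}" already exists. Remove it or allow '
                'overriding (e.g. "--override-output-model-repository").'
            )
        # Override requested: wipe the stale directory and recreate it.
        shutil.rmtree(path)
        os.mkdir(path)
```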
root@test-MS-7D70:/workspace# rm -rf /workspace/output_model_repository
root@test-MS-7D70:/workspace# model-analyzer profile -m examples/quick-start --profile-models add_sub
[Model Analyzer] Initializing GPUDevice handles
[Model Analyzer] Using GPU 0 NVIDIA RTX A4000 with UUID GPU-3fca6544-2e5c-de67-d283-a37b68e716bb
[Model Analyzer] Using GPU 1 NVIDIA RTX A4000 with UUID GPU-6dced96e-d063-1bf2-dcb8-f5d94e67f6a9
[Model Analyzer] Starting a local Triton Server
[Model Analyzer] Loaded checkpoint from file /workspace/checkpoints/6.ckpt
[Model Analyzer] GPU devices match checkpoint - skipping server metric acquisition
[Model Analyzer] 
[Model Analyzer] Starting automatic brute search
[Model Analyzer] 
[Model Analyzer] Creating model config: add_sub_config_default
[Model Analyzer] 
[Model Analyzer] Saved checkpoint to /workspace/checkpoints/7.ckpt
Traceback (most recent call last):
 File "/workspace/model_analyzer/triton/client/client.py", line 60, in wait_for_server_ready
   if self._client.is_server_ready():
 File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 344, in is_server_ready
   raise_error_grpc(rpc_error)
 File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_utils.py", line 77, in raise_error_grpc
   raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] failed to connect to all addresses; last error: UNKNOWN: ipv6:%5B::1%5D:8001: Failed to connect to remote host: Timeout occurred: FD Shutdown

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
 File "/usr/local/bin/model-analyzer", line 8, in <module>
   sys.exit(main())
 File "/workspace/model_analyzer/entrypoint.py", line 278, in main
   analyzer.profile(
 File "/workspace/model_analyzer/analyzer.py", line 131, in profile
   self._profile_models()
 File "/workspace/model_analyzer/analyzer.py", line 251, in _profile_models
   self._model_manager.run_models(models=[model])
 File "/workspace/model_analyzer/model_manager.py", line 154, in run_models
   measurement = self._metrics_manager.execute_run_config(run_config)
 File "/workspace/model_analyzer/record/metrics_manager.py", line 238, in execute_run_config
   if not self._load_model_variants(run_config):
 File "/workspace/model_analyzer/record/metrics_manager.py", line 452, in _load_model_variants
   if not self._load_model_variant(variant_config=mrc.model_config_variant()):
 File "/workspace/model_analyzer/record/metrics_manager.py", line 467, in _load_model_variant
   retval = self._do_load_model_variant(variant_config)
 File "/workspace/model_analyzer/record/metrics_manager.py", line 474, in _do_load_model_variant
   self._client.wait_for_server_ready(
 File "/workspace/model_analyzer/triton/client/client.py", line 72, in wait_for_server_ready
   raise TritonModelAnalyzerException(e)
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: [StatusCode.UNAVAILABLE] failed to connect to all addresses; last error: UNKNOWN: ipv6:%5B::1%5D:8001: Failed to connect to remote host: Timeout occurred: FD Shutdown
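
The gRPC error shows the client resolving localhost to the IPv6 loopback (ipv6:[::1]:8001) and timing out, which usually means the locally launched Triton server never opened its gRPC port (it may have failed on startup) or the IPv6 loopback is not reachable inside the container. A hedged sketch for probing both loopback addresses (8001 is Triton's default gRPC port; this is a diagnostic aid, not part of Model Analyzer):

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Check both loopback addresses the gRPC client may try:
# for host in ("127.0.0.1", "::1"):
#     print(host, probe(host, 8001))
```

If 127.0.0.1 connects but ::1 does not, forcing an IPv4 address in the client URL (or checking the server's startup log for why it exited) would be the next step to try.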
 

I need help fixing this issue.
