TritonModelAnalyzerException: [StatusCode.UNAVAILABLE] failed to connect to all addresses; Failed to connect to remote host: Timeout occurred: FD Shutdown #950

Open
C-Dhayananthan opened this issue Nov 26, 2024 · 0 comments

Comments

@C-Dhayananthan
Copy link

C-Dhayananthan commented Nov 26, 2024

I was trying to run Model Analyzer with triton_launch_mode: local (the default).
The command below runs the container (model-analyzer is the image name):

docker run -it --rm --gpus all -v $(pwd):/workspace --net=host model-analyzer

My sweep.yaml is given below:

model_repository: /workspace/model_repositories
triton_launch_mode: local
profile_models:
        - minilm
perf_analyzer_flags:
        input-data: "random"
triton_server_flags:
  log_verbose: True
  exit_timeout_secs: 120
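
Not part of the original report, but a quick pre-flight check like the following (a hedged sketch; the repository path and model name are taken from the sweep.yaml above) can rule out model-repository layout problems before launching Model Analyzer:

```python
from pathlib import Path


def check_model_repository(repo: str, models: list[str]) -> list[str]:
    """Return a list of problems found in a Triton model repository layout.

    Each profiled model is expected to live in its own subdirectory of the
    repository (the standard Triton layout).
    """
    problems = []
    repo_path = Path(repo)
    if not repo_path.is_dir():
        problems.append(f"model_repository does not exist: {repo}")
        return problems
    for model in models:
        if not (repo_path / model).is_dir():
            problems.append(f"model directory missing: {repo}/{model}")
    return problems


# Example (paths are illustrative):
# check_model_repository("/workspace/model_repositories", ["minilm"])
```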

I use this command to run model-analyzer inside the container:
model-analyzer profile -f sweep.yaml

ISSUE


 [Model Analyzer] Initializing GPUDevice handles
[Model Analyzer] Using GPU 0 NVIDIA RTX A4000 with UUID GPU-3fca6544-2e5c-de67-d283-a37b68e716bb
[Model Analyzer] Using GPU 1 NVIDIA RTX A4000 with UUID GPU-6dced96e-d063-1bf2-dcb8-f5d94e67f6a9
Traceback (most recent call last):
 File "/workspace/model_analyzer/entrypoint.py", line 198, in create_output_model_repository
   os.mkdir(config.output_model_repository_path)
FileExistsError: [Errno 17] File exists: '/workspace/output_model_repository'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
 File "/usr/local/bin/model-analyzer", line 8, in <module>
   sys.exit(main())
 File "/workspace/model_analyzer/entrypoint.py", line 266, in main
   create_output_model_repository(config)
 File "/workspace/model_analyzer/entrypoint.py", line 201, in create_output_model_repository
   raise TritonModelAnalyzerException(
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: Path "/workspace/output_model_repository" already exists. Please set or modify "--output-model-repository-path" flag or remove this directory. You can also allow overriding of the output directory using the "--override-output-model-repository" flag.
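
This first failure is straightforward: the output directory is left over from a previous run, and the error text itself names the fixes (remove the directory, change "--output-model-repository-path", or pass "--override-output-model-repository"). A minimal sketch of the equivalent logic (not the actual Model Analyzer source, just the behavior the traceback describes):

```python
import os
import shutil


def prepare_output_repository(path: str, override: bool = False) -> None:
    """Create the output model repository; fail if it already exists,
    unless overriding is allowed (mirrors the traceback's behavior)."""
    try:
        os.mkdir(path)
    except FileExistsError:
        if not override:
            raise RuntimeError(
                f'Path "{path}" already exists. Remove it or allow '
                'overriding (e.g. "--override-output-model-repository").'
            )
        # Override requested: wipe the stale directory and recreate it.
        shutil.rmtree(path)
        os.mkdir(path)
```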
root@test-MS-7D70:/workspace# rm -rf /workspace/output_model_repository
root@test-MS-7D70:/workspace# model-analyzer profile -m examples/quick-start --profile-models add_sub
[Model Analyzer] Initializing GPUDevice handles
[Model Analyzer] Using GPU 0 NVIDIA RTX A4000 with UUID GPU-3fca6544-2e5c-de67-d283-a37b68e716bb
[Model Analyzer] Using GPU 1 NVIDIA RTX A4000 with UUID GPU-6dced96e-d063-1bf2-dcb8-f5d94e67f6a9
[Model Analyzer] Starting a local Triton Server
[Model Analyzer] Loaded checkpoint from file /workspace/checkpoints/6.ckpt
[Model Analyzer] GPU devices match checkpoint - skipping server metric acquisition
[Model Analyzer] 
[Model Analyzer] Starting automatic brute search
[Model Analyzer] 
[Model Analyzer] Creating model config: add_sub_config_default
[Model Analyzer] 
[Model Analyzer] Saved checkpoint to /workspace/checkpoints/7.ckpt
Traceback (most recent call last):
 File "/workspace/model_analyzer/triton/client/client.py", line 60, in wait_for_server_ready
   if self._client.is_server_ready():
 File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 344, in is_server_ready
   raise_error_grpc(rpc_error)
 File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_utils.py", line 77, in raise_error_grpc
   raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] failed to connect to all addresses; last error: UNKNOWN: ipv6:%5B::1%5D:8001: Failed to connect to remote host: Timeout occurred: FD Shutdown

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
 File "/usr/local/bin/model-analyzer", line 8, in <module>
   sys.exit(main())
 File "/workspace/model_analyzer/entrypoint.py", line 278, in main
   analyzer.profile(
 File "/workspace/model_analyzer/analyzer.py", line 131, in profile
   self._profile_models()
 File "/workspace/model_analyzer/analyzer.py", line 251, in _profile_models
   self._model_manager.run_models(models=[model])
 File "/workspace/model_analyzer/model_manager.py", line 154, in run_models
   measurement = self._metrics_manager.execute_run_config(run_config)
 File "/workspace/model_analyzer/record/metrics_manager.py", line 238, in execute_run_config
   if not self._load_model_variants(run_config):
 File "/workspace/model_analyzer/record/metrics_manager.py", line 452, in _load_model_variants
   if not self._load_model_variant(variant_config=mrc.model_config_variant()):
 File "/workspace/model_analyzer/record/metrics_manager.py", line 467, in _load_model_variant
   retval = self._do_load_model_variant(variant_config)
 File "/workspace/model_analyzer/record/metrics_manager.py", line 474, in _do_load_model_variant
   self._client.wait_for_server_ready(
 File "/workspace/model_analyzer/triton/client/client.py", line 72, in wait_for_server_ready
   raise TritonModelAnalyzerException(e)
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: [StatusCode.UNAVAILABLE] failed to connect to all addresses; last error: UNKNOWN: ipv6:%5B::1%5D:8001: Failed to connect to remote host: Timeout occurred: FD Shutdown
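
The gRPC error shows the client resolving localhost to the IPv6 loopback (ipv6:[::1]:8001) and timing out, which usually means the locally launched Triton server never opened its gRPC port (it may have failed on startup) or the IPv6 loopback is not reachable inside the container. A hedged sketch for probing both loopback addresses (8001 is Triton's default gRPC port; this is a diagnostic aid, not part of Model Analyzer):

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Check both loopback addresses the gRPC client may try:
# for host in ("127.0.0.1", "::1"):
#     print(host, probe(host, 8001))
```

If 127.0.0.1 connects but ::1 does not, forcing an IPv4 address in the client URL (or checking the server's startup log for why it exited) would be the next step to try.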
 

I need help fixing this issue.
