What is UCX's policy of choosing transport? #5320

Closed
rzambre opened this issue Jun 23, 2020 · 2 comments

@rzambre
Contributor

rzambre commented Jun 23, 2020

Describe the bug

In the past (with UCX 1.5.0), I set UCX_NET_DEVICES=mlx5_0:1 and UCX_TLS=rc_mlx5,rc, expecting rc_mlx5 to be used for the fast-path operations. Setting only UCX_TLS=rc_mlx5 produced an error during ucp_init.

With the latest UCX master, ucx_info -d lists the rc_verbs and rc_mlx5 transports. However, setting UCX_TLS=rc_mlx5,rc_verbs produces an error during initialization. After some experimenting, I found that UCX_TLS=rc_mlx5,rc (as I had set earlier) works, even though rc is not listed in ucx_info -d.

(1) What is the difference between setting UCX_TLS=rc_mlx5,rc_verbs and UCX_TLS=rc_mlx5,rc?

Using only the transports listed in ucx_info -d, the combinations that do work are UCX_TLS=rc_mlx5,ud_mlx5 and UCX_TLS=rc_mlx5,ud_verbs.

(2) More generally, is there an overview of how UCX chooses which transport to use for its critical-path operations such as ucp_tag_send_nb?

Steps to Reproduce

  • Command line: mpiexec -n 2 -ppn 1 -hosts <node1>,<node2> -env UCX_NET_DEVICES mlx5_0:1 -env UCX_TLS rc_mlx5,rc_verbs ./osu_mbw_mr
  • UCX version used: master @ eaad8e2 + UCX configure flags: --disable-logging --disable-debug --disable-assertions --disable-params-check --enable-mt
  • MPICH/CH4/UCX @ d1e673a

Setup and versions

@rzambre rzambre added the Bug label Jun 23, 2020
@yosefe
Contributor

yosefe commented Jun 23, 2020

@rzambre please see https://openucx.readthedocs.io/en/master/faq.html#selecting-networks-and-transports
rc_verbs and rc_mlx5 should not be set in UCX_TLS directly; use the transport names and aliases listed in https://openucx.readthedocs.io/en/master/faq.html#list-of-main-transports-and-aliases instead.
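
To illustrate the advice, here is a minimal sketch of the reported run using an alias rather than the concrete transport names. It assumes that rc_x is the FAQ alias for the accelerated (mlx5) RC transport, and that the alias also enables the auxiliary UD transport needed for connection establishment, which would explain why UCX_TLS=rc_mlx5 on its own fails. Both points should be checked against the FAQ table for the UCX version in use. The snippet only prints the launch line (dry run); the host names are the placeholders from the report.

```shell
# Sketch only: choose RC through an alias instead of rc_mlx5/rc_verbs.
# Assumption: "rc_x" is the alias for accelerated RC in the UCX FAQ table.
UCX_NET_DEVICES=mlx5_0:1
UCX_TLS=rc_x

# Build the mpiexec line from the report, passing each variable as
# "-env NAME VALUE". <node1>,<node2> are placeholders, not real hosts.
launch_cmd="mpiexec -n 2 -ppn 1 -hosts <node1>,<node2> -env UCX_NET_DEVICES $UCX_NET_DEVICES -env UCX_TLS $UCX_TLS ./osu_mbw_mr"

# Dry run: show the command instead of executing it.
echo "$launch_cmd"
```

Replacing rc_x with rc or rc_v in the sketch would select the default or the Verbs-only RC variant respectively, again assuming the alias names from the FAQ.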

@rzambre
Contributor Author

rzambre commented Jun 23, 2020

Thanks! Wasn't aware of the new documentation. That is helpful.

@rzambre rzambre closed this as completed Jun 23, 2020
@yosefe yosefe pinned this issue Jun 23, 2020