Performance regression in collectives due to UCX_PROTO_ENABLE #9914
Comments
Hi, could you please completely fill in the bug template that is described during issue creation?
@ivankochin Of course! Here it comes. Let me know if you need more info.

Steps to Reproduce

Setup and versions
@ivankochin FYI, I have tried the same test on our older system (ConnectX-4), Rocky Linux release 9.2, MLNX_OFED_LINUX-5.8-3.0.7.0. It seems the performance is similar with both. It seems in our case only the
I attach the output of
@angainor could you please also specify
@ivankochin This turned out to be a rather large file (70MB), too large to attach here. I've made it available for download here: https://filesender.uio.no/?s=download&token=226c039d-abf0-4e8d-93ab-df04a8fee91e Please let me know if you have problems accessing it.
@angainor So it would be great if you are able to do some extra experiments. I want to ask you to download the latest UCX master, build it in release mode (using
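For anyone following along, building the latest UCX master in release mode typically looks like the following sketch (these are the steps documented in the UCX README; the install prefix and job count are placeholders to adjust for your system):

```shell
# Sketch: fetch and build UCX master in release mode.
# The --prefix path and -j value are placeholders.
git clone https://github.com/openucx/ucx.git
cd ucx
./autogen.sh
./contrib/configure-release --prefix="$HOME/ucx-install"
make -j8 install
```

A debug build (for collecting detailed logs, as requested later in this thread) would use `contrib/configure-devel` instead.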
The regression is still there; it looks similar, plus or minus some glitches. I will recompile with debug and collect the logs.
Here are logs from one process. I hope this is enough, otherwise the entire file is rather large. Note that this time I only ran the benchmark with message sizes up to 8 bytes due to the amount of logs. Please let me know if this is ok, or if you need more.
Thanks for all your efforts. Do I understand correctly that all the logs mentioned here were collected with

Are you sure? I see messages up to 2048 bytes in that log. The root cause still isn't identified, since according to the logs both protov1 and protov2 use the same protocols. Could you please also re-measure
Yes, but as I mentioned before, this happens also if I specify DC, or if I don't set
log file:
log file: Note that in the logged runs

At least this is what I asked the OSU benchmark to test. But I guess it might send other messages at run time, I don't know.
These are for
For reference, here are results with
@ivankochin Is there any further info about this?
@angainor thanks for your patience! I have successfully reproduced the problem on my setup and found the reason for the degradation. But fixing it requires changes to the HCOLL codebase (which is unlikely to happen), so a workaround could be to use UCC collectives instead of HCOLL, which also need to be patched to fix that degradation. The problem is that the current solution shows degradation at bigger sizes, which I am checking right now.
UCC in its current form is much slower than HCOLL in many cases, so it is not an option. I guess I will force
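If the goal is simply to pin the old protocol-selection behavior for every rank of a job, one common way (a sketch assuming Open MPI's `mpirun`, whose `-x` flag exports an environment variable to all launched processes; the process count and benchmark name are placeholders) is:

```shell
# Sketch: force protov1-style selection on all ranks of a run.
# Open MPI's -x exports the variable to every launched process.
mpirun -np 64 -x UCX_PROTO_ENABLE=n osu_alltoall
```

Exporting the variable in the shell before `mpirun` also works when the launcher forwards the environment, but `-x` makes the intent explicit.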
UCX_PROTO_ENABLE=n is not supported anymore, so it surely can impact performance in other scenarios (we don't have data on whether it will or won't in your case).
This looks like an important performance issue. If a fix is unlikely anytime soon, and I should not use

In either case, please let me know when you have any solution / update for this.
@angainor I think setting

Regarding the update: as I said previously, it is not possible to patch HCOLL, and we are considering patching UCC right now, but it will likely take some time.
I noticed a performance regression in the OSU benchmark (OpenMPI with UCX and HCOLL) when using HPCX 2.17.1 compared to 2.14. It is due to the fact that `UCX_PROTO_ENABLE=y` is now the default. Setting it to `n` improves performance. Here are some results of `osu_alltoall`, but I have also tested and see problems for `allreduce`, `allgather`, and `bcast`.

Results with HPCX 2.14 (`UCX_PROTO_ENABLE=n` by default)

Results with HPCX 2.17.1 (`UCX_PROTO_ENABLE=y` by default)
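To reproduce the comparison on a single installation, the two configurations can be run back to back. This is a sketch only: it assumes an HPCX environment is already loaded so that `mpirun` and `osu_alltoall` are in `PATH`, and the process count is a placeholder for whatever the original runs used.

```shell
#!/bin/sh
# Sketch: run osu_alltoall twice, toggling UCX protocol selection,
# so the protov1 vs protov2 numbers come from the same build.
for proto in y n; do
  echo "=== UCX_PROTO_ENABLE=$proto ==="
  UCX_PROTO_ENABLE=$proto mpirun -np 64 osu_alltoall
done
```

Running both modes in one job keeps the node allocation, network state, and software stack identical across the two measurements.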