terminate called after throwing an instance of 'std::runtime_error' what(): error in network setup: daa4eabb64cf : Receiving error - 1 : Connection reset by peer #1435

Open
xinushio opened this issue Jun 27, 2024 · 8 comments

Comments

@xinushio

xinushio commented Jun 27, 2024

I started two Docker containers, container1 and container2, bound to ports 8090 and 8091 respectively. When I run the following command in container2:
./semi2k-party.x -N 2 -IF Player-Data/Input -p 1 -h 192.168.10.11 -pn 8090 dual_sum
the program terminates with a "Connection reset by peer" exception.

However, when I run it on the host machine, this exception does not occur. It seems that the connection retry logic in MP-SPDZ does not handle the "Connection reset by peer" error.

@mkskeller
Member

What is the output on party 0? The error message suggests that party 0 might have failed first.

@xinushio
Author

I have not started party 0 yet. Judging by the behavior on the host machine, party 1 should normally retry connecting to party 0 multiple times within a minute.

@xinushio
Author

However, when running in the container, party 1 fails immediately.

@mkskeller
Member

Retrying connections is indeed implemented. However, the error message indicates that the initial connection is accepted but then dropped. Since the first connection goes from party 1 to party 0, I'm wondering what happens if party 0 is started first.

@xinushio
Author

If party 0 is started first, the computation proceeds normally.

@mkskeller
Member

I see, but I'm not sure what to make of this. My understanding is that party 0 not being present should lead to the connection being rejected rather than being accepted just to be dropped. Do you think this is normal behavior?

@xinushio
Author

Based on my testing, under normal circumstances, when party 1 tries to connect to party 0 and party 0 has not been started, party 1 receives a "connection refused" exception. MP-SPDZ correctly catches this exception and initiates the next retry. However, once the Docker container is started, a docker-proxy process listens on the port. When MP-SPDZ connects to this port, it receives a "connection reset by peer" exception instead. It is possible that MP-SPDZ does not handle this exception, which causes the program to terminate immediately.
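
The two failure modes described above can be reproduced locally without Docker. The following is a minimal Python sketch, assuming that the proxy behaves like a listener that accepts a connection and then immediately resets it. It is an emulation, not docker-proxy itself, and the helper names (`start_resetting_listener`, `closed_port`, `classify`) are made up for illustration.

```python
import socket
import struct
import threading


def start_resetting_listener():
    """Accept one connection on an ephemeral port, then reset it.

    Hypothetical stand-in for a proxy that accepts a connection it cannot
    forward. Returns the port number.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def serve():
        conn, _ = srv.accept()
        # SO_LINGER enabled with a zero timeout makes close() send RST, not FIN.
        conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                        struct.pack("ii", 1, 0))
        conn.close()
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return port


def closed_port():
    """Return a port on which nothing is listening (bound once, then released)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    s.close()
    return port


def classify(port, host="127.0.0.1"):
    """Connect, try to read one byte, and name the failure mode."""
    try:
        with socket.create_connection((host, port), timeout=5) as s:
            s.recv(1)
    except ConnectionRefusedError:
        return "refused"  # nothing listening: connect() itself fails
    except ConnectionResetError:
        return "reset"    # accepted, then dropped: the first read fails
    return "no error"


if __name__ == "__main__":
    print(classify(closed_port()))
    print(classify(start_resetting_listener()))
```

On a Linux host the first call should report the refused case and the second the reset case. The point of the sketch is that the reset only shows up after `connect()` has already succeeded, so a retry loop that only guards the connect step would not see it.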

@xinushio
Author

I hope MP-SPDZ can catch the exception and keep retrying for up to one minute in both scenarios. Could this be addressed in MP-SPDZ? Thank you.
