[Help Request] traci #2139
Comments
Hello @knightcalvert, we are currently using this method to acquire port numbers for SUMO. I did a bit of digging into how SUMO creates and shuts down ports. I suspect that the line that kills SUMO prevents its destructor from being called, so the port SUMO was using is never cleaned up: SMARTS/smarts/core/utils/sumo.py Line 226 in e2144e1
I'll try tomorrow to see if I can apply a cleaner shutdown without blocking on SUMO closing. I will also squash the messages. They come from one of SUMO's methods that uses print for warnings.
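For readers following along, here is a minimal sketch of the general "ask first, kill only as a last resort" shutdown pattern using Python's standard `subprocess` module. It is not the code in `smarts/core/utils/sumo.py`; `stop_process` and `grace_seconds` are hypothetical names, and the child process is a stand-in rather than SUMO. Note that it does wait for up to `grace_seconds`, so it only bounds the blocking rather than removing it.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 2.0) -> int:
    """Ask a child process to exit and force-kill it only if it ignores the request."""
    if proc.poll() is None:  # the child is still running
        proc.terminate()  # polite request (SIGTERM on POSIX)
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()  # last resort: the child gets no chance to clean up
            proc.wait()  # reap the process so it does not linger as a zombie
    return proc.returncode


if __name__ == "__main__":
    # Stand-in child process; a real setup would launch SUMO here instead.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_process(child))
```

Giving the child a grace period lets it run its own cleanup (such as releasing a listening port) before it is forced down.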
@knightcalvert SUMO logging should now be squashed: 16665d5
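As an illustration of how print-based warnings can be silenced in general, here is a small sketch using `contextlib.redirect_stdout` from the standard library. This is not necessarily what commit 16665d5 does; `call_quietly` and `noisy_add` are hypothetical names made up for the example.

```python
import contextlib
import io


def call_quietly(func, *args, **kwargs):
    """Run func while capturing anything it prints to Python's stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def noisy_add(a, b):
    # Hypothetical stand-in for a library call that warns via print().
    print("Warning: something the caller does not care about")
    return a + b


if __name__ == "__main__":
    value, captured = call_quietly(noisy_add, 1, 2)
    print(value)
```

One caveat: `redirect_stdout` only replaces `sys.stdout` inside the current Python process, so it does not affect text written by a separate child process.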
An update. |
#2140 is intended to fix the issue. From current stress testing, @knightcalvert One thing to note, even after this, if you are using SMARTS directly make sure that each SMARTS instance calls |
After updating the code, it helped: the useless output is gone.
My code was already calling close().
I'm not familiar with TraCI or ports, and I'm just curious: if one port cannot be connected to after many tries, it seems useless to keep trying that port. Could it try to connect to another one instead?
I think I understand the issue better now. It does not seem to be related to the number of ports but to the SUMO server and SMARTS somehow not pairing. My only thought is that somehow this is happening: Once a connection is established, We get ports by asking the OS for a random free port (out of 64512 standard ports), so the chance of ports colliding is very low but not zero.
It was assumed that it would connect or retry with a different port. I will put in a patch that will reattempt with a different port while I think of a way to gracefully handle the root cause.
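To make the idea concrete, here is a rough sketch of asking the OS for a free port and retrying with a fresh port on failure. It is only an illustration and not the SMARTS patch; `find_free_port`, `connect_with_fresh_ports`, and the `start_server` callable are hypothetical names.

```python
import socket


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def connect_with_fresh_ports(start_server, max_attempts: int = 5):
    """Call start_server(port) with a newly chosen port on each attempt.

    start_server is a hypothetical callable that launches a server on the
    given port, connects to it, and raises OSError if that fails.
    """
    last_error = None
    for _ in range(max_attempts):
        port = find_free_port()
        try:
            return port, start_server(port)
        except OSError as error:
            last_error = error  # give up on this port and pick another one
    raise ConnectionError(f"no connection after {max_attempts} attempts") from last_error
```

The port is released as soon as `find_free_port` returns, so another process can claim it before the server binds to it. That gap is where a collision between parallel instances can occur, which is why retrying on a different port helps but does not remove the root cause.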
Does this still happen?
I have attempted a patch 0930736 for that now in #2140. It retries with a different port and saves the TraCI server on a stolen connection to avoid interrupting a different instance. I will need to do a follow-up fix. |
My last run got stuck at 401027 episodes. As far as I know, problem 2 has not happened again. I roughly understand what you are saying: it seems that if I reduce the number of parallel runs, the chance of a port already being occupied will go down too?
Honestly, it would reduce the chances but not completely prevent it. I am pursuing a different solution that uses a centralised server to prevent port collisions (at least between sumo instances). As of 94da02f it looks like this:
## console 1 (or in background OR on remote machine)
# Run the centralized sumo port management server.
# Use `export SMARTS_SUMO_CENTRAL_PORT=62232` or `--port 62232`
$ python -m smarts.core.utils.centralized_traci_server
## console 2
## Set environment variable to switch to the server.
$ export SMARTS_SUMO_TRACI_SERVE_MODE=central
## Unnecessary but optional
# export SMARTS_SUMO_CENTRAL_HOST=localhost
# export SMARTS_SUMO_CENTRAL_PORT=62232
## do run
$ python experiment.py
It works as is right now but when I get it working better I will likely integrate the server generation into the main process and set it as the default behaviour. I think I will also eventually use the server as a pool of
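If the experiment is launched from a Python script rather than a shell, the same variables can be set programmatically. The sketch below only builds the environment; the variable names and values are the ones quoted above (as of 94da02f, so they may change), while `central_traci_env` is a hypothetical helper and not part of SMARTS.

```python
import os


def central_traci_env(host="localhost", port=62232, base=None):
    """Return a copy of the environment with the central TraCI server settings added."""
    env = dict(os.environ if base is None else base)
    env["SMARTS_SUMO_TRACI_SERVE_MODE"] = "central"
    env["SMARTS_SUMO_CENTRAL_HOST"] = host
    env["SMARTS_SUMO_CENTRAL_PORT"] = str(port)  # environment values must be strings
    return env
```

It could then be passed to a child process, for example `subprocess.run([sys.executable, "experiment.py"], env=central_traci_env())`, so the settings apply to that run only.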
The newest change 25789e9 resulted in no disconnects and no port collisions against 60k instances and 32 parallel experiments. |
Hi @Gamenot I am still facing the issue for TraCI server. I ran the cmd Then I ran in 2nd console I get same error like this -
High Level Description
I noticed that I have the same problem as #2127, so I updated to the latest SMARTS version, but the problems still exist.
problem 1 :
In the beginning, TraCI tries to connect to different ports. However, after running for 10 hours, TraCI only tries to connect to the same port and fails every time, so my code gets stuck and I have to rerun it.
problem 2 :
This problem is like #2127, because of
so my SMARTS can't reset successfully. I updated the code, but this problem still occurs occasionally.
Also, can I turn off this TraCI warning? 90% of my console output is TraCI warnings, so I can't find the info I really need. Thank you very much.
Version
the latest v
Operating System
ubuntu
Problems
No response