connection closed by SUMO #295
Comments
Hello, could you give a few more details on what you were running to cause this? Does this occur for you with the |
I run the following code.
The console output is
|
I'm looking forward to your answer. Thanks. |
Hello, sorry for the late reply. I have done some testing for this error and the solution is unclear but the issue is reproducible.
Traceback (most recent call last):
File "examples/rllib_problem.py", line 47, in <module>
trajectory = ray.get([env.sample.remote() for env in environment])
File ".../SMARTS/.venv/lib/python3.7/site-packages/ray/worker.py", line 1506, in get
values = worker.get_objects(object_ids, timeout=timeout)
File ".../SMARTS/.venv/lib/python3.7/site-packages/ray/worker.py", line 312, in get_objects
return self.deserialize_objects(data_metadata_pairs, object_ids)
File ".../SMARTS/.venv/lib/python3.7/site-packages/ray/worker.py", line 280, in deserialize_objects
return context.deserialize_objects(data_metadata_pairs, object_ids)
File ".../SMARTS/.venv/lib/python3.7/site-packages/ray/serialization.py", line 323, in deserialize_objects
self._deserialize_object(data, metadata, object_id))
File ".../SMARTS/.venv/lib/python3.7/site-packages/ray/serialization.py", line 284, in _deserialize_object
obj = self._deserialize_pickle5_data(data)
File ".../SMARTS/.venv/lib/python3.7/site-packages/ray/serialization.py", line 262, in _deserialize_pickle5_data
obj = pickle.loads(in_band)
ModuleNotFoundError: No module named 'traci'
It looks like one of the ray workers was unable to import traci, and shortly afterwards the traci connection also failed. This is not an issue we have seen before, so it might take some time to resolve. |
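One way to narrow down a `ModuleNotFoundError` like the one above is to check, in the same interpreter the worker uses, whether `traci` can be located at all. The following is only a sketch and not part of SMARTS or ray: `ensure_importable` is a hypothetical helper name, and it assumes the common SUMO layout in which the Python bindings live under `$SUMO_HOME/tools`.

```python
import importlib
import importlib.util
import os
import sys


def ensure_importable(module_name, extra_dir=None):
    """Return True if `module_name` can be located.

    If it cannot be found and `extra_dir` is an existing directory,
    append that directory to sys.path and look again.
    (Hypothetical helper for illustration only.)
    """
    if importlib.util.find_spec(module_name) is not None:
        return True
    if extra_dir and os.path.isdir(extra_dir) and extra_dir not in sys.path:
        sys.path.append(extra_dir)
        importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None


# Assumption: SUMO's Python tools (including traci) sit in $SUMO_HOME/tools.
sumo_home = os.environ.get("SUMO_HOME")
tools_dir = os.path.join(sumo_home, "tools") if sumo_home else None
print("traci importable:", ensure_importable("traci", tools_dir))
```

Running the same check inside a remote task would show whether the worker process sees a different environment from the driver script.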
May have found the potential cause: #313 |
Hi Gamenot. I also face the same problem. Is there any progress on this problem? |
Hello, @JianmingTONG, there is some good progress on #331. This is fairly critical and should solve both the error messages and the SUMO connection issue. |
Thanks for the reply @Gamenot. I have tried the version that seems to solve the SUMO issue, i.e. commit 2a45972. However, there is still the "connection closed by sumo" error when I try the following commands. Note: I changed the number of episodes from 10 to 1000000 to test the scenario without launching the training process.
PS: both terminal 2 and terminal 3 died at iteration 8185. |
Hi @JianmingTONG, we are aware of two separate shortcomings in the implementation of (i) ray and (ii) remote agents. We are currently looking into both. Unfortunately, #331 is not ready for use yet. |
Hi @Adaickalavan @Gamenot, I see that #366 has been closed. Has the issue been solved? Thanks, wish you a happy new year. |
@JianmingTONG Happy new year, and thank you. It looks like the problem has been addressed; however, we are still testing to make sure it is, in fact, solved. |
As for the use with A modified example is as follows:
What this means specifically is that the underlying smarts instance needs to be disposed: SMARTS/smarts/env/hiway_env.py Lines 202 to 204 in d006ace
|
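A minimal sketch of the disposal pattern described above, using a hypothetical stand-in class rather than the actual SMARTS environment. `FakeSimulationEnv` and `run_episode` are illustrative names only; the point is that `close()` runs in a `finally` block, so the underlying simulator instance is released even when a step fails.

```python
class FakeSimulationEnv:
    """Hypothetical stand-in for an environment that owns a simulator connection."""

    def __init__(self):
        self.connected = True

    def step(self, fail=False):
        # Simulate the simulator dropping the connection mid-episode.
        if fail:
            raise RuntimeError("connection closed by simulator")
        return "observation"

    def close(self):
        # Release the underlying simulator instance.
        self.connected = False


def run_episode(env, steps, fail_at=None):
    """Step the environment and always dispose of it afterwards.

    Returns True if all steps completed, False if a step failed.
    """
    try:
        for i in range(steps):
            env.step(fail=(i == fail_at))
        return True
    except RuntimeError:
        return False
    finally:
        env.close()  # runs on success and on failure
```

With a real environment, the same shape would apply to the worker code that samples trajectories: create the environment, step it inside `try`, and close it in `finally`.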
Hi @JianmingTONG, happy new year. It appears that we have fixed this issue alongside other distributed computing issues (#331). I have verified that the code runs successfully to completion when executing the commands below. Run in terminal 1:
See the visualization in a browser at
Run in terminal 3:
Going forward, please
|
To summarize, the problem breaks down into two parts:
|
Hi, I followed the instructions here to launch the example evaluation. However, it reports the following errors.
I have also tested some other scenarios, as follows: Could you help me fix this? |
I have found another potential source of the crash when going through the crash report from running the example I provided. I am hoping we can do something about this without going into SUMO code.
The cause looks like it might be from changes in the 1.7.0 release of SUMO. I am unsure why this |
SUMO connection closed
|
Hi @JianmingTONG, Given the occurrence of I think the error does not happen when SMARTS is run inside a Docker container.
$ docker run --rm -it --network=host huaweinoah/smarts:v0.4.12
Do not map the source code using |
I am also currently facing the same issue. After 10 million training steps, the training process gets killed. I am using SMARTS version 0.4.16. @Gamenot @Adaickalavan @JianmingTONG @BBDrive Is the issue fixed? If so, can you please point to the pull request that fixed it? Also, would moving to version 0.4.18 or any other branch solve this issue? If so, please mention the branch that I can use. |
Hi @dineshresearch, Unfortunately, the For the time being, if you do not need background traffic vehicles, you may consider setting Lines 68 to 78 in e3681e7
|
When I run multiple instances with ray, it gives an error. It still works for a few rounds, but crashes after running for a while.