Parallel collection and evaluation #143
Hello! Thanks for opening this and sorry for the delay in answering. In vectorized environments, both collection and evaluation are done using a batch of vectorized environments. In other environments, right now, both collection and evaluation are done sequentially over the environments. Collection: BenchMARL/benchmarl/experiment/experiment.py Line 439 in 9813807
Evaluation: BenchMARL/benchmarl/experiment/experiment.py Line 833 in 9813807
Allowing both of these to be changed to parallel has long been on the TODO list: #94. This could be as simple as changing [...]. It is at the top of the todo list, so I think I will get to it when I have time. Re your specific case, in Melting Pot changing the [...]
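For context, a minimal sketch of the kind of swap being discussed, not BenchMARL's actual code: in TorchRL the two batched envs share the same constructor signature, so in principle only the class changes. `make_env` and the Gym task below are illustrative stand-ins for the task's env creator.

```python
from torchrl.envs import SerialEnv, ParallelEnv
from torchrl.envs.libs.gym import GymEnv

def make_env():
    # Hypothetical stand-in for the task's env creation function.
    return GymEnv("CartPole-v1")

# Current behaviour: the envs are stepped one after another in this process.
serial_env = SerialEnv(4, make_env)

# Proposed change: each env runs in its own worker process.
parallel_env = ParallelEnv(4, make_env)

# Batch dimension 4 (one per env), time dimension up to 100 steps.
rollout = parallel_env.rollout(max_steps=100, break_when_any_done=False)
print(rollout.batch_size)
parallel_env.close()
```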
Hello! Thanks for the response. It is good to know that the issue is at the top of the todo list. You are right that collection and evaluation are done sequentially. Following your suggestion, I changed SerialEnv to ParallelEnv, but it led to many errors, so I stopped. Also, I definitely see execution-time improvements when I set n_envs_per_worker from 2 to 20, but I guess it has something to do with the reset method of the Melting Pot envs. Here is an example run of IQN on the Harvest env with 10 agents with off_policy_n_envs_per_worker: 20:
There is also an increase in execution time when episodes end. I guess that, in the end, it cancels out the improvement on regular iterations.
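For reference, a minimal sketch of how that setting can be raised through BenchMARL's Python config API, assuming the ExperimentConfig dataclass field mirrors the YAML key quoted above:

```python
from benchmarl.experiment import ExperimentConfig

# Load BenchMARL's default experiment configuration, then raise the number of
# environments each off-policy collector steps per worker (2 -> 20 as above).
experiment_config = ExperimentConfig.get_from_yaml()
experiment_config.off_policy_n_envs_per_worker = 20
```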
Ok, that is what I was afraid of. In theory they should be interchangeable, but in practice they are my first cause of migraines (hence why we only have serial for now). But when I gather some courage, I'll look into it. Regarding the other part of the message: is there anything out of what you expected / something I can help with?
Nope. Thanks for the quick responses.
I'll just keep this open until the feature lands.
I revisited the issue, and you were right that switching from SerialEnv to ParallelEnv works! Apparently, the problem was in how I pass some env config params to the env creator function. I guess ParallelEnv does not copy the task config the way SerialEnv does. I changed the way I pass the args and removed the hydra option, and now it works.
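A minimal sketch of the kind of change described, assuming the issue was that the env creator picked its config up implicitly: binding the arguments explicitly (for example with functools.partial instead of a Hydra-backed closure) means every ParallelEnv worker builds its env with the same fully specified arguments. The names and parameters below are illustrative, not BenchMARL's actual creator.

```python
from functools import partial
from torchrl.envs import ParallelEnv
from torchrl.envs.libs.gym import GymEnv

def make_task_env(env_name: str):
    # Illustrative creator; a real one would build the Melting Pot wrapper
    # from the task config instead of a Gym env.
    return GymEnv(env_name)

# The config value is bound here, so each worker process receives it
# explicitly rather than relying on state copied from the parent process.
create_env_fn = partial(make_task_env, env_name="CartPole-v1")
env = ParallelEnv(20, create_env_fn)
```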
Nice! Would you be able to share your solution in a PR? Also, maybe you could open an issue in TorchRL outlining where the serial and parallel envs differ in a way you did not expect.
Well, in terms of collection time, ParallelEnv improves things a lot. However, after checking the results, I can see that there is a big change in learning performance. I ran some more tests with the config below (on IQL), only switching between SerialEnv and ParallelEnv, and somehow the learning is very poor when I use ParallelEnv.
I thought the only difference is that SerialEnv steps the 20 envs in sequence, whereas ParallelEnv steps them in separate processes. Note that an episode ends only after 1000 steps are taken. I am not sure whether this originates from Melting Pot and is due to async collection from the envs.
Oh no, that does not sound good. I feared something like this. I'll need to take a look. We need to identify where this deviation first occurs. Maybe the first approach would be to test with a non-learned deterministic policy and see whether the result differs between the two envs.
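One possible shape for that check, as a sketch: seed both batched envs identically, roll them out with the same non-learned action sequence, and compare the reward and done entries. `make_env` and the Gym task are stand-ins; the real check would use the Melting Pot task and a deterministic policy.

```python
import torch
from torchrl.envs import SerialEnv, ParallelEnv
from torchrl.envs.libs.gym import GymEnv

def make_env():
    return GymEnv("CartPole-v1")  # stand-in for the Melting Pot task

def collect(env_cls):
    env = env_cls(2, make_env)
    env.set_seed(0)
    torch.manual_seed(0)  # same random action sequence for both runs
    td = env.rollout(max_steps=50, break_when_any_done=False)
    env.close()
    return td

serial_td = collect(SerialEnv)
parallel_td = collect(ParallelEnv)

# If collection itself is equivalent, any learning gap must come from elsewhere;
# a mismatch here points at the batched-env machinery rather than the algorithm.
print(torch.equal(serial_td["next", "done"], parallel_td["next", "done"]))
print(torch.allclose(serial_td["next", "reward"], parallel_td["next", "reward"]))
```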
I think it is not about learning but logging. When I run with 2 envs, I notice that the episode_reward metric for an agent takes either integer values or values ending in .5 with SerialEnv, whereas it takes values like 23.2175006866455 with ParallelEnv. I checked the batches in either case and saw a strange difference. For episodes of 1000 steps,
I pinpointed the problem here: if I remove the TransformedEnv, the done keys are returned as expected. Any ideas on the reason?
Do you know what specific transform causes this to happen? cc'ing @vmoens
I tried passing an empty list of transforms, but it led to other errors. MeltingpotTask has additional transforms, but I am not sure they are the reason. Maybe it is about the reset keys? Or the done_specs?
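If it helps narrow things down, one way to bisect is sketched below: compare the done entries of a short rollout from the bare ParallelEnv against the same env wrapped in a TransformedEnv, re-adding the task's transforms one at a time. The Gym env and the DoubleToFloat transform are only placeholders for the Melting Pot setup.

```python
from torchrl.envs import Compose, DoubleToFloat, ParallelEnv, TransformedEnv
from torchrl.envs.libs.gym import GymEnv

def make_env():
    return GymEnv("CartPole-v1")  # placeholder for the Melting Pot task

base = ParallelEnv(2, make_env)

# Reference: how the done keys look without any transform in the way.
print("bare:", base.rollout(max_steps=20)["next", "done"].squeeze(-1))

# Wrap the same batched env and re-add transforms one at a time; the first
# transform that changes the done pattern is the culprit.
for transform in (Compose(), Compose(DoubleToFloat())):
    env = TransformedEnv(base, transform)
    print(transform, env.rollout(max_steps=20)["next", "done"].squeeze(-1))
```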
Mhh, I would need a bit more systematic report. If you could pinpoint with more granularity which component causes the deviation, I can take a look. A repro script would also help, although I know it is time-consuming. Maybe you can tell me what minimal changes you made to BenchMARL main to reach your current state.
I tried a fresh clone in a fresh conda env with the following script:
where mp_cnn contains `name: cnn`, `mlp_num_cells: [128]`, `cnn_num_cells: [32, 128]`. However, this now gives the following error:
I'll continue debugging on my own branch to make ParallelEnv work for now.
Here is an odd observation: when I change the sampling_device from 'cpu' to 'cuda', the done keys are correct, as in the SerialEnv setting. In fact, when I set
It prints
I guess casting the rollout to the sampling device applies a transformation that messes up the done keys.
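A quick way to sanity-check that hypothesis is to compare the done entries of a batch before and after the device cast; a plain TensorDict.to(device) should only move tensors, not change their dtype or values. The toy batch below stands in for the collected rollout.

```python
import torch
from tensordict import TensorDict

# Toy batch standing in for a collected rollout of 2 envs x 5 steps.
rollout = TensorDict(
    {"next": {"done": torch.zeros(2, 5, 1, dtype=torch.bool)}},
    batch_size=[2, 5],
)

device = "cuda" if torch.cuda.is_available() else "cpu"
before = rollout["next", "done"].clone()
after = rollout.to(device)["next", "done"]

# dtype and values must be identical; a difference here would mean the cast
# itself (rather than the env/wrapper) is corrupting the done keys.
print(before.dtype, after.dtype, torch.equal(before, after.cpu()))
```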
Ok, I got the problem solved. I was using a custom wrapper for the Melting Pot envs, and I realized that I did not pass the device parameter to the constructor. Once I did, the problem vanished. I guess the device mismatch was the reason. In the end, just switching from SerialEnv to ParallelEnv works. Sorry for the trouble, and thanks for the help!
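For anyone hitting the same thing, a sketch of the fix described, assuming the custom wrapper subclasses TorchRL's EnvBase (the class and parameter names are illustrative):

```python
from torchrl.envs import EnvBase

class MyMeltingPotWrapper(EnvBase):  # illustrative name for the custom wrapper
    def __init__(self, substrate: str, device: str = "cpu"):
        # Forwarding `device` is the missing piece: EnvBase uses it for the
        # specs and the tensordicts the env produces, so leaving it out makes
        # the wrapper default to CPU while the rest of the pipeline may not.
        super().__init__(device=device)
        self.substrate = substrate

    # _reset, _step and _set_seed are omitted; they are unchanged by the fix,
    # since the tensordicts they build inherit self.device.
    def _reset(self, tensordict=None):
        raise NotImplementedError

    def _step(self, tensordict):
        raise NotImplementedError

    def _set_seed(self, seed):
        raise NotImplementedError
```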
Really nice to hear! Feel free to make a PR to enable users to choose parallel or serial, if this is easy.
Sorry that I couldn't respond earlier. Thanks for the PR! I wanted parallel collection because I have a model with an LSTM layer. Now, with ParallelEnv, it gives an error:
I checked the arguments of ParallelEnv but couldn't find a solution yet.
Got it! I'll try to investigate.
Here is further info: BatchedEnv in TorchRL collects the keys from the next field of the tensordict as the keys to be passed between processes. If I make the data variable empty in the line below, the error disappears and learning is fine with an LSTM model. Keys like "_hidden_lstm_h_1" are expected in the tensordict coming from the envs, since the LSTM model puts them there, but somehow the values are missing when the actual transfer is about to occur. If I comment out the lines below: BenchMARL/benchmarl/models/lstm.py Line 501 in 271edee
the error again disappears and the learning is fine. Can you check whether the implementation of the LSTM layer really requires these?
What exactly did you comment out? Just line 501? The LSTM needs to write those values during execution, because they have to be read at the next step; during training they are not needed, as we process a time batch and not a single step.
I understand that the LSTM needs those values to work properly. So these values need to be passed between processes (copies of the env), but they don't exist in the tensordict that is passed to the _step() method in BatchedEnv, where the update_() happens. This passthrough section does not exist in SerialEnv, which I guess is why a model with an LSTM does not cause an issue with SerialEnv. Thanks for the help :)
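For intuition, a toy sketch of the carry pattern under discussion (not BenchMARL's actual pipeline): the recurrent policy writes its hidden state under "next" at step t, and step_mdp promotes it to the root of the tensordict so it can be read as an input at step t+1. In a batched env, each worker therefore has to ship these keys back to the parent process. The key name follows the "_hidden_lstm_h_1" style mentioned above; shapes are made up.

```python
import torch
from tensordict import TensorDict
from torchrl.envs.utils import step_mdp

# Step t: env outputs plus the hidden state written by the recurrent policy.
td = TensorDict(
    {
        "observation": torch.randn(1, 8),
        "next": {
            "observation": torch.randn(1, 8),
            "reward": torch.zeros(1, 1),
            "done": torch.zeros(1, 1, dtype=torch.bool),
            "_hidden_lstm_h_1": torch.zeros(1, 128),
        },
    },
    batch_size=[1],
)

# Step t+1: everything under "next" becomes the new root, so the hidden state
# is available again as a policy input. If a worker drops these keys, the
# policy cannot recover its recurrent state on the next step.
td_next = step_mdp(td)
print(td_next["_hidden_lstm_h_1"].shape)  # torch.Size([1, 128])
```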
@gliese876b try now, it should work.
@matteobettini, thanks! I ran a test with the same model configs on the branch after your commits. It gives the error below:
This is related to Melting Pot using lab2d as its backend.
and this solves the issue, although it is restricted to Unix-like systems. I am running on Ubuntu at the moment.
Mmmh, I don't get this error. Can you confirm that you are using just the code from my PR?
I created a fresh conda env and pulled the branch named parallel. I ran the following command:
and the base_experiment contains
This is strange... it only happens if there is an RNN in the sequence.
I think I got it. Some weird lambda thing. Try now, fingers crossed 🤞
Yes, no errors; it works fine with an LSTM model on ParallelEnv. Great work! Thank you so much. Now that the collection is parallel, this will save a lot of time.
Can _evaluation_loop use a SyncDataCollector for non-vectorized envs so that evaluation is also parallel?
While running on Melting Pot envs, increasing n_envs_per_worker definitely improves execution time, but the evaluation steps take almost 3 times longer than a regular iteration (I have evaluation_episodes: 10), since the evaluation is sequential.
Making test_env a SerialEnv could solve the issue.
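A sketch of the idea for non-vectorized tasks, assuming the evaluation env can be built as a TorchRL batched env so that all evaluation rollouts are stepped as one batch instead of one after another (make_env and the numbers are illustrative, not BenchMARL's actual evaluation loop):

```python
from torchrl.envs import SerialEnv  # or ParallelEnv for separate processes
from torchrl.envs.libs.gym import GymEnv

def make_env():
    return GymEnv("CartPole-v1")  # stand-in for the evaluation task

evaluation_episodes = 10
test_env = SerialEnv(evaluation_episodes, make_env)

# A single batched rollout now covers all evaluation envs at once, rather
# than looping over evaluation episodes sequentially.
rollouts = test_env.rollout(max_steps=1000, break_when_any_done=False)
print(rollouts.batch_size)  # torch.Size([10, 1000])
test_env.close()
```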