Error while trying to implement basic Optuna tutorial for SAC #89

Open
arun-dezerv opened this issue Jun 1, 2024 · 0 comments

https://github.com/AI4Finance-Foundation/FinRL-Tutorials/blob/master/4-Optimization/FinRL_HyperparameterTuning_using_Optuna_basic.ipynb

Hello. The tutorial linked above works well for DDPG, even with 50,000 timesteps, but I cannot reproduce it for SAC: with SAC, any run longer than 100 timesteps fails with the following error:

[I 2024-06-01 11:43:00,899] A new study created in memory with name: sac_study
:4: FutureWarning: suggest_loguniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float(..., log=True) instead.
learning_rate = trial.suggest_loguniform("learning_rate", 1e-1, 1)
{'buffer_size': 100000, 'learning_rate': 0.3968793330444371, 'batch_size': 256}
Using cpu device
[W 2024-06-01 11:43:06,313] Trial 0 failed with parameters: {'buffer_size': 100000, 'learning_rate': 0.3968793330444371, 'batch_size': 256} because of the following error: ValueError('Expected parameter loc (Tensor of shape (1, 30)) of distribution Normal(loc: torch.Size([1, 30]), scale: torch.Size([1, 30])) to satisfy the constraint Real(), but found invalid values:\ntensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan]])').
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/optuna/study/_optimize.py", line 196, in _run_trial
value_or_values = func(trial)
File "", line 63, in objective
trained_sac = agent.train_model(model=model_sac,
File "/usr/local/lib/python3.10/dist-packages/finrl/agents/stablebaselines3/models.py", line 117, in train_model
model = model.learn(
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/sac/sac.py", line 307, in learn
return super().learn(
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/off_policy_algorithm.py", line 328, in learn
rollout = self.collect_rollouts(
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/off_policy_algorithm.py", line 557, in collect_rollouts
actions, buffer_actions = self._sample_action(learning_starts, action_noise, env.num_envs)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/off_policy_algorithm.py", line 390, in _sample_action
unscaled_action, _ = self.predict(self._last_obs, deterministic=False)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/base_class.py", line 556, in predict
return self.policy.predict(observation, state, episode_start, deterministic)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/policies.py", line 368, in predict
actions = self._predict(obs_tensor, deterministic=deterministic)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/sac/policies.py", line 353, in _predict
return self.actor(observation, deterministic)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/sac/policies.py", line 170, in forward
return self.action_dist.actions_from_params(mean_actions, log_std, deterministic=deterministic, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/distributions.py", line 190, in actions_from_params
self.proba_distribution(mean_actions, log_std)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/distributions.py", line 224, in proba_distribution
super().proba_distribution(mean_actions, log_std)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/distributions.py", line 164, in proba_distribution
self.distribution = Normal(mean_actions, action_std)
File "/usr/local/lib/python3.10/dist-packages/torch/distributions/normal.py", line 56, in __init__
super().__init__(batch_shape, validate_args=validate_args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributions/distribution.py", line 68, in __init__
raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (1, 30)) of distribution Normal(loc: torch.Size([1, 30]), scale: torch.Size([1, 30])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan]])
[W 2024-06-01 11:43:06,315] Trial 0 failed with value None.
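The FutureWarning above says `suggest_loguniform` is deprecated in favour of `suggest_float(..., log=True)`. Below is a minimal sketch of the same sampler written with the non-deprecated call. The function name `sample_sac_params` and the candidate lists for `buffer_size` and `batch_size` are assumptions (only the sampled values 100000 and 256 appear in the log); the learning-rate bounds 1e-1 and 1 are copied from the warning line and exposed as parameters. Two details that may be relevant, offered as hypotheses rather than a confirmed cause: Stable-Baselines3's default SAC learning rate is 3e-4, far below this range, and SAC's `learning_starts` defaults to 100, so runs of 100 timesteps or fewer may never perform a gradient update that could push the actor's outputs to NaN.

```python
def sample_sac_params(trial, lr_low=1e-1, lr_high=1.0):
    """Sample SAC hyperparameters from an Optuna-style trial object.

    Hypothetical helper: the candidate lists are illustrative assumptions;
    only the default learning-rate bounds (1e-1..1) come from the issue's log.
    """
    # Replay buffer size, picked from a fixed list (assumed candidates).
    buffer_size = trial.suggest_categorical("buffer_size", [10_000, 100_000, 1_000_000])
    # Log-uniform sampling with suggest_float(..., log=True), the replacement
    # the FutureWarning recommends for the deprecated suggest_loguniform.
    learning_rate = trial.suggest_float("learning_rate", lr_low, lr_high, log=True)
    # Minibatch size (assumed candidates).
    batch_size = trial.suggest_categorical("batch_size", [64, 128, 256, 512])
    return {
        "buffer_size": buffer_size,
        "learning_rate": learning_rate,
        "batch_size": batch_size,
    }
```

Inside `objective`, `study.optimize` passes a real `optuna.trial.Trial`, so the call would be `sample_sac_params(trial)`; calling it with narrower bounds such as `sample_sac_params(trial, 1e-5, 1e-3)` is one way to test whether the NaN failures disappear at lower learning rates.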

ValueError Traceback (most recent call last)
in <cell line: 89>()
87 logging_callback = LoggingCallback(threshold=1e-5,patience=30,trial_number=5)
88 #You can increase the n_trials for a better search space scanning
---> 89 study.optimize(objective, n_trials=10,catch=(ValueError,),callbacks=[logging_callback])
90
91 joblib.dump(study, "final_sac_study__.pkl")

6 frames
/usr/local/lib/python3.10/dist-packages/optuna/storages/_in_memory.py in get_best_trial(self, study_id)
232
233 if best_trial_id is None:
--> 234 raise ValueError("No trials are completed yet.")
235 elif len(self._studies[study_id].directions) > 1:
236 raise RuntimeError(

ValueError: No trials are completed yet.
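Because `catch=(ValueError,)` is passed to `study.optimize`, the NaN error inside `objective` is caught and Trial 0 is only recorded as failed ("Trial 0 failed with value None"). The second error, "No trials are completed yet.", is raised by Optuna's `get_best_trial`, which backs `study.best_value` and `study.best_trial`. Optuna still runs callbacks after a failed trial, so it likely comes from `LoggingCallback` reading the best value while no trial has completed; the six collapsed frames hide the exact call, so this is an inference. A minimal sketch of a wrapper that skips an existing callback until some trial has completed; `skip_until_first_complete` is a hypothetical name, and `(study, frozen_trial)` is the signature Optuna uses for entries in `callbacks=`.

```python
def skip_until_first_complete(inner_callback):
    """Wrap an Optuna callback so it only runs once some trial has completed.

    Hypothetical helper. study.best_value raises
    ValueError("No trials are completed yet.") while every trial so far
    has failed or been pruned; in that case the inner callback is skipped.
    """
    def wrapped(study, frozen_trial):
        try:
            _ = study.best_value  # raises ValueError if no trial has completed
        except ValueError:
            return None  # nothing to compare against yet; skip this run
        return inner_callback(study, frozen_trial)

    return wrapped
```

In the notebook this would be passed as `callbacks=[skip_until_first_complete(logging_callback)]`. It only keeps the callback from aborting the study; the NaN failures in the trials themselves still need to be resolved.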
