Create optimizer in `OnPolicyAlgorithm` only after the device is set #1771

cmangla · 2023-12-04T17:57:30Z

Attempt to fix #1770 in a fully backward compatible manner.

Description

In PPO, the policy's optimizer is created before the compute device for the class is correctly set. This is a problem when the optimizer checks the target compute device on initialization. The fix is backward compatible.
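The ordering problem described above can be sketched without any dependencies. Everything here is a hypothetical stand-in (`ToyParam`, `ToyFusedOptimizer`, `ToyPolicy`, and the `init_optimizer` flag are illustration names, not the Stable-Baselines3 or PyTorch classes); the only assumption is an optimizer that validates the device of its parameters when it is constructed.

```python
class ToyParam:
    """Hypothetical parameter that only tracks which device it lives on."""

    def __init__(self):
        self.device = "cpu"


class ToyFusedOptimizer:
    """Hypothetical optimizer that checks parameter devices at construction."""

    def __init__(self, params):
        params = list(params)
        if any(p.device != "cuda" for p in params):
            raise RuntimeError("fused optimizer requires all parameters on 'cuda'")
        self.params = params


class ToyPolicy:
    """Hypothetical policy; `init_optimizer` is an illustrative flag."""

    def __init__(self, init_optimizer=True):
        self.params = [ToyParam(), ToyParam()]
        self.optimizer = None
        if init_optimizer:
            # Eager creation: parameters are still on the CPU at this point.
            self.build_optimizer()

    def build_optimizer(self):
        self.optimizer = ToyFusedOptimizer(self.params)

    def to(self, device):
        for p in self.params:
            p.device = device
        return self


def eager_order():
    """Optimizer is built in the constructor, before the device move."""
    try:
        ToyPolicy(init_optimizer=True).to("cuda")
        return "ok"
    except RuntimeError as err:
        return f"failed: {err}"


def deferred_order():
    """Optimizer is built only after the parameters are on the target device."""
    policy = ToyPolicy(init_optimizer=False).to("cuda")
    policy.build_optimizer()
    return "ok"


if __name__ == "__main__":
    print("eager:   ", eager_order())
    print("deferred:", deferred_order())
```

In this sketch the eager path fails because the device check runs before `to("cuda")`, while the deferred path succeeds. Existing callers that do not pass the flag keep the eager behavior, which is one way a change of this kind can stay backward compatible.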

Motivation and Context

Fixes #1770. With this change, one can use the `fused` option of the Adam optimizer on CUDA devices, which, according to the PyTorch documentation, is faster.
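For reference, a minimal sketch of how the `fused` option might be requested through PPO's `policy_kwargs`. The exact reproduction from #1770 is not shown in this PR, so the helper name `build_ppo`, the `CartPole-v1` environment, and the `device="cuda"` default are assumptions for illustration; the Stable-Baselines3 import is deferred so the snippet can be loaded without it installed.

```python
# Optimizer options are forwarded to the policy through `policy_kwargs`.
# `fused=True` is passed on to the default optimizer (Adam).
POLICY_KWARGS = {"optimizer_kwargs": {"fused": True}}


def build_ppo(env_id="CartPole-v1", device="cuda"):
    """Hypothetical helper: construct PPO with a fused Adam optimizer.

    Requires stable-baselines3, PyTorch and a CUDA device when called.
    """
    from stable_baselines3 import PPO  # imported lazily, only when called

    return PPO("MlpPolicy", env_id, policy_kwargs=POLICY_KWARGS, device=device)
```

Calling `build_ppo()` is the step that fails when the optimizer is created before the policy has been moved to the CUDA device.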

I have raised an issue to propose this change

Types of changes

Bug fix

Checklist

I've read the CONTRIBUTION guide (required)
I have updated the changelog accordingly (required).
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.
I have reformatted the code using make format (required)
I have checked the codestyle using make check-codestyle and make lint (required)
I have ensured make pytest and make type both pass. (required)
I have checked that the documentation builds using make doc (required)

cmangla · 2023-12-05T19:02:11Z

@araffin This is ready for the CI tests now and potentially also to merge.

cmangla · 2023-12-06T12:07:31Z

stable_baselines3/common/policies.py

@@ -885,6 +895,7 @@ def __init__(
        normalize_images: bool = True,
        optimizer_class: Type[th.optim.Optimizer] = th.optim.Adam,
        optimizer_kwargs: Optional[Dict[str, Any]] = None,
+        _init_optimizer=True,  # Currently unused, see PR #1771


@araffin I'm currently testing enabling this one too. I will update this PR accordingly, hence switching it back to draft.

cmangla · 2023-12-07T15:04:36Z

@araffin Looks good now

araffin · 2024-06-10T13:20:26Z

#1770 (comment)

cmangla marked this pull request as draft December 4, 2023 17:57

cmangla mentioned this pull request Dec 4, 2023

[Bug:] Cannot use the fused flag in default optimizer of PPO #1770

Closed

cmangla marked this pull request as ready for review December 5, 2023 10:27

cmangla changed the title ~~Create optimizer in PPO only after the device is set~~ Create optimizer in OnPolicyAlgorithm only after the device is set Dec 5, 2023

cmangla marked this pull request as draft December 5, 2023 12:16

cmangla marked this pull request as ready for review December 5, 2023 14:01

cmangla marked this pull request as draft December 6, 2023 12:07

Create optimizer in PPO after device is set

daf9f7a

cmangla force-pushed the pr-delay-ppo-optimizer branch from 4d07ede to daf9f7a Compare December 7, 2023 09:17

cmangla marked this pull request as ready for review December 7, 2023 15:03

araffin closed this Jun 10, 2024
