[WIP] ReinforcementLearning.jl integration #9

Draft
wants to merge 9 commits into
base: main

Conversation

Contributor

@rejuvyesh rejuvyesh commented Mar 9, 2022

I realized that CommonRLInterface.jl never settled on how to handle continuous action spaces, so this PR integrates directly with RLBase from ReinforcementLearning.jl instead.

Will add tests and examples with PPO and DDPG.

@codecov-commenter

codecov-commenter commented Mar 9, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.31%. Comparing base (5563639) to head (eb379f6).
Report is 411 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main       #9      +/-   ##
==========================================
- Coverage   92.48%   92.31%   -0.17%     
==========================================
  Files          81       81              
  Lines        4005     3761     -244     
==========================================
- Hits         3704     3472     -232     
+ Misses        301      289      -12     


Comment on lines 31 to 34
    actor = Chain(
        Dense(ns, 256, relu; init = glorot_uniform(rng)),
        Dense(256, na; init = glorot_uniform(rng)),
    ),


Note that you are using the discrete version of PPO here, but the cart-pole environment appears to be the continuous version (the action space is [-1.0, 1.0]). You could use the continuous PPO Pendulum experiment as a reference: https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/935f68b6cb378f9929a8d9914eb388e86213c86d/src/ReinforcementLearningExperiments/deps/experiments/experiments/Policy%20Gradient/JuliaRL_PPO_Pendulum.jl#L43-L50
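
To illustrate the distinction raised here: a discrete PPO actor emits one logit per action, whereas an actor for a continuous action space typically emits the parameters of a distribution (often a Gaussian mean and log standard deviation) and samples from it. The following is only a minimal sketch using Julia's standard library. The names `dense`, `GaussianPolicy`, and `gaussian_policy`, the hidden width, the state size of 4, and the log-std clamp bounds are all illustrative assumptions; this is not the Flux or ReinforcementLearning.jl API and not the code in this PR.

```julia
using Random

# Hypothetical helper: a dense layer as a closure, y = act.(W * x .+ b).
function dense(rng::AbstractRNG, nin::Int, nout::Int, act = identity)
    W = 0.1 .* randn(rng, nout, nin)
    b = zeros(nout)
    return x -> act.(W * x .+ b)
end

relu(x) = max(zero(x), x)

# Hypothetical Gaussian policy: a shared trunk followed by separate heads
# for the mean and the log standard deviation of each action dimension.
struct GaussianPolicy{P,M,S}
    pre::P
    mu::M
    logsigma::S
end

function gaussian_policy(rng::AbstractRNG, ns::Int, na::Int; hidden::Int = 64)
    return GaussianPolicy(
        dense(rng, ns, hidden, relu),
        dense(rng, hidden, na, tanh),  # tanh keeps the mean inside [-1, 1]
        dense(rng, hidden, na),        # unconstrained log-std head
    )
end

# Sample an action for state `s`; returns the clamped action plus mean and std.
function (p::GaussianPolicy)(rng::AbstractRNG, s::AbstractVector)
    h = p.pre(s)
    mu = p.mu(h)
    sigma = exp.(clamp.(p.logsigma(h), -5.0, 2.0))  # assumed clamp bounds
    a = mu .+ sigma .* randn(rng, length(mu))
    return clamp.(a, -1.0, 1.0), mu, sigma          # action space [-1.0, 1.0]
end

# Usage with placeholder sizes: 4 state variables, 1 continuous action.
rng = MersenneTwister(0)
policy = gaussian_policy(rng, 4, 1)
action, mu, sigma = policy(rng, [0.0, 0.1, 0.0, -0.1])
```

Clamping the sampled action is the simplest way to respect the bounds, but a real PPO update would also need the log-probability of the action under the Gaussian, which this sketch leaves out.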

Contributor Author

Good point! Thanks for checking in. That said, I also still need to define the reward/cost function for cartpole on the Dojo side.

@janbruedigam
Member

We should probably rethink the interface to ReinforcementLearning.jl once their updates are done (JuliaReinforcementLearning/ReinforcementLearning.jl#614).

4 participants