[WIP] ReinforcementLearning.jl integration #9
base: main
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main       #9      +/-   ##
==========================================
- Coverage   92.48%   92.31%   -0.17%
==========================================
  Files          81       81
  Lines        4005     3761     -244
==========================================
- Hits         3704     3472     -232
+ Misses        301      289      -12
```

☔ View full report in Codecov by Sentry.
Force-pushed from 22e4549 to b606aa1
examples/deeprl/cartpole_ppo.jl (outdated)
```julia
actor = Chain(
    Dense(ns, 256, relu; init = glorot_uniform(rng)),
    Dense(256, na; init = glorot_uniform(rng)),
),
```
Note that you are using the discrete version of PPO here, but the cartpole env here seems to be a continuous version (the action space is [-1.0, 1.0]). So you may take reference from https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/935f68b6cb378f9929a8d9914eb388e86213c86d/src/ReinforcementLearningExperiments/deps/experiments/experiments/Policy%20Gradient/JuliaRL_PPO_Pendulum.jl#L43-L50
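For comparison, a continuous-action actor along the lines of the linked Pendulum experiment would look roughly like the sketch below. This is illustrative only: the 64-unit layer widths are assumptions rather than the experiment's exact settings, and `ns`, `na`, and `rng` are taken to be defined as in the surrounding example file.

```julia
using Flux: Chain, Dense, relu, glorot_uniform
using ReinforcementLearningCore: GaussianNetwork

# A GaussianNetwork emits a mean and log-std per action dimension,
# which the continuous PPO variant samples actions from.
# Layer widths are illustrative; ns, na, rng assumed from context.
actor = GaussianNetwork(
    pre = Chain(
        Dense(ns, 64, relu; init = glorot_uniform(rng)),
        Dense(64, 64, relu; init = glorot_uniform(rng)),
    ),
    μ = Chain(Dense(64, na; init = glorot_uniform(rng))),
    logσ = Chain(Dense(64, na; init = glorot_uniform(rng))),
)
```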
Good point! Thanks for checking. Currently, though, I also need to define the reward/cost function for cartpole on the Dojo side.
We should probably rethink the interface to ReinforcementLearning.jl once their updates are done (JuliaReinforcementLearning/ReinforcementLearning.jl#614).
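For concreteness, such a cost might be a simple quadratic, something like the sketch below. The state ordering [x, θ, ẋ, θ̇] and all weights are purely illustrative assumptions, not Dojo's actual definitions.

```julia
using LinearAlgebra: Diagonal

# Hypothetical quadratic cartpole cost: penalize cart offset, pole angle
# away from upright, velocities, and control effort. Weights and the
# assumed state ordering [x, θ, ẋ, θ̇] are illustrative only.
function cartpole_cost(x::AbstractVector, u::AbstractVector)
    Q = Diagonal([1.0, 10.0, 0.1, 0.1])
    R = 0.01
    return x' * Q * x + R * u[1]^2
end
```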
I realized that CommonRLInterface.jl never settled on what to do with continuous action spaces, so I am integrating directly with RLBase from ReinforcementLearning.jl instead. Will add tests and examples with PPO and DDPG.
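For the RLBase route, a minimal sketch of what such an adapter could look like follows. Everything here is an assumption rather than this PR's final interface: the type name, fields, reward term, and the `dojo_step!` stub standing in for the actual Dojo.jl simulation call are all hypothetical.

```julia
import ReinforcementLearningBase as RLBase
using IntervalSets: (..)

# Stand-in for the real Dojo.jl step; replace with the actual simulator call.
dojo_step!(x, u) = (x, false)

# Hypothetical adapter; the real wrapper in this PR may differ.
mutable struct DojoCartpoleEnv <: RLBase.AbstractEnv
    state::Vector{Float64}   # assumed layout: [x, θ, ẋ, θ̇]
    action::Float64          # last applied control, cached for the reward
    done::Bool
end

DojoCartpoleEnv() = DojoCartpoleEnv(zeros(4), 0.0, false)

RLBase.action_space(::DojoCartpoleEnv) = -1.0..1.0  # continuous, per the review above
RLBase.state_space(::DojoCartpoleEnv) = RLBase.Space(fill(-Inf..Inf, 4))
RLBase.state(env::DojoCartpoleEnv) = env.state
RLBase.is_terminated(env::DojoCartpoleEnv) = env.done
RLBase.reward(env::DojoCartpoleEnv) = -(env.state[2]^2 + 0.01 * env.action^2)  # placeholder cost

function RLBase.reset!(env::DojoCartpoleEnv)
    env.state .= 0.0
    env.action = 0.0
    env.done = false
end

# Acting on the env advances the underlying physics one step.
function (env::DojoCartpoleEnv)(a::Real)
    env.action = Float64(a)
    env.state, env.done = dojo_step!(env.state, env.action)
    nothing
end
```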