
Count-MORL

The official repository for "Model-based Offline Reinforcement Learning with Count-based Conservatism" (Kim & Oh, ICML 2023).

Count-MORL is a model-based offline RL algorithm that incorporates count-based conservatism: it adds a reward penalty proportional to the inverse square root of the estimated count (frequency) of each state-action pair. Our work is the first to demonstrate that count-based conservatism, combined with a non-trivial adaptation to offline deep RL, effectively bridges the gap between theoretical claims and practical applications.
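As a rough sketch of the idea (not the repository's implementation; the function and argument names below are illustrative), the penalized reward used for policy learning looks like this:

```python
import numpy as np

def penalized_reward(reward, count, cnt_coef=0.5):
    """Count-based conservatism, conceptually: subtract a penalty proportional
    to the inverse square root of the estimated visitation count of the
    state-action pair. `count`, `cnt_coef`, and the clipping at 1 are
    illustrative assumptions, not the repository's API."""
    penalty = cnt_coef / np.sqrt(np.maximum(count, 1.0))
    return reward - penalty
```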

Dependencies

Usage

Train

# for halfcheetah-random-v2 with LC count
python train.py --task "halfcheetah-random-v2" --rollout-length 5 --reward-penalty-coef 1.0 --bit 50 --cnt_type "LC" --cnt_coef 0.5 --use_count "True"
# for walker2d-medium-v2 with AVG count
python train.py --task "walker2d-medium-v2" --rollout-length 20 --reward-penalty-coef 3.0 --bit 32 --cnt_type "AVG" --cnt_coef 0.5 --use_count "True"
# for hopper-medium-replay-v2 with UC count
python train.py --task "hopper-medium-replay-v2" --rollout-length 5 --reward-penalty-coef 1.0 --bit 50 --cnt_type "UC" --cnt_coef 0.5 --use_count "True"

If the `use_count` argument is set to `"False"`, the count-based penalty is disabled and the script trains the MOPO algorithm instead; see the example below.
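For example, re-running the first command above with `--use_count "False"` should train MOPO on halfcheetah-random-v2 (the other hyperparameter values are kept from the count-based command above for illustration; they are not tuned MOPO settings):

# for halfcheetah-random-v2 without count-based conservatism (MOPO)
python train.py --task "halfcheetah-random-v2" --rollout-length 5 --reward-penalty-coef 1.0 --bit 50 --cnt_type "LC" --cnt_coef 0.5 --use_count "False"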

Plot

python plotter.py --root-dir "log" --task "halfcheetah-random-v2"

Results

All experiments were run with 5 random seeds, and the learning curves are smoothed by averaging over a window of 10 epochs.

MuJoCo-v2

Reference

Kim & Oh. Model-based Offline Reinforcement Learning with Count-based Conservatism. ICML 2023.
