Dual V-Learning (DVL)

Official code base for Dual RL: Unification and New Methods for Reinforcement and Imitation Learning by Harshit Sikchi, Qinqing Zheng, Amy Zhang, and Scott Niekum.

This repository contains code for the Dual V-Learning (DVL) framework for reinforcement learning proposed in our paper.

Please refer to instructions inside the offline folder to get started with installation and running the code.

Benefits of DVL over other offline RL methods

✅ Fixes the instability of Extreme Q Learning (XQL); see the sketch after this list
✅ Directly models V* in continuous action spaces
✅ Implicit: no OOD sampling and no actor-critic formulation
✅ Conservative with respect to the induced behavior policy distribution
✅ Improves performance on the D4RL benchmark versus similar approaches
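For intuition on the instability point above, the minimal sketch below contrasts an XQL-style Gumbel regression loss with an IQL-style expectile loss on the value residual. Both losses come from prior work, and the residual `z = Q(s, a) - V(s)`, the function names, and the `beta`/`tau` values are illustrative assumptions; neither loss is the exact DVL objective, which is derived in the paper and implemented in the `offline` folder.

```python
# Illustrative sketch only (not the official DVL objective).
# Contrasts the exponential Gumbel regression loss used in XQL with the
# bounded, expectile-style loss used in IQL, on a value residual z = Q - V.
import numpy as np

def gumbel_regression_loss(z, beta=1.0):
    """XQL-style Gumbel regression loss: exp(z / beta) - z / beta - 1.
    The exponential term blows up quickly for large positive residuals,
    which is the source of the instability DVL aims to avoid."""
    u = z / beta
    return np.exp(u) - u - 1.0

def expectile_loss(z, tau=0.7):
    """IQL-style expectile loss: |tau - 1(z < 0)| * z**2.
    Asymmetric but only quadratic in the residual, so it stays
    numerically well-behaved on any finite batch."""
    weight = np.where(z < 0.0, 1.0 - tau, tau)
    return weight * z ** 2

if __name__ == "__main__":
    residuals = np.array([-2.0, 0.0, 2.0, 20.0])
    print("gumbel:   ", gumbel_regression_loss(residuals))  # exponential growth for large z
    print("expectile:", expectile_loss(residuals))          # only quadratic growth
```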

Citation

@misc{sikchi2023dual,
      title={Dual RL: Unification and New Methods for Reinforcement and Imitation Learning}, 
      author={Harshit Sikchi and Qinqing Zheng and Amy Zhang and Scott Niekum},
      year={2023},
      eprint={2302.08560},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Questions

Please feel free to email us if you have any questions.

Harshit Sikchi ([email protected])

Acknowledgement

This repository builds heavily on the XQL (https://github.com/Div99/xql) and IQL (https://github.com/ikostrikov/implicit_q_learning) codebases. Please make sure to cite them as well when using this code.
