CHANGELOG
2022.09.23(v0.4.3)
- env: add rule-based gomoku expert (#465)
- algo: fix a2c policy batch size bug (#481)
- algo: enable activation option in collaq attention and mixer
- algo: minor fix about IBC (#477)
- feature: add IGM support (#486)
- feature: add tb logger middleware and demo
- fix: the type conversion in ding_env_wrapper (#483)
- fix: di-orchestrator version bug in unittest (#479)
- fix: data collection errors caused by shallow copies (#475)
- fix: gym==0.26.0 seed args bug
- style: add readme tutorial link (environment & algorithm) (#490) (#493)
- style: adjust location of the default_model method in policy (#453)
2022.09.08(v0.4.2)
- env: add rocket env (#449)
- env: updated pettingzoo env and improved related performance (#457)
- env: add mario env demo (#443)
- env: add MAPPO multi-agent config (#464)
- env: add mountain car (discrete action) environment (#452)
- env: fix multi-agent mujoco gym compatibility bug
- env: fix gfootball env save_replay variable init bug
- algo: add IBC (Implicit Behaviour Cloning) algorithm (#401)
- algo: add BCO (Behaviour Cloning from Observation) algorithm (#270)
- algo: add continuous PPOPG algorithm (#414)
- algo: add PER in CollaQ (#472)
- algo: add activation option in QMIX and CollaQ
- feature: update ctx to dataclass (#467)
- fix: base_env FinalMeta bug about gym 0.25.0-0.25.1
- fix: config inplace modification bug
- fix: ding cli no argument problem
- fix: import errors after running setup.py (jinja2, markupsafe)
- fix: conda py3.6 and cross platform build bug
- style: add project state and datetime in log dir (#455)
- style: polish notes for q-learning model (#427)
- style: revision to mujoco dockerfile and validation (#474)
- style: add dockerfile for cityflow env
- style: polish default output log format
2022.08.12(v0.4.1)
- env: add gym trading env (#424)
- env: add board games env (tictactoe, gomoku, chess) (#356)
- env: add sokoban env (#397) (#429)
- env: add BC and DQN demo for gfootball (#418) (#423)
- env: add discrete pendulum env (#395)
- algo: add STEVE model-based algorithm (#363)
- algo: add PLR algorithm (#408)
- algo: plugin ST-DIM in PPO (#379)
- feature: add final result saving in training pipeline
- fix: random policy randomness bug
- fix: action_space seed compatibility bug
- fix: discard message sent by self in redis mq (#354)
- fix: remove pace controller (#400)
- fix: import error in serial_pipeline_trex (#410)
- fix: unittest hang and fail bug (#413)
- fix: DREX collect data unittest bug
- fix: remove unused import cv2
- fix: ding CLI env/policy option bug
- style: upgrade Python version from 3.6-3.8 to 3.7-3.9
- style: upgrade gym version from 0.20.0 to 0.25.0
- style: upgrade torch version from 1.10.0 to 1.12.0
- style: upgrade mujoco bin from 2.0.0 to 2.1.0
- style: add buffer api description (#371)
- style: polish VAE comments (#404)
- style: unittest for FQF (#412)
- style: add metaworld dockerfile (#432)
- style: remove opencv requirement in default setting
- style: update long description in setup.py
2022.06.21(v0.4.0)
- env: add MAPPO/MASAC all configs in SMAC (#310) **(SOTA results in SMAC!!!)**
- env: add dmc2gym env (#344) (#360)
- env: remove DI-star requirements of dizoo/smac, use official pysc2 (#302)
- env: add latest GAIL mujoco config (#298)
- env: polish procgen env (#311)
- env: add MBPO ant and humanoid config for mbpo (#314)
- env: fix slime volley env obs space bug when agent_vs_agent
- env: fix smac env obs space bug
- env: fix import path error in lunarlander (#362)
- algo: add Decision Transformer algorithm (#327) (#364)
- algo: add on-policy PPG algorithm (#312)
- algo: add DDPPO & add model-based SAC with lambda-return algorithm (#332)
- algo: add infoNCE loss and ST-DIM algorithm (#326)
- algo: add FQF distributional RL algorithm (#274)
- algo: add continuous BC algorithm (#318)
- algo: add pure policy gradient PPO algorithm (#382)
- algo: add SQIL + SAC algorithm (#348)
- algo: polish NGU and related modules (#283) (#343) (#353)
- algo: add marl distributional td loss (#331)
- feature: add new worker middleware (#236)
- feature: refactor model-based RL pipeline (ding/world_model) (#332)
- feature: refactor logging system in the whole DI-engine (#316)
- feature: add env supervisor design (#330)
- feature: support async reset for envpool env manager (#250)
- feature: add log videos to tensorboard (#320)
- feature: refactor impala cnn encoder interface (#378)
- fix: env save replay bug
- fix: transformer mask inplace operation bug
- fix: transtion_with_policy_data bug in SAC and PPG
- style: add dockerfile for ding:hpc image (#337)
- style: fix mpire 2.3.5 which handles default processes more elegantly (#306)
- style: use FORMAT_DIR instead of ./ding (#309)
- style: update quickstart colab link (#347)
- style: polish comments in ding/model/common (#315)
- style: update mujoco docker download path (#386)
- style: fix protobuf new version compatibility bug
- style: fix torch1.8.0 torch.div compatibility bug
- style: update doc links in readme
- style: add outline in readme and update wechat image
- style: update head image and refactor docker dir
2022.04.23(v0.3.1)
- env: polish and standardize dizoo config (#252) (#255) (#249) (#246) (#262) (#261) (#266) (#273) (#263) (#280) (#259) (#286) (#277) (#290) (#289) (#299)
- env: add GRF academic env and config (#281)
- env: update env interface of GRF (#258)
- env: update D4RL offline RL env and config (#285)
- env: polish PomdpAtariEnv (#254)
- algo: DREX algorithm (#218)
- feature: separate mq and parallel modules, add redis (#247)
- feature: rename env variables; fix attach_to parameter (#244)
- feature: env implementation check (#275)
- feature: adjust and set the max column number of tabulate in log (#296)
- feature: add drop_extra option for sample collect
- feature: speed up GTrXL forward method + GRU unittest (#253) (#292)
- fix: add act_scale in DingEnvWrapper; fix envpool env manager (#245)
- fix: auto_reset=False and env_ref bug in env manager (#248)
- fix: data type and deepcopy bug in RND (#288)
- fix: share_memory bug and multi_mujoco env (#279)
- fix: some bugs in GTrXL (#276)
- fix: update gym_vector_env_manager and add more unittest (#241)
- fix: mdpolicy random collect bug (#293)
- fix: gym.wrapper save video replay bug
- fix: collect abnormal step format bug and add unittest
- test: add buffer benchmark & socket test (#284)
- style: upgrade mpire (#251)
- style: add GRF(google research football) docker (#256)
- style: update policy and gail comment
2022.03.24(v0.3.0)
- env: add bitflip HER DQN benchmark (#192) (#193) (#197)
- env: slime volley league training demo (#229)
- algo: Gated TransformXL (GTrXL) algorithm (#136)
- algo: TD3 + VAE(HyAR) latent action algorithm (#152)
- algo: stochastic dueling network (#234)
- algo: use log prob instead of using prob in ACER (#186)
- feature: support envpool env manager (#228)
- feature: add league main and other improvements in new framework (#177) (#214)
- feature: add pace controller middleware in new framework (#198)
- feature: add auto recover option in new framework (#242)
- feature: add k8s parser in new framework (#243)
- feature: support async event handler and logger (#213)
- feature: add grad norm calculator (#205)
- feature: add gym vector env manager (#147)
- feature: add train_iter and env_step in serial pipeline (#212)
- feature: add rich logger handler (#219) (#223) (#232)
- feature: add naive lr_scheduler demo
- refactor: new BaseEnv and DingEnvWrapper (#171) (#231) (#240)
- polish: MAPPO and MASAC smac config (#209) (#239)
- polish: QMIX smac config (#175)
- polish: R2D2 atari config (#181)
- polish: A2C atari config (#189)
- polish: GAIL box2d and mujoco config (#188)
- polish: ACER atari config (#180)
- polish: SQIL atari config (#230)
- polish: TREX atari/mujoco config
- polish: IMPALA atari config
- polish: MBPO/D4PG mujoco config
- fix: random_collect compatible to episode collector (#190)
- fix: remove default n_sample/n_episode value in policy config (#185)
- fix: PDQN model bug on gpu device (#220)
- fix: TREX algorithm CLI bug (#182)
- fix: DQfD JE computation bug and move to AdamW optimizer (#191)
- fix: pytest problem for parallel middleware (#211)
- fix: mujoco numpy compatibility bug
- fix: markupsafe 2.1.0 bug
- fix: framework parallel module network emit bug
- fix: mpire bug and disable algotest in py3.8
- fix: lunarlander env import and env_id bug
- fix: icm unittest repeat name bug
- fix: buffer thruput close bug
- test: resnet unittest (#199)
- test: SAC/SQN unittest (#207)
- test: CQL/R2D3/GAIL unittest (#201)
- test: NGU td unittest (#210)
- test: model wrapper unittest (#215)
- test: MAQAC model unittest (#226)
- style: add doc docker (#221)
2022.01.01(v0.2.3)
- env: add multi-agent mujoco env (#146)
- env: add delay reward mujoco env (#145)
- env: fix port conflict in gym_soccer (#139)
- algo: MASAC algorithm (#112)
- algo: TREX algorithm (#119) (#144)
- algo: H-PPO hybrid action space algorithm (#140)
- algo: residual link in R2D2 (#150)
- algo: gumbel softmax (#169)
- algo: move actor_head_type to action_space field
- feature: new main pipeline and async/parallel framework (#142) (#166) (#168)
- feature: refactor buffer, separate algorithm and storage (#129)
- feature: cli in new pipeline (ditask) (#160)
- feature: add multiprocess tblogger, fix circular reference problem (#156)
- feature: add multiple seed cli
- feature: polish eps_greedy_multinomial_sample in model_wrapper (#154)
- fix: R2D3 abs priority problem (#158) (#161)
- fix: multi-discrete action space policies random action bug (#167)
- fix: doc generate bug with enum_tools (#155)
- style: more comments about R2D2 (#149)
- style: add doc about how to migrate a new env
- style: add doc about env tutorial in dizoo
- style: add conda auto release (#148)
- style: update zh doc link
- style: update kaggle tutorial link
2021.12.03(v0.2.2)
- env: apple key to door treasure env (#128)
- env: add bsuite memory benchmark (#138)
- env: polish atari impala config
- algo: Guided Cost IRL algorithm (#57)
- algo: ICM exploration algorithm (#41)
- algo: MP-DQN hybrid action space algorithm (#131)
- algo: add loss statistics and polish r2d3 pong config (#126)
- feature: add renew env mechanism in env manager and update timeout mechanism (#127) (#134)
- fix: async subprocess env manager reset bug (#137)
- fix: keepdims name bug in model wrapper
- fix: on-policy ppo value norm bug
- fix: GAE and RND unittest bug
- fix: hidden state wrapper h tensor compatibility
- fix: naive buffer auto config create bug
- style: add supporters list
2021.11.22(v0.2.1)
- env: gym-hybrid env (#86)
- env: gym-soccer (HFO) env (#94)
- env: Go-Bigger env baseline (#95)
- env: add the bipedalwalker config of sac and ppo (#121)
- algo: DQfD Imitation Learning algorithm (#48) (#98)
- algo: TD3BC offline RL algorithm (#88)
- algo: MBPO model-based RL algorithm (#113)
- algo: PADDPG hybrid action space algorithm (#109)
- algo: PDQN hybrid action space algorithm (#118)
- algo: fix R2D2 bugs and produce benchmark, add naive NGU (#40)
- algo: self-play training demo in slime_volley env (#23)
- algo: add example of GAIL entry + config for mujoco (#114)
- feature: enable arbitrary policy num in serial sample collector
- feature: add torch DataParallel for single machine multi-GPU
- feature: add registry force_overwrite argument
- feature: add naive buffer periodic thruput seconds argument
- test: add pure docker setting test (#103)
- test: add unittest for dataset and evaluator (#107)
- test: add unittest for on-policy algorithm (#92)
- test: add unittest for ppo and td (MARL case) (#89)
- test: polish collector benchmark test
- fix: target model wrapper hard reset bug
- fix: learn state_dict target model bug
- fix: ppo bugs and update atari ppo offpolicy config (#108)
- fix: pyyaml version bug (#99)
- fix: small fix on bsuite environment (#117)
- fix: discrete cql unittest bug
- fix: release workflow bug
- fix: base policy model state_dict overlap bug
- fix: remove on_policy option in dizoo config and entry
- fix: remove torch in env
- style: gym version > 0.20.0
- style: torch version >= 1.1.0, <= 1.10.0
- style: ale-py == 0.7.0
2021.9.30(v0.2.0)
- env: overcooked env (#20)
- env: procgen env (#26)
- env: modified predator env (#30)
- env: d4rl env (#37)
- env: imagenet dataset (#27)
- env: bsuite env (#58)
- env: move atari_py to ale-py
- algo: SQIL algorithm (#25) (#44)
- algo: CQL algorithm (discrete/continuous) (#37) (#68)
- algo: MAPPO algorithm (#62)
- algo: WQMIX algorithm (#24)
- algo: D4PG algorithm (#76)
- algo: update multi discrete policy(dqn, ppo, rainbow) (#51) (#72)
- feature: image classification training pipeline (#27)
- feature: add force_reproducibility option in subprocess env manager
- feature: add/delete/restart replicas via cli for k8s
- feature: add league metric (trueskill and elo) (#22)
- feature: add tb in naive buffer and modify tb in advanced buffer (#39)
- feature: add k8s launcher and di-orchestrator launcher, add related unittest (#45) (#49)
- feature: add hyper-parameter scheduler module (#38)
- feature: add plot function (#59)
- fix: acer bug and update atari result (#21)
- fix: mappo nan bug and dict obs cannot unsqueeze bug (#54)
- fix: r2d2 hidden state and obs arange bug (#36) (#52)
- fix: ppo bug when use dual_clip and adv > 0
- fix: qmix double_q hidden state bug
- fix: spawn context problem in interaction unittest (#69)
- fix: formatted config no eval bug (#53)
- fix: the catch statements that will never succeed and system proxy bug (#71) (#79)
- fix: lunarlander config
- fix: c51 head dimension mismatch bug
- fix: mujoco config typo bug
- fix: ppg atari config bug
- fix: max use and priority update special branch bug in advanced_buffer
- style: add docker deploy in github workflow (#70) (#78) (#80)
- style: support PyTorch 1.9.0
- style: add algo/env list in README
- style: rename advanced_buffer register name to advanced
2021.8.3(v0.1.1)
- env: selfplay/league demo (#12)
- env: pybullet env (#16)
- env: minigrid env (#13)
- env: atari enduro config (#11)
- algo: on policy PPO (#9)
- algo: ACER algorithm (#14)
- feature: polish experiment directory structure (#10)
- refactor: split doc to new repo (#4)
- fix: atari env info action space bug
- fix: env manager retry wrapper raise exception info bug
- fix: dist entry disable-flask-log typo
- style: codestyle optimization by lgtm (#7)
- style: code/comment statistics badge
- style: github CI workflow
2021.7.8(v0.1.0)