Cross Entropy Method

Cross Entropy Method (CEM) is a gradient-free optimization algorithm that fits parameters by iteratively sampling a population, selecting its elite members, and refitting the sampling distribution to those elites.

The only learning signal is a single scalar per episode: the total episode reward.

Pseudocode for the CEM algorithm:

for epoch in range(num_epochs):
  sample a population from the sampling distribution
  test the population in the environment
  select the elites (the members with the highest total episode reward)
  refit the sampling distribution to the elites
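
The loop above can be sketched in plain Python. This is a minimal illustration, not the code in cem.py: the independent Gaussian sampling distribution, the elite_frac parameter, and the toy episode_reward function (a stand-in for running an episode in an environment) are all assumptions made for the example.

```python
import random
import statistics


def episode_reward(params):
    """Hypothetical stand-in for running one episode in an environment.

    Returns a single scalar; higher is better, with the maximum at (3, -2).
    """
    target = (3.0, -2.0)
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def cem(num_epochs=20, batch_size=200, elite_frac=0.2, seed=0):
    rng = random.Random(seed)
    dim = 2
    # Assumed sampling distribution: one independent Gaussian per parameter.
    means = [0.0] * dim
    stds = [5.0] * dim
    num_elites = max(2, int(batch_size * elite_frac))

    for epoch in range(num_epochs):
        # 1. Sample a population from the current distribution.
        population = [
            [rng.gauss(m, s) for m, s in zip(means, stds)]
            for _ in range(batch_size)
        ]
        # 2. Test each member; the only feedback is one scalar per member.
        rewards = [episode_reward(p) for p in population]
        # 3. Select the elites: the members with the highest reward.
        ranked = sorted(zip(rewards, population), key=lambda rp: rp[0], reverse=True)
        elites = [p for _, p in ranked[:num_elites]]
        # 4. Refit the distribution to the elites (small floor keeps std > 0).
        means = [statistics.mean(col) for col in zip(*elites)]
        stds = [statistics.stdev(col) + 1e-3 for col in zip(*elites)]

    return means


best = cem()
print(best)  # the fitted means should end up close to the target (3, -2)
```

No gradient of episode_reward is ever computed; the distribution moves toward high-reward regions purely through selection and refitting.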

CEM is easy to parallelize because each member of the population is evaluated independently. This implementation runs batches across multiple processes using Python's multiprocessing, which keeps wall-clock time low.
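
One way to spread batches across processes is multiprocessing.Pool. The sketch below is an assumption about how this could look, not the code in cem.py: run_batch, collect_population, and the toy reward are hypothetical names introduced for illustration.

```python
import random
from multiprocessing import Pool


def run_batch(job):
    """Hypothetical worker: sample and score one batch of parameters.

    The reward is a toy function standing in for a full episode.
    """
    seed, batch_size, mean, std = job
    rng = random.Random(seed)  # a distinct seed per job avoids duplicate samples
    batch = []
    for _ in range(batch_size):
        param = rng.gauss(mean, std)
        reward = -(param - 3.0) ** 2
        batch.append((param, reward))
    return batch


def collect_population(map_fn, num_processes, batch_size, mean, std):
    """Run one batch per process and merge the results into one population."""
    jobs = [(seed, batch_size, mean, std) for seed in range(num_processes)]
    population = []
    for batch in map_fn(run_batch, jobs):
        population.extend(batch)
    return population


if __name__ == "__main__":
    num_processes, batch_size = 4, 8
    with Pool(processes=num_processes) as pool:
        # pool.map sends each job to a worker process and preserves job order.
        population = collect_population(pool.map, num_processes, batch_size, 0.0, 1.0)
    print(len(population))  # num_processes * batch_size members
```

Passing the map function in as an argument means the same code runs serially with the built-in map, which is convenient for debugging.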

The total number of episodes run in an experiment is given by:

num_episodes = num_epochs * num_processes * batch_size
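
As a quick check of the formula, the snippet below plugs in the values from the Cartpole run in this README (8 epochs, 6 processes, batch size 4096). The total_episodes helper is a hypothetical name used only for this example.

```python
def total_episodes(num_epochs, num_processes, batch_size):
    # Every epoch runs one batch of episodes on each process.
    return num_epochs * num_processes * batch_size


episodes = total_episodes(num_epochs=8, num_processes=6, batch_size=4096)
print(episodes)  # 196608
```

This matches the "expt of 196608 total episodes" line printed by the Cartpole run.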

Use

Cartpole

$ python cem.py cartpole --num_process 6 --epochs 8 --batch_size 4096
Namespace(env='cartpole', num_process=6, epochs=8, batch_size=4096)
expt of 196608 total episodes
epoch 0 - 22.0 30.5 pop - 64.9 48.1 elites
epoch 1 - 33.3 37.4 pop - 92.9 46.2 elites
epoch 2 - 46.0 46.9 pop - 125.1 46.9 elites

Pendulum

$ python cem.py pendulum --num_process 6 --epochs 15 --batch_size 4096

Setup

This project depends on gym and matplotlib. numpy is installed automatically as a dependency of gym:

$ pip install -r requirements.txt
