diff --git a/Alessandro_Pomponio_Bowling_PPO.html b/Alessandro_Pomponio_Bowling_PPO.html deleted file mode 100644 index 1490549..0000000 --- a/Alessandro_Pomponio_Bowling_PPO.html +++ /dev/null @@ -1,17722 +0,0 @@ - - -
- - -This notebook has been created by Alessandro Pomponio for the Autonomous and Adaptive System course, held by prof. Mirco Musolesi at the University of Bologna.
- -Since this notebook was written on Google Colab, we need to install some prerequisites if we want to render outputs.
- -!(apt update && apt install xvfb ffmpeg python-opengl -y) > /dev/null 2>&1
-!pip install gym-notebook-wrapper > /dev/null 2>&1
-!pip install opencv-python > /dev/null 2>&1
-!pip install gym[atari] > /dev/null 2>&1
-
import tensorflow as tf
-import numpy as np
-import cv2 as cv
-import gnwrapper
-import time
-import gym
-
-from tensorflow.keras import backend as K
-from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense
-from tensorflow.keras.losses import MeanSquaredError
-from tensorflow.keras.models import Sequential
-from tensorflow.keras.optimizers import Adam
-from tensorflow.keras import Model
-from tensorflow import keras
-from scipy.stats import entropy
-
We start by taking a look at how the Bowling
environment is set up and implemented in OpenAI Gym.
References:
- - -ENVIRONMENT='Bowling-v0'
-env = gym.make(ENVIRONMENT)
-
The observation space is given by the video feed, an 8-bit 210x160 RGB image (with values in the range 0-255)
- -print(env.observation_space)
-
Box(0, 255, (210, 160, 3), uint8) --
We can perform 6 actions:
- -print(env.action_space)
-
Discrete(6) --
The actions have the following meanings:
- -env.unwrapped.get_action_meanings()
-
['NOOP', 'FIRE', 'UP', 'DOWN', 'UPFIRE', 'DOWNFIRE']-
Reading the report "Game Playing with Deep Q-Learning using OpenAI Gym" by Robert Chuchro and Deepak Gupta we find a few suggestions that can help us in preprocessing the observations provided by the environment.
-From the report:
-In addition to what has been done in the paper, we can try to crop the observation to include only the bowling lane. This might be able to further optimize the training, making the network only focus on what is important.
-We will implement all these preprocessing actions as gym.ObservationWrapper
, following the templates shown in Alexander Van de Kleut's blog and in the official OpenAI Gym repository.
We will start by cropping the observation, as we must figure out what to do on our own.
-To make this process more "visual", we will import matplotlib and use it to show the observation in the notebook. We will render the environment as an rgb_array
so that the frame can be plotted.
%matplotlib inline
-import matplotlib.pyplot as plt
-env.reset()
-obs = env.render(mode = "rgb_array")
-plt.imshow(obs)
-
<matplotlib.image.AxesImage at 0x7ffb7d0b3750>-
A first look at the picture tells us that the bowling lane starts a little after 100 pixels and ends a little before 175 pixels (vertically). After some testing (running the game and rendering frames), we are able to crop the observation vertically with the following values:
- -vertical_crop_start = 105 # @param{type:"integer"}
-vertical_crop_end = 170 # @param{type:"integer"}
-horizontal_crop_start = 0 # @param{type:"integer"}
-horizontal_crop_end = 160 # @param{type:"integer"}
-
cropped_obs = obs[vertical_crop_start:vertical_crop_end, horizontal_crop_start:horizontal_crop_end]
-plt.imshow(cropped_obs)
-
<matplotlib.image.AxesImage at 0x7ffb7cba32d0>-
This has reduced the observation from (210, 160, 3) to (65, 160, 3), decreasing the screen size by ~70%.
- -cropped_obs.shape
-
(65, 160, 3)-
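As a quick sanity check of the ~70% figure, the reduction can be computed directly from the array sizes. The snippet below is a minimal sketch: it uses an all-zero array as a stand-in for a real frame (an assumption made only so that it runs without the emulator), while the crop bounds are the ones chosen above.

```python
import numpy as np

# Stand-in for one raw Bowling frame (assumption: zeros instead of real pixels)
frame = np.zeros((210, 160, 3), dtype=np.uint8)

# Same crop bounds as vertical_crop_start/end and horizontal_crop_start/end
cropped = frame[105:170, 0:160]

# Fraction of the original values that the crop removes
reduction = 1 - cropped.size / frame.size

print(cropped.shape)
print(f"{reduction:.1%}")
```

Only the vertical bounds contribute to the reduction, since the horizontal crop keeps the full width of 160 pixels.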
We then define our wrapper:
- -class CropObservation(gym.ObservationWrapper):
-
- def __init__(self, env):
- super().__init__(env)
-
- def observation(self, obs):
- return obs[vertical_crop_start:vertical_crop_end,\
- horizontal_crop_start:horizontal_crop_end]
-
This functionality is part of the predefined wrappers in OpenAI Gym. We will therefore use the GrayScaleObservation
wrapper, which can be found here: https://github.com/openai/gym/blob/master/gym/wrappers/gray_scale_observation.py
This type of denoising may or may not be useful in our case, as the environment does not have any type of background noise. We will implement it as a wrapper anyway, using OpenCV's function: https://docs.opencv.org/3.4/d7/d4d/tutorial_py_thresholding.html
- -class DenoiseObservation(gym.ObservationWrapper):
-
- def __init__(self, env):
- super().__init__(env)
-
- def observation(self, obs):
- return cv.adaptiveThreshold(obs, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C, cv.THRESH_BINARY, 3, 2)
-
We know that the state is represented by 8-bit values, from 0 to 255. To normalize it, we simply divide the observation by 255.
- -class NormalizeObservation(gym.ObservationWrapper):
-
- def __init__(self, env):
- super().__init__(env)
-
- def observation(self, obs):
- return np.divide(obs, 255.)
-
In the original paper, the authors shrank the observation from the original size down to 80x80. Since cropping already removes about 70% of the observation, we might want to reduce this size further.
-Again, this feature is pre-packed in OpenAI Gym, in the ResizeObservation
wrapper, which can be found here: https://github.com/openai/gym/blob/master/gym/wrappers/resize_observation.py
To stack frames, we will use another one of the provided wrappers in Gym, FrameStack
, which can be found here: https://github.com/openai/gym/blob/master/gym/wrappers/frame_stack.py
For ease of use, we condense here all the preprocessing settings:
- -# Cropping
-use_cropping = True # @param {type:"boolean"}
-
-# Greyscale
-use_greyscale = True # @param {type:"boolean"}
-
-# Denoiser
-use_denoiser = True # @param {type:"boolean"}
-
-# Frame stacking
-stack_frames = True # @param {type:"boolean"}
-frame_stack_size = 4 # @param {type:"integer"}
-
-# Normalization
-use_normalization = True # @param {type:"boolean"}
-
-# Resizing
-resize_observation = True # @param {type:"boolean"}
-observation_side_length = 80 # @param {type:"integer"}
-
Finally, let us look at the results of our preprocessing:
- -cropped_obs = obs[vertical_crop_start:vertical_crop_end, horizontal_crop_start:horizontal_crop_end]
-cropped_obs = cv.resize(cropped_obs, (80,80), interpolation = cv.INTER_AREA)
-cropped_obs = cv.cvtColor(cropped_obs, cv.COLOR_RGB2GRAY)
-cropped_obs = cv.adaptiveThreshold(cropped_obs, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C, cv.THRESH_BINARY, 3, 2)
-cropped_obs = cropped_obs / 255.
-plt.imshow(cropped_obs)
-
<matplotlib.image.AxesImage at 0x7ffb7cb1e650>-
Note that we ended up keeping the image resolution at 80x80: at lower resolutions (such as 50x50), the sizes and shapes of the pins and the ball were heavily distorted, as we can see in the picture below:
- -cropped_obs = obs[vertical_crop_start:vertical_crop_end, horizontal_crop_start:horizontal_crop_end]
-cropped_obs = cv.resize(cropped_obs, (50, 50), interpolation = cv.INTER_AREA)
-cropped_obs = cv.cvtColor(cropped_obs, cv.COLOR_RGB2GRAY)
-cropped_obs = cv.adaptiveThreshold(cropped_obs, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C, cv.THRESH_BINARY, 3, 2)
-cropped_obs = cropped_obs / 255.
-plt.imshow(cropped_obs)
-
<matplotlib.image.AxesImage at 0x7ffb7ca86dd0>-
We also define a utility function that creates the environment from the settings above. The wrappers are applied in a different order than the one in which the settings are listed (frame stacking comes last), to make sure everything works as expected.
- -def get_environment():
-
- env = gym.make(ENVIRONMENT)
-
- if use_cropping:
- env = CropObservation(env)
- if use_greyscale:
- env = gym.wrappers.GrayScaleObservation(env)
- if use_denoiser:
- env = DenoiseObservation(env)
- if use_normalization:
- env = NormalizeObservation(env)
- if resize_observation:
- env = gym.wrappers.ResizeObservation(env, (observation_side_length, observation_side_length))
- if stack_frames and frame_stack_size > 0:
- env = gym.wrappers.FrameStack(env, frame_stack_size)
-
- return env
-
From "Proximal Policy Optimization Algorithms" by Schulman et al:
-[Proximal Policy Optimization algorithms are] a new family of policy gradient methods [...] which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. [They] have some of the benefits of trust region policy optimization (TRPO) but they are much simpler to implement, more general, and have better sample complexity (empirically).
-This family of algorithms has been proposed due to the lack of simple, efficient, scalable and robust policy gradient algorithms for reinforcement learning. At that time, in fact, the most common algorithms used were Q-learning (which struggled in problems with continuous action spaces, and was generally poorly understood) and Trust Region Policy Optimization (which is complicated and not compatible with architectures that include noise or parameter sharing).
-PPO attempts to achieve the same data efficiency and reliable performance of TRPO, while using only first-order optimization. This is achieved by means of a novel objective with clipped probability ratios, which forms a pessimistic estimate (i.e., a lower bound) of the performance of the policy. Policy optimization is obtained by alternating between sampling data from the policy and performing several epochs of optimization on the sampled data.
- -We will use the hyperparameters suggested for Atari games in the paper. Note that the Adam stepsize and the clipping parameter $\epsilon$ are multiplied by a value $\alpha$, which is linearly annealed from 1 to 0 over the course of learning.
-In addition, for the sake of simplicity, we will write a PPO implementation that uses only one actor (the paper uses 8 in parallel in Atari experiments)
- -horizon = 128 # @param{type:"integer"}
-adam_stepsize = 2.5e-4 # @param{type:"number"}
-num_epochs = 3 # @param{type:"integer"}
-minibatch_size = 32 # @param{type:"integer"}
-gamma = 0.99 # @param{type:"number"}
-gae_lambda = 0.95 # @param{type:"number"}
-clipping_parameter = 0.1 # @param{type:"number"}
-c1_coeff = 1 # @param{type:"integer"}
-c2_coeff = 0.01 # @param{type:"number"}
-
-# We define epsilon here so we can use it as a global variable and
-# modify it later on by multiplying it by alpha
-epsilon = clipping_parameter
-
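The linear annealing of $\alpha$ can be sketched as a small helper. The function name `annealed` and its arguments are hypothetical and not part of the notebook; the sketch assumes that progress is measured in frames.

```python
def annealed(base_value, frames_done, total_frames):
    """Scale base_value by alpha, which decays linearly from 1 to 0."""
    alpha = max(0.0, (total_frames - frames_done) / total_frames)
    return base_value * alpha


total = 40_000_000

# Adam stepsize at the start and halfway through training
lr_start = annealed(2.5e-4, 0, total)
lr_half = annealed(2.5e-4, total // 2, total)

# Clipping parameter at the very end of training
eps_end = annealed(0.1, total, total)
```

Both the stepsize and the clipping range therefore shrink together, so late updates are smaller and are also allowed to move the policy less.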
In order to limit the size of the policy update, TRPO maximizes a "surrogate" objective function subject to a constraint on the KL-divergence between the old and the new policy.
-The theory behind TRPO actually suggests using a penalty, weighted by a coefficient $\beta$, instead of a hard constraint; choosing a value of $\beta$ that performs well across different problems, however, is very difficult. The authors of PPO then took TRPO's surrogate objective:
-$L^{CPI}(\theta) = \hat{\mathbb{E}}_t \Big[\frac{\pi_{\theta}(a_t \vert s_t)}{\pi_{\theta_{old}}(a_t \vert s_t)} \hat{A}_t \Big] = \hat{\mathbb{E}}_t \Big[ r_t(\theta) \hat{A}_t \Big]$
-and modified it to penalize changes to the policy that move $r_t (\theta)$ away from 1 (as it would lead to an excessively large policy update without constraints). They propose the following:
-$L^{CLIP}(\theta)=\hat{\mathbb{E}}_t \bigg[\min\left(r_t(\theta)\hat{A}_t, clip\left(r_t(\theta), 1-\epsilon, 1+\epsilon \right) \hat{A}_t\right)\bigg]$
-This new objective clips the probability ratio to remove the incentive for moving $r_t$ outside the interval around 1 delimited by $\epsilon$ and returns the minimum between the unclipped and the clipped objective, effectively acting as a lower bound. In practice, this leads to ignoring the change in probability ratio when it would make the objective improve, only including it when it makes it worse.
- -def clipped_surrogate_objective(probability_ratio, advantages):
- unclipped_objectives = probability_ratio * advantages
- clipped_objectives = K.clip(probability_ratio, 1 - epsilon, 1 + epsilon) * advantages
- lower_bounds = K.min([unclipped_objectives, clipped_objectives], axis = 0)
- return K.mean(lower_bounds)
-
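To see which cases the clipping actually affects, here is a minimal NumPy sketch of the per-sample terms inside the expectation. The function name `clipped_surrogate` and the sample values are made up for illustration; it is not the Keras function used for training.

```python
import numpy as np


def clipped_surrogate(ratios, advantages, eps=0.1):
    """Per-sample min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    ratios = np.asarray(ratios, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1 - eps, 1 + eps) * advantages
    return np.minimum(unclipped, clipped)


# Ratio above/below 1, combined with a positive/negative advantage
terms = clipped_surrogate([1.3, 0.8, 1.3, 0.8], [1.0, 1.0, -1.0, -1.0])
print(terms)
```

With $\epsilon = 0.1$, the clipped value is selected only for (r = 1.3, A = 1) and (r = 0.8, A = -1), that is, when moving the ratio further would improve the objective. In the other two cases the unclipped, worse value is kept.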
The term $\hat{A}_t$ that we saw in the previous formulas is an advantage estimator: it measures how much better or worse the action taken at time $t$ turned out to be compared to what the current value estimate predicted.
-The authors of this paper propose a truncated version of generalized advantage estimation, adapting what was proposed in "Asynchronous methods for deep reinforcement learning" by Mnih et al.
-$\hat{A}_t = \delta_t + (\gamma\lambda)\delta_{t+1} + \dots + (\gamma\lambda)^{T-t-1}\delta_{T-1}$
-where $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$
-Note: the formula above does not explicitly put a check on whether the state at $t+1$ is a terminal state or not.
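- -def compute_A_t(rewards, values, dones, start, T):
-
-    A_t = 0
-    coefficient = 1
-
-    for t in range(start, T):
-        # dones[t] marks that the step taken at time t ended the episode, so
-        # V(s_{t+1}) counts as 0. It is also taken as 0 at the end of the
-        # segment, where no value estimate was collected.
-        bootstrap = t + 1 < T and not dones[t]
-        next_value = values[t + 1] if bootstrap else 0
-        delta = rewards[t] + gamma * next_value - values[t]
-        A_t += coefficient * delta
-
-        # Later deltas belong to a different episode
-        if dones[t]:
-            break
-        coefficient *= gamma * gae_lambda
-
-    return A_t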
-
-def compute_advantage_estimates(rewards, values, dones, T):
-
- advantage_estimates = []
-
- for t in range(T):
- A_t = compute_A_t(rewards, values, dones, t, T)
- advantage_estimates.append(A_t)
-
- return np.array(advantage_estimates)
-
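Unrolling the sum separately for every $t$ takes a number of operations that grows quadratically with the segment length. The same estimates can be obtained in a single backward pass using the recursion $\hat{A}_t = \delta_t + \gamma\lambda\hat{A}_{t+1}$. The sketch below is a standalone NumPy illustration: the name `gae_advantages` and the `last_value` argument (a bootstrap value for the state following the segment) are hypothetical and not part of the notebook.

```python
import numpy as np


def gae_advantages(rewards, values, dones, last_value=0.0, gamma=0.99, lam=0.95):
    """Truncated GAE computed with one backward pass over the segment."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    # 0 where the step ended the episode, 1 otherwise
    not_done = 1.0 - np.asarray(dones, dtype=float)
    # V(s_{t+1}) for every t; the last entry is the assumed bootstrap value
    next_values = np.append(values[1:], last_value)

    deltas = rewards + gamma * next_values * not_done - values
    advantages = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # The accumulated sum is reset whenever an episode ends at step t
        running = deltas[t] + gamma * lam * not_done[t] * running
        advantages[t] = running
    return advantages


adv = gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [False, False, True])
print(adv)
```

With all value estimates set to zero, each $\delta_t$ equals the reward, so the result is simply the reward sequence discounted by $\gamma\lambda$.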
Since we will have to compute the discounted returns as well, we will create a function for it, too:
- -def compute_discounted_returns(rewards, dones, T):
-
-    returns = []
-    discounted_sum = 0
-
-    for t in reversed(range(T)):
-        # The reward at step t always counts; only the return that follows
-        # a terminal step must be dropped.
-        discounted_sum = rewards[t] + gamma * (1 - dones[t]) * discounted_sum
-        returns.insert(0, discounted_sum)
-
-    return returns
-
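A small worked example helps to check how episode boundaries are handled: the reward received on a terminal step still counts towards the returns of that episode, but nothing from the following episode may leak backwards. The sketch below is a standalone illustration with made-up rewards; the name `discounted_returns` is hypothetical.

```python
def discounted_returns(rewards, dones, gamma=0.99):
    """Discounted return for every step, restarting at episode boundaries."""
    returns = []
    running = 0.0
    for reward, done in zip(reversed(rewards), reversed(dones)):
        # After a terminal step there is no future return to discount
        running = reward + gamma * (0.0 if done else running)
        returns.append(running)
    returns.reverse()
    return returns


# The third step ends an episode; the fourth step starts a new one
example = discounted_returns(
    [0.0, 0.0, 10.0, 1.0], [False, False, True, False], gamma=0.5
)
print(example)
```

The terminal reward of 10 is propagated back to the first two steps (5 and 2.5), while the reward of 1 from the next episode does not affect them.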
We will be using a neural network architecture that shares parameters between the policy and the value function; this requires us to use a loss function that combines the policy surrogate and a value function error term. This objective can further be augmented by adding an entropy bonus to ensure sufficient exploration.
-Combining these terms, we obtain the following objective, which is (approximately) maximized each iteration:
-$L_t^{CLIP+VF+S}(\theta) = \hat{\mathbb{E}} \big[L_t^{CLIP}(\theta) - c_1 L_t^{VF}(\theta) + c_2 S[\pi_{\theta}](s_t) \big]$
-_where $c_1$, $c_2$ are coefficients, and $S$ denotes an entropy bonus, and $L_t^{VF}$ is a squared-error loss $\left( V_{\theta}(s_t) - V_t^{targ} \right)^2$._
- -def compute_total_loss(actor_loss, critic_loss, entropy_bonus):
- total_loss = actor_loss - (c1_coeff * critic_loss) + (c2_coeff * entropy_bonus)
- return K.mean(total_loss)
-
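To get a feeling for the scale of the entropy bonus $S$, we can evaluate it for the two extreme policies over the 6 available actions. This is a minimal NumPy sketch; the function name `categorical_entropy` is hypothetical and the computation uses the natural logarithm.

```python
import numpy as np


def categorical_entropy(probabilities):
    """Entropy -sum(p * log(p)) of a categorical distribution."""
    p = np.asarray(probabilities, dtype=float)
    # 0 * log(0) is taken to be 0, so zero entries are replaced by 1 in the log
    safe_p = np.where(p > 0, p, 1.0)
    return -np.sum(p * np.log(safe_p), axis=-1)


# A uniformly random policy and a fully deterministic one
uniform = categorical_entropy(np.full(6, 1 / 6))
greedy = categorical_entropy([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(uniform, greedy)
```

The bonus is largest (ln 6, about 1.79) for the uniform policy and vanishes for a deterministic one, so maximizing it pushes the policy away from collapsing onto a single action too early.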
The authors provide a pseudocode implementation of PPO with fixed-length trajectory segments:
- -for iteration = 1, 2, ... do
- for actor = 1, 2, ..., N do
- Run policy \pi_{\theta_{old}} in environment for T timesteps
- Compute advantage estimates \hat{A}_1, ..., \hat{A}_T
- end for
- Optimize surrogate L wrt \theta, with K epochs and minibatch size M \le NT
- \theta_{old} <- \theta
-end for
-We will first define the neural network we will use as function approximator and a few utility functions to run part of the algorithm.
- -We set the network parameters according to our configuration.
- -screen_height = 210
-screen_width = 160
-color_channels = 3
-
-if use_cropping:
- screen_height = vertical_crop_end - vertical_crop_start
- screen_width = horizontal_crop_end - horizontal_crop_start
-if use_greyscale:
- color_channels = 1
-if resize_observation:
- screen_height = observation_side_length
- screen_width = observation_side_length
-
-if stack_frames and frame_stack_size > 0:
- input_shape = (frame_stack_size, screen_height, screen_width, color_channels)
-else:
- input_shape = (screen_height, screen_width, color_channels)
-
-actions_available = env.action_space.n
-
We define a utility function to get a new network for ease of use.
- -def get_network():
-
- # Input layer
- inputs = Input(shape=input_shape)
-
- # Convolutions on the frames on the screen
- layer1 = Conv2D(16, 8, strides = 4, activation = "relu")(inputs)
- layer2 = Conv2D(32, 4, strides = 2, activation = "relu")(layer1)
-
- # Flatten and add densely connected layer
- layer3 = Flatten()(layer2)
- layer4 = Dense(256, activation = "relu")(layer3)
-
- # Actor and critic layers
- actor = Dense(actions_available, activation = "softmax")(layer4)
- critic = Dense(1, activation = "linear")(layer4)
-
- return Model(inputs = inputs, outputs = [actor, critic])
-
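The layer shapes and parameter counts of this network can be worked out by hand, which is a useful check before training. The sketch below assumes the default settings (four stacked 80x80 greyscale frames) and Keras' default "valid" padding; the helper names are hypothetical. Since the input has an extra leading frame dimension, the 2D convolutions are applied to each stacked frame separately, and the frames are only combined by the flatten layer.

```python
def conv_output_side(side, kernel, stride):
    """Output side length of a convolution with 'valid' padding."""
    return (side - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


side1 = conv_output_side(80, 8, 4)      # after the first convolution
side2 = conv_output_side(side1, 4, 2)   # after the second convolution
flattened = 4 * side2 * side2 * 32      # 4 frames, 32 feature maps each

total = (
    conv_params(8, 1, 16)
    + conv_params(4, 16, 32)
    + dense_params(flattened, 256)
    + dense_params(256, 6)    # actor head, one output per action
    + dense_params(256, 1)    # critic head
)
print(side1, side2, flattened, total)
```

Almost all of the parameters sit in the first dense layer, which is one reason why reducing the observation size matters for this architecture.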
As we said before, PPO runs the policy for $T$ timesteps (where $T$ is much less than the episode length) and uses the collected samples for an update.
-Here we define a utility function to play for $T$ timesteps:
- -# The state is required in input because we must be able to resume the episode
-# from where we left off.
-def play_for_timestamps(env, state, model):
-
- states = []
- actions = []
- action_probabilities = []
- critic_values = []
- rewards = []
- dones = []
-
- # The timesteps for which we play is part of our hyperparameters
- # and is represented by the horizon
- for step in range(horizon):
-
- # We must call expand_dims to have the correct input size for the network
- t_state = tf.expand_dims(state, axis = 0)
- actor_value, critic_value = model(t_state)
-
- # Normally the action distribution would be (1,6)
- # By squeezing we remove "one-dimensional dimensions"
- # so we get (6,)
- # https://numpy.org/doc/stable/reference/generated/numpy.squeeze.html
- action_probability = np.squeeze(actor_value)
- critic_value = np.squeeze(critic_value)
-
- action = np.random.choice(actions_available, p = action_probability)
- next_state, reward, done, _ = env.step(action)
-
- states.append(state)
- actions.append(action)
- action_probabilities.append(action_probability)
- critic_values.append(critic_value)
- rewards.append(reward)
- dones.append(done)
-
- if done:
- state = env.reset()
- else:
- state = next_state
-
- return states, actions, action_probabilities, critic_values, rewards, dones
-
The PPO update is run on minibatches of size minibatch_size, so we create a generator function that yields them from the data we collected.
def generate_minibatches(states, actions, action_probabilities, critic_values, rewards, dones):
-
- states = np.array(states)
- actions = np.array(actions)
- action_probabilities = np.array(action_probabilities)
- critic_values = np.array(critic_values)
- rewards = np.array(rewards)
- dones = np.array(dones)
-
- number_of_batches = horizon // minibatch_size
- for batch in range(number_of_batches):
- batch_start = batch * minibatch_size
- batch_end = (batch + 1) * minibatch_size
- indices = np.arange(batch_start, batch_end)
- yield states[indices], actions[indices], action_probabilities[indices],\
- critic_values[indices], rewards[indices], dones[indices]
-
The PPO update is run on the data obtained while playing and is repeated for num_epochs epochs.
def perform_ppo_update(model, states, actions, action_probabilities, critic_values, rewards, dones):
- for epoch in range(num_epochs):
- for states_batch, actions_batch, action_probabilities_batch,\
- critic_values_batch, rewards_batch, dones_batch in \
- generate_minibatches(states, actions, action_probabilities,\
- critic_values, rewards, dones):
- with tf.GradientTape() as tape:
-
- # We need the action probabilities output by the current network
- # for our probability ratios. This is because we repeat this training
- # phase for `num_epochs` and the probabilities will change.
- current_action_probabilities, current_critic_values = model(states_batch)
-
- # We want the probability ratios only for the actions that we took
- # I couldn't find a better way to do this via numpy, so I had to resort
- # to this abomination of a list comprehension
- probability_ratios = current_action_probabilities / action_probabilities_batch
- pr = [prob[actions_batch[i]] for i, prob in enumerate(probability_ratios)]
-
- advantages = compute_advantage_estimates(rewards_batch, critic_values_batch, dones_batch, minibatch_size)
- actor_loss = clipped_surrogate_objective(pr, advantages)
-
- # We use `expand_dims` in order to have the correct dimensions in
- # output. If we were to call `squeeze` on the critic values we would
- # end up with the "missing gradients" error.
- returns = compute_discounted_returns(rewards_batch, dones_batch, minibatch_size)
- returns = np.expand_dims(returns, axis = 1)
- critic_loss = K.mean(K.square(current_critic_values - returns))
-
- # Calculate entropy values and bonus
- entropy_values = [entropy(probabilities, axis=0) for probabilities in current_action_probabilities]
- entropy_bonus = K.mean(tf.convert_to_tensor(entropy_values, dtype=np.float32))
-
- # We negate the total loss since we want to perform gradient ascent
- total_loss = compute_total_loss(actor_loss, critic_loss, entropy_bonus)
- total_loss = -total_loss
-
- gradients = tape.gradient(total_loss, model.trainable_variables)
- optimizer.apply_gradients(zip(gradients, model.trainable_variables))
-
After implementing the PPO algorithm, we must now instantiate the network and run the training loop.
- -We initially set a reward threshold of 50 because it served as a measure of the algorithm working. It is also a value near what the paper obtained before the agent started having a few issues and undoing what it had learned until then.
- -max_frames = 40e6 # @param{type:"number"}
-history_length = 100 # @param{type:"integer"}
-
-# Stop after reward threshold
-stop_when_reward_above_threshold = True # @param{type:"boolean"}
-reward_threshold = 50 # @param{type:"integer"}
-
-# Stop after a set time
-limit_training_time = False # @param{type:"boolean"}
-max_training_time = 28800 # @param{type:"integer"}
-
get_network().summary()
-
Model: "model"
-__________________________________________________________________________________________________
-Layer (type)                    Output Shape         Param #     Connected to
-==================================================================================================
-input_1 (InputLayer)            [(None, 4, 80, 80, 1 0
-__________________________________________________________________________________________________
-conv2d (Conv2D)                 (None, 4, 19, 19, 16 1040        input_1[0][0]
-__________________________________________________________________________________________________
-conv2d_1 (Conv2D)               (None, 4, 8, 8, 32)  8224        conv2d[0][0]
-__________________________________________________________________________________________________
-flatten (Flatten)               (None, 8192)         0           conv2d_1[0][0]
-__________________________________________________________________________________________________
-dense (Dense)                   (None, 256)          2097408     flatten[0][0]
-__________________________________________________________________________________________________
-dense_1 (Dense)                 (None, 6)            1542        dense[0][0]
-__________________________________________________________________________________________________
-dense_2 (Dense)                 (None, 1)            257         dense[0][0]
-==================================================================================================
-Total params: 2,108,471
-Trainable params: 2,108,471
-Non-trainable params: 0
-__________________________________________________________________________________________________
-
%%time
-model = get_network()
-env = get_environment()
-recent_rewards = np.zeros(history_length)
-
-total_frames = 0
-current_episode_reward = 0
-episodes_solved = 0
-target_reached = False
-start_time = time.time()
-
-
-state = env.reset()
-try:
- while total_frames < max_frames:
-
- if stop_when_reward_above_threshold and target_reached:
- print("+++++++++++++ TRAINING STOPPED: WE REACHED THE REWARD THRESHOLD +++++++++++++")
- break
-
- if limit_training_time and (time.time() - start_time) > max_training_time:
- print("+++++++++++++ TRAINING STOPPED: TIMING CONSTRAINTS +++++++++++++")
- break
-
- # At every iteration we keep linearly decreasing the value of alpha
- alpha = (max_frames - total_frames) / max_frames
- epsilon = clipping_parameter * alpha
- optimizer = Adam(learning_rate = adam_stepsize * alpha)
-
- # Run the policy for T timesteps
- states, actions, action_distributions, critic_values, rewards, dones = play_for_timestamps(env, state, model)
-
- # We want to keep track of how many episodes have ended and their reward
- # We shouldn't be in the case where multiple episodes have ended within
- # the same horizon, as it's against what was suggested in the paper.
- # However, we may be very unlucky and perform that poorly.
- latest_termination_index = 0
- termination_indexes = np.nonzero(dones)[0]
- for termination_index in termination_indexes:
-
- current_episode_reward += np.sum(rewards[latest_termination_index : termination_index + 1])
- recent_rewards[episodes_solved % history_length] = current_episode_reward
- mean_reward = np.mean(recent_rewards)
- episodes_solved += 1
-
- if stop_when_reward_above_threshold and mean_reward >= reward_threshold:
- print(f"Solved after {episodes_solved} episodes. The last episode ended with a reward of {current_episode_reward}, the average reward was {mean_reward:.2f}")
- target_reached = True
- break
- else:
- print(f"Episode {episodes_solved} ended with a reward of {current_episode_reward}, the average reward is {mean_reward:.2f}")
-
- if episodes_solved % 100 == 0:
- print(f"++++++ We have been running for {total_frames}/{max_frames} frames ++++++")
-
- current_episode_reward = 0
- latest_termination_index = termination_index + 1
-
- # After the loop: rewards collected after the last termination (or over the
- # whole horizon, if no episode ended) belong to the episode still in progress.
- current_episode_reward += np.sum(rewards[latest_termination_index:])
-
- perform_ppo_update(model, states, actions, action_distributions, critic_values, rewards, dones)
- state = states[-1]
- total_frames += horizon
-
-except KeyboardInterrupt:
- print("---------------- TRAINING INTERRUPTED MANUALLY ----------------")
-
Episode 1 ended with a reward of 24.0, the average reward is 0.24 -Episode 2 ended with a reward of 20.0, the average reward is 0.44 -Episode 3 ended with a reward of 27.0, the average reward is 0.71 -Episode 4 ended with a reward of 23.0, the average reward is 0.94 -Episode 5 ended with a reward of 18.0, the average reward is 1.12 -Episode 6 ended with a reward of 24.0, the average reward is 1.36 -Episode 7 ended with a reward of 27.0, the average reward is 1.63 -Episode 8 ended with a reward of 24.0, the average reward is 1.87 -Episode 9 ended with a reward of 31.0, the average reward is 2.18 -Episode 10 ended with a reward of 30.0, the average reward is 2.48 -Episode 11 ended with a reward of 31.0, the average reward is 2.79 -Episode 12 ended with a reward of 21.0, the average reward is 3.00 -Episode 13 ended with a reward of 22.0, the average reward is 3.22 -Episode 14 ended with a reward of 28.0, the average reward is 3.50 -Episode 15 ended with a reward of 27.0, the average reward is 3.77 -Episode 16 ended with a reward of 30.0, the average reward is 4.07 -Episode 17 ended with a reward of 25.0, the average reward is 4.32 -Episode 18 ended with a reward of 28.0, the average reward is 4.60 -Episode 19 ended with a reward of 24.0, the average reward is 4.84 -Episode 20 ended with a reward of 41.0, the average reward is 5.25 -Episode 21 ended with a reward of 24.0, the average reward is 5.49 -Episode 22 ended with a reward of 18.0, the average reward is 5.67 -Episode 23 ended with a reward of 24.0, the average reward is 5.91 -Episode 24 ended with a reward of 23.0, the average reward is 6.14 -Episode 25 ended with a reward of 21.0, the average reward is 6.35 -Episode 26 ended with a reward of 25.0, the average reward is 6.60 -Episode 27 ended with a reward of 22.0, the average reward is 6.82 -Episode 28 ended with a reward of 39.0, the average reward is 7.21 -Episode 29 ended with a reward of 30.0, the average reward is 7.51 -Episode 30 ended with a reward of 
33.0, the average reward is 7.84 -Episode 31 ended with a reward of 31.0, the average reward is 8.15 -Episode 32 ended with a reward of 12.0, the average reward is 8.27 -Episode 33 ended with a reward of 21.0, the average reward is 8.48 -Episode 34 ended with a reward of 27.0, the average reward is 8.75 -Episode 35 ended with a reward of 25.0, the average reward is 9.00 -Episode 36 ended with a reward of 29.0, the average reward is 9.29 -Episode 37 ended with a reward of 30.0, the average reward is 9.59 -Episode 38 ended with a reward of 20.0, the average reward is 9.79 -Episode 39 ended with a reward of 29.0, the average reward is 10.08 -Episode 40 ended with a reward of 15.0, the average reward is 10.23 -Episode 41 ended with a reward of 20.0, the average reward is 10.43 -Episode 42 ended with a reward of 27.0, the average reward is 10.70 -Episode 43 ended with a reward of 30.0, the average reward is 11.00 -Episode 44 ended with a reward of 28.0, the average reward is 11.28 -Episode 45 ended with a reward of 24.0, the average reward is 11.52 -Episode 46 ended with a reward of 12.0, the average reward is 11.64 -Episode 47 ended with a reward of 26.0, the average reward is 11.90 -Episode 48 ended with a reward of 30.0, the average reward is 12.20 -Episode 49 ended with a reward of 21.0, the average reward is 12.41 -Episode 50 ended with a reward of 22.0, the average reward is 12.63 -Episode 51 ended with a reward of 36.0, the average reward is 12.99 -Episode 52 ended with a reward of 22.0, the average reward is 13.21 -Episode 53 ended with a reward of 33.0, the average reward is 13.54 -Episode 54 ended with a reward of 27.0, the average reward is 13.81 -Episode 55 ended with a reward of 30.0, the average reward is 14.11 -Episode 56 ended with a reward of 24.0, the average reward is 14.35 -Episode 57 ended with a reward of 29.0, the average reward is 14.64 -Episode 58 ended with a reward of 30.0, the average reward is 14.94 -Episode 59 ended with a reward of 18.0, 
the average reward is 15.12 -Episode 60 ended with a reward of 22.0, the average reward is 15.34 -Episode 61 ended with a reward of 24.0, the average reward is 15.58 -Episode 62 ended with a reward of 13.0, the average reward is 15.71 -Episode 63 ended with a reward of 26.0, the average reward is 15.97 -Episode 64 ended with a reward of 27.0, the average reward is 16.24 -Episode 65 ended with a reward of 25.0, the average reward is 16.49 -Episode 66 ended with a reward of 32.0, the average reward is 16.81 -Episode 67 ended with a reward of 33.0, the average reward is 17.14 -Episode 68 ended with a reward of 30.0, the average reward is 17.44 -Episode 69 ended with a reward of 12.0, the average reward is 17.56 -Episode 70 ended with a reward of 24.0, the average reward is 17.80 -Episode 71 ended with a reward of 18.0, the average reward is 17.98 -Episode 72 ended with a reward of 19.0, the average reward is 18.17 -Episode 73 ended with a reward of 16.0, the average reward is 18.33 -Episode 74 ended with a reward of 15.0, the average reward is 18.48 -Episode 75 ended with a reward of 18.0, the average reward is 18.66 -Episode 76 ended with a reward of 22.0, the average reward is 18.88 -Episode 77 ended with a reward of 20.0, the average reward is 19.08 -Episode 78 ended with a reward of 27.0, the average reward is 19.35 -Episode 79 ended with a reward of 13.0, the average reward is 19.48 -Episode 80 ended with a reward of 26.0, the average reward is 19.74 -Episode 81 ended with a reward of 19.0, the average reward is 19.93 -Episode 82 ended with a reward of 27.0, the average reward is 20.20 -Episode 83 ended with a reward of 29.0, the average reward is 20.49 -Episode 84 ended with a reward of 30.0, the average reward is 20.79 -Episode 85 ended with a reward of 21.0, the average reward is 21.00 -Episode 86 ended with a reward of 18.0, the average reward is 21.18 -Episode 87 ended with a reward of 22.0, the average reward is 21.40 -Episode 88 ended with a reward of 
19.0, the average reward is 21.59 -Episode 89 ended with a reward of 25.0, the average reward is 21.84 -Episode 90 ended with a reward of 25.0, the average reward is 22.09 -Episode 91 ended with a reward of 17.0, the average reward is 22.26 -Episode 92 ended with a reward of 27.0, the average reward is 22.53 -Episode 93 ended with a reward of 27.0, the average reward is 22.80 -Episode 94 ended with a reward of 28.0, the average reward is 23.08 -Episode 95 ended with a reward of 30.0, the average reward is 23.38 -Episode 96 ended with a reward of 19.0, the average reward is 23.57 -Episode 97 ended with a reward of 18.0, the average reward is 23.75 -Episode 98 ended with a reward of 27.0, the average reward is 24.02 -Episode 99 ended with a reward of 27.0, the average reward is 24.29 -Episode 100 ended with a reward of 30.0, the average reward is 24.59 -++++++ We have been running for 280192/40000000.0 frames ++++++ -Episode 101 ended with a reward of 8.0, the average reward is 24.43 -Episode 102 ended with a reward of 15.0, the average reward is 24.38 -Episode 103 ended with a reward of 22.0, the average reward is 24.33 -Episode 104 ended with a reward of 27.0, the average reward is 24.37 -Episode 105 ended with a reward of 21.0, the average reward is 24.40 -Episode 106 ended with a reward of 28.0, the average reward is 24.44 -Episode 107 ended with a reward of 13.0, the average reward is 24.30 -Episode 108 ended with a reward of 22.0, the average reward is 24.28 -Episode 109 ended with a reward of 18.0, the average reward is 24.15 -Episode 110 ended with a reward of 17.0, the average reward is 24.02 -Episode 111 ended with a reward of 30.0, the average reward is 24.01 -Episode 112 ended with a reward of 27.0, the average reward is 24.07 -Episode 113 ended with a reward of 22.0, the average reward is 24.07 -Episode 114 ended with a reward of 28.0, the average reward is 24.07 -Episode 115 ended with a reward of 22.0, the average reward is 24.02 -Episode 116 ended 
with a reward of 19.0, the average reward is 23.91 -Episode 117 ended with a reward of 27.0, the average reward is 23.93 -Episode 118 ended with a reward of 24.0, the average reward is 23.89 -Episode 119 ended with a reward of 18.0, the average reward is 23.83 -Episode 120 ended with a reward of 26.0, the average reward is 23.68 -Episode 121 ended with a reward of 31.0, the average reward is 23.75 -Episode 122 ended with a reward of 21.0, the average reward is 23.78 -Episode 123 ended with a reward of 30.0, the average reward is 23.84 -Episode 124 ended with a reward of 33.0, the average reward is 23.94 -Episode 125 ended with a reward of 18.0, the average reward is 23.91 -Episode 126 ended with a reward of 19.0, the average reward is 23.85 -Episode 127 ended with a reward of 25.0, the average reward is 23.88 -Episode 128 ended with a reward of 24.0, the average reward is 23.73 -Episode 129 ended with a reward of 19.0, the average reward is 23.62 -Episode 130 ended with a reward of 20.0, the average reward is 23.49 -Episode 131 ended with a reward of 21.0, the average reward is 23.39 -Episode 132 ended with a reward of 14.0, the average reward is 23.41 -Episode 133 ended with a reward of 27.0, the average reward is 23.47 -Episode 134 ended with a reward of 14.0, the average reward is 23.34 -Episode 135 ended with a reward of 19.0, the average reward is 23.28 -Episode 136 ended with a reward of 24.0, the average reward is 23.23 -Episode 137 ended with a reward of 25.0, the average reward is 23.18 -Episode 138 ended with a reward of 21.0, the average reward is 23.19 -Episode 139 ended with a reward of 30.0, the average reward is 23.20 -Episode 140 ended with a reward of 18.0, the average reward is 23.23 -Episode 141 ended with a reward of 24.0, the average reward is 23.27 -Episode 142 ended with a reward of 29.0, the average reward is 23.29 -Episode 143 ended with a reward of 21.0, the average reward is 23.20 -Episode 144 ended with a reward of 22.0, the average 
reward is 23.14 -Episode 145 ended with a reward of 38.0, the average reward is 23.28 -Episode 146 ended with a reward of 18.0, the average reward is 23.34 -Episode 147 ended with a reward of 24.0, the average reward is 23.32 -Episode 148 ended with a reward of 21.0, the average reward is 23.23 -Episode 149 ended with a reward of 34.0, the average reward is 23.36 -Episode 150 ended with a reward of 24.0, the average reward is 23.38 -Episode 151 ended with a reward of 15.0, the average reward is 23.17 -Episode 152 ended with a reward of 30.0, the average reward is 23.25 -Episode 153 ended with a reward of 25.0, the average reward is 23.17 -Episode 154 ended with a reward of 21.0, the average reward is 23.11 -Episode 155 ended with a reward of 25.0, the average reward is 23.06 -Episode 156 ended with a reward of 24.0, the average reward is 23.06 -Episode 157 ended with a reward of 24.0, the average reward is 23.01 -Episode 158 ended with a reward of 31.0, the average reward is 23.02 -Episode 159 ended with a reward of 19.0, the average reward is 23.03 -Episode 160 ended with a reward of 30.0, the average reward is 23.11 -Episode 161 ended with a reward of 28.0, the average reward is 23.15 -Episode 162 ended with a reward of 12.0, the average reward is 23.14 -Episode 163 ended with a reward of 15.0, the average reward is 23.03 -Episode 164 ended with a reward of 39.0, the average reward is 23.15 -Episode 165 ended with a reward of 24.0, the average reward is 23.14 -Episode 166 ended with a reward of 28.0, the average reward is 23.10 -Episode 167 ended with a reward of 25.0, the average reward is 23.02 -Episode 168 ended with a reward of 33.0, the average reward is 23.05 -Episode 169 ended with a reward of 24.0, the average reward is 23.17 -Episode 170 ended with a reward of 33.0, the average reward is 23.26 -Episode 171 ended with a reward of 24.0, the average reward is 23.32 -Episode 172 ended with a reward of 30.0, the average reward is 23.43 -Episode 173 ended with 
a reward of 30.0, the average reward is 23.57 -Episode 174 ended with a reward of 24.0, the average reward is 23.66 -Episode 175 ended with a reward of 37.0, the average reward is 23.85 -Episode 176 ended with a reward of 16.0, the average reward is 23.79 -Episode 177 ended with a reward of 33.0, the average reward is 23.92 -Episode 178 ended with a reward of 24.0, the average reward is 23.89 -Episode 179 ended with a reward of 27.0, the average reward is 24.03 -Episode 180 ended with a reward of 31.0, the average reward is 24.08 -Episode 181 ended with a reward of 9.0, the average reward is 23.98 -Episode 182 ended with a reward of 27.0, the average reward is 23.98 -Episode 183 ended with a reward of 19.0, the average reward is 23.88 -Episode 184 ended with a reward of 31.0, the average reward is 23.89 -Episode 185 ended with a reward of 20.0, the average reward is 23.88 -Episode 186 ended with a reward of 25.0, the average reward is 23.95 -Episode 187 ended with a reward of 21.0, the average reward is 23.94 -Episode 188 ended with a reward of 28.0, the average reward is 24.03 -Episode 189 ended with a reward of 30.0, the average reward is 24.08 -Episode 190 ended with a reward of 21.0, the average reward is 24.04 -Episode 191 ended with a reward of 35.0, the average reward is 24.22 -Episode 192 ended with a reward of 22.0, the average reward is 24.17 -Episode 193 ended with a reward of 21.0, the average reward is 24.11 -Episode 194 ended with a reward of 24.0, the average reward is 24.07 -Episode 195 ended with a reward of 22.0, the average reward is 23.99 -Episode 196 ended with a reward of 34.0, the average reward is 24.14 -Episode 197 ended with a reward of 27.0, the average reward is 24.23 -Episode 198 ended with a reward of 25.0, the average reward is 24.21 -Episode 199 ended with a reward of 39.0, the average reward is 24.33 -Episode 200 ended with a reward of 24.0, the average reward is 24.27 -++++++ We have been running for 560896/40000000.0 frames ++++++ 
-Episode 201 ended with a reward of 26.0, the average reward is 24.45 -Episode 202 ended with a reward of 25.0, the average reward is 24.55 -Episode 203 ended with a reward of 27.0, the average reward is 24.60 -Episode 204 ended with a reward of 45.0, the average reward is 24.78 -Episode 205 ended with a reward of 39.0, the average reward is 24.96 -Episode 206 ended with a reward of 29.0, the average reward is 24.97 -Episode 207 ended with a reward of 18.0, the average reward is 25.02 -Episode 208 ended with a reward of 11.0, the average reward is 24.91 -Episode 209 ended with a reward of 26.0, the average reward is 24.99 -Episode 210 ended with a reward of 22.0, the average reward is 25.04 -Episode 211 ended with a reward of 19.0, the average reward is 24.93 -Episode 212 ended with a reward of 24.0, the average reward is 24.90 -Episode 213 ended with a reward of 36.0, the average reward is 25.04 -Episode 214 ended with a reward of 25.0, the average reward is 25.01 -Episode 215 ended with a reward of 15.0, the average reward is 24.94 -Episode 216 ended with a reward of 25.0, the average reward is 25.00 -Episode 217 ended with a reward of 26.0, the average reward is 24.99 -Episode 218 ended with a reward of 30.0, the average reward is 25.05 -Episode 219 ended with a reward of 34.0, the average reward is 25.21 -Episode 220 ended with a reward of 32.0, the average reward is 25.27 -Episode 221 ended with a reward of 27.0, the average reward is 25.23 -Episode 222 ended with a reward of 27.0, the average reward is 25.29 -Episode 223 ended with a reward of 24.0, the average reward is 25.23 -Episode 224 ended with a reward of 24.0, the average reward is 25.14 -Episode 225 ended with a reward of 24.0, the average reward is 25.20 -Episode 226 ended with a reward of 25.0, the average reward is 25.26 -Episode 227 ended with a reward of 24.0, the average reward is 25.25 -Episode 228 ended with a reward of 17.0, the average reward is 25.18 -Episode 229 ended with a reward of 
33.0, the average reward is 25.32 -Episode 230 ended with a reward of 22.0, the average reward is 25.34 -Episode 231 ended with a reward of 40.0, the average reward is 25.53 -Episode 232 ended with a reward of 30.0, the average reward is 25.69 -Episode 233 ended with a reward of 19.0, the average reward is 25.61 -Episode 234 ended with a reward of 16.0, the average reward is 25.63 -Episode 235 ended with a reward of 33.0, the average reward is 25.77 -Episode 236 ended with a reward of 21.0, the average reward is 25.74 -Episode 237 ended with a reward of 30.0, the average reward is 25.79 -Episode 238 ended with a reward of 17.0, the average reward is 25.75 -Episode 239 ended with a reward of 35.0, the average reward is 25.80 -Episode 240 ended with a reward of 27.0, the average reward is 25.89 -Episode 241 ended with a reward of 37.0, the average reward is 26.02 -Episode 242 ended with a reward of 32.0, the average reward is 26.05 -Episode 243 ended with a reward of 37.0, the average reward is 26.21 -Episode 244 ended with a reward of 31.0, the average reward is 26.30 -Episode 245 ended with a reward of 34.0, the average reward is 26.26 -Episode 246 ended with a reward of 28.0, the average reward is 26.36 -Episode 247 ended with a reward of 36.0, the average reward is 26.48 -Episode 248 ended with a reward of 31.0, the average reward is 26.58 -Episode 249 ended with a reward of 27.0, the average reward is 26.51 -Episode 250 ended with a reward of 36.0, the average reward is 26.63 -Episode 251 ended with a reward of 30.0, the average reward is 26.78 -Episode 252 ended with a reward of 36.0, the average reward is 26.84 -Episode 253 ended with a reward of 28.0, the average reward is 26.87 -Episode 254 ended with a reward of 35.0, the average reward is 27.01 -Episode 255 ended with a reward of 31.0, the average reward is 27.07 -Episode 256 ended with a reward of 28.0, the average reward is 27.11 -Episode 257 ended with a reward of 34.0, the average reward is 27.21 
-Episode 258 ended with a reward of 34.0, the average reward is 27.24 -Episode 259 ended with a reward of 32.0, the average reward is 27.37 -Episode 260 ended with a reward of 30.0, the average reward is 27.37 -Episode 261 ended with a reward of 21.0, the average reward is 27.30 -Episode 262 ended with a reward of 28.0, the average reward is 27.46 -Episode 263 ended with a reward of 19.0, the average reward is 27.50 -Episode 264 ended with a reward of 39.0, the average reward is 27.50 -Episode 265 ended with a reward of 32.0, the average reward is 27.58 -Episode 266 ended with a reward of 30.0, the average reward is 27.60 -Episode 267 ended with a reward of 36.0, the average reward is 27.71 -Episode 268 ended with a reward of 34.0, the average reward is 27.72 -Episode 269 ended with a reward of 23.0, the average reward is 27.71 -Episode 270 ended with a reward of 27.0, the average reward is 27.65 -Episode 271 ended with a reward of 34.0, the average reward is 27.75 -Episode 272 ended with a reward of 30.0, the average reward is 27.75 -Episode 273 ended with a reward of 32.0, the average reward is 27.77 -Episode 274 ended with a reward of 22.0, the average reward is 27.75 -Episode 275 ended with a reward of 20.0, the average reward is 27.58 -Episode 276 ended with a reward of 31.0, the average reward is 27.73 -Episode 277 ended with a reward of 26.0, the average reward is 27.66 -Episode 278 ended with a reward of 30.0, the average reward is 27.72 -Episode 279 ended with a reward of 32.0, the average reward is 27.77 -Episode 280 ended with a reward of 37.0, the average reward is 27.83 -Episode 281 ended with a reward of 22.0, the average reward is 27.96 -Episode 282 ended with a reward of 33.0, the average reward is 28.02 -Episode 283 ended with a reward of 25.0, the average reward is 28.08 -Episode 284 ended with a reward of 24.0, the average reward is 28.01 -Episode 285 ended with a reward of 25.0, the average reward is 28.06 -Episode 286 ended with a reward of 
19.0, the average reward is 28.00 -Episode 287 ended with a reward of 19.0, the average reward is 27.98 -Episode 288 ended with a reward of 22.0, the average reward is 27.92 -Episode 289 ended with a reward of 16.0, the average reward is 27.78 -Episode 290 ended with a reward of 23.0, the average reward is 27.80 -Episode 291 ended with a reward of 16.0, the average reward is 27.61 -Episode 292 ended with a reward of 27.0, the average reward is 27.66 -Episode 293 ended with a reward of 28.0, the average reward is 27.73 -Episode 294 ended with a reward of 35.0, the average reward is 27.84 -Episode 295 ended with a reward of 31.0, the average reward is 27.93 -Episode 296 ended with a reward of 27.0, the average reward is 27.86 -Episode 297 ended with a reward of 33.0, the average reward is 27.92 -Episode 298 ended with a reward of 33.0, the average reward is 28.00 -Episode 299 ended with a reward of 27.0, the average reward is 27.88 -Episode 300 ended with a reward of 27.0, the average reward is 27.91 -++++++ We have been running for 842880/40000000.0 frames ++++++ -Episode 301 ended with a reward of 21.0, the average reward is 27.86 -Episode 302 ended with a reward of 27.0, the average reward is 27.88 -Episode 303 ended with a reward of 28.0, the average reward is 27.89 -Episode 304 ended with a reward of 32.0, the average reward is 27.76 -Episode 305 ended with a reward of 26.0, the average reward is 27.63 -Episode 306 ended with a reward of 33.0, the average reward is 27.67 -Episode 307 ended with a reward of 27.0, the average reward is 27.76 -Episode 308 ended with a reward of 25.0, the average reward is 27.90 -Episode 309 ended with a reward of 36.0, the average reward is 28.00 -Episode 310 ended with a reward of 21.0, the average reward is 27.99 -Episode 311 ended with a reward of 30.0, the average reward is 28.10 -Episode 312 ended with a reward of 36.0, the average reward is 28.22 -Episode 313 ended with a reward of 28.0, the average reward is 28.14 -Episode 
314 ended with a reward of 19.0, the average reward is 28.08 -Episode 315 ended with a reward of 27.0, the average reward is 28.20 -Episode 316 ended with a reward of 26.0, the average reward is 28.21 -Episode 317 ended with a reward of 32.0, the average reward is 28.27 -Episode 318 ended with a reward of 25.0, the average reward is 28.22 -Episode 319 ended with a reward of 22.0, the average reward is 28.10 -Episode 320 ended with a reward of 22.0, the average reward is 28.00 -Episode 321 ended with a reward of 27.0, the average reward is 28.00 -Episode 322 ended with a reward of 25.0, the average reward is 27.98 -Episode 323 ended with a reward of 22.0, the average reward is 27.96 -Episode 324 ended with a reward of 28.0, the average reward is 28.00 -Episode 325 ended with a reward of 32.0, the average reward is 28.08 -Episode 326 ended with a reward of 26.0, the average reward is 28.09 -Episode 327 ended with a reward of 27.0, the average reward is 28.12 -Episode 328 ended with a reward of 25.0, the average reward is 28.20 -Episode 329 ended with a reward of 30.0, the average reward is 28.17 -Episode 330 ended with a reward of 30.0, the average reward is 28.25 -Episode 331 ended with a reward of 27.0, the average reward is 28.12 -Episode 332 ended with a reward of 30.0, the average reward is 28.12 -Episode 333 ended with a reward of 24.0, the average reward is 28.17 -Episode 334 ended with a reward of 27.0, the average reward is 28.28 -Episode 335 ended with a reward of 39.0, the average reward is 28.34 -Episode 336 ended with a reward of 24.0, the average reward is 28.37 -Episode 337 ended with a reward of 19.0, the average reward is 28.26 -Episode 338 ended with a reward of 27.0, the average reward is 28.36 -Episode 339 ended with a reward of 25.0, the average reward is 28.26 -Episode 340 ended with a reward of 31.0, the average reward is 28.30 -Episode 341 ended with a reward of 26.0, the average reward is 28.19 -Episode 342 ended with a reward of 33.0, the 
average reward is 28.20 -Episode 343 ended with a reward of 30.0, the average reward is 28.13 -Episode 344 ended with a reward of 36.0, the average reward is 28.18 -Episode 345 ended with a reward of 24.0, the average reward is 28.08 -Episode 346 ended with a reward of 33.0, the average reward is 28.13 -Episode 347 ended with a reward of 25.0, the average reward is 28.02 -Episode 348 ended with a reward of 30.0, the average reward is 28.01 -Episode 349 ended with a reward of 30.0, the average reward is 28.04 -Episode 350 ended with a reward of 26.0, the average reward is 27.94 -Episode 351 ended with a reward of 37.0, the average reward is 28.01 -Episode 352 ended with a reward of 36.0, the average reward is 28.01 -Episode 353 ended with a reward of 32.0, the average reward is 28.05 -Episode 354 ended with a reward of 34.0, the average reward is 28.04 -Episode 355 ended with a reward of 33.0, the average reward is 28.06 -Episode 356 ended with a reward of 27.0, the average reward is 28.05 -Episode 357 ended with a reward of 37.0, the average reward is 28.08 -Episode 358 ended with a reward of 33.0, the average reward is 28.07 -Episode 359 ended with a reward of 25.0, the average reward is 28.00 -Episode 360 ended with a reward of 27.0, the average reward is 27.97 -Episode 361 ended with a reward of 38.0, the average reward is 28.14 -Episode 362 ended with a reward of 29.0, the average reward is 28.15 -Episode 363 ended with a reward of 21.0, the average reward is 28.17 -Episode 364 ended with a reward of 28.0, the average reward is 28.06 -Episode 365 ended with a reward of 27.0, the average reward is 28.01 -Episode 366 ended with a reward of 31.0, the average reward is 28.02 -Episode 367 ended with a reward of 24.0, the average reward is 27.90 -Episode 368 ended with a reward of 19.0, the average reward is 27.75 -Episode 369 ended with a reward of 28.0, the average reward is 27.80 -Episode 370 ended with a reward of 31.0, the average reward is 27.84 -Episode 371 
ended with a reward of 29.0, the average reward is 27.79 -Episode 372 ended with a reward of 13.0, the average reward is 27.62 -Episode 373 ended with a reward of 33.0, the average reward is 27.63 -Episode 374 ended with a reward of 30.0, the average reward is 27.71 -Episode 375 ended with a reward of 40.0, the average reward is 27.91 -Episode 376 ended with a reward of 33.0, the average reward is 27.93 -Episode 377 ended with a reward of 28.0, the average reward is 27.95 -Episode 378 ended with a reward of 27.0, the average reward is 27.92 -Episode 379 ended with a reward of 42.0, the average reward is 28.02 -Episode 380 ended with a reward of 22.0, the average reward is 27.87 -Episode 381 ended with a reward of 24.0, the average reward is 27.89 -Episode 382 ended with a reward of 27.0, the average reward is 27.83 -Episode 383 ended with a reward of 30.0, the average reward is 27.88 -Episode 384 ended with a reward of 33.0, the average reward is 27.97 -Episode 385 ended with a reward of 34.0, the average reward is 28.06 -Episode 386 ended with a reward of 24.0, the average reward is 28.11 -Episode 387 ended with a reward of 33.0, the average reward is 28.25 -Episode 388 ended with a reward of 40.0, the average reward is 28.43 -Episode 389 ended with a reward of 30.0, the average reward is 28.57 -Episode 390 ended with a reward of 24.0, the average reward is 28.58 -Episode 391 ended with a reward of 34.0, the average reward is 28.76 -Episode 392 ended with a reward of 31.0, the average reward is 28.80 -Episode 393 ended with a reward of 31.0, the average reward is 28.83 -Episode 394 ended with a reward of 23.0, the average reward is 28.71 -Episode 395 ended with a reward of 39.0, the average reward is 28.79 -Episode 396 ended with a reward of 32.0, the average reward is 28.84 -Episode 397 ended with a reward of 30.0, the average reward is 28.81 -Episode 398 ended with a reward of 28.0, the average reward is 28.76 -Episode 399 ended with a reward of 40.0, the 
average reward is 28.89 -Episode 400 ended with a reward of 36.0, the average reward is 28.98 -++++++ We have been running for 1125632/40000000.0 frames ++++++ -Episode 401 ended with a reward of 33.0, the average reward is 29.10 -Episode 402 ended with a reward of 26.0, the average reward is 29.09 -Episode 403 ended with a reward of 31.0, the average reward is 29.12 -Episode 404 ended with a reward of 19.0, the average reward is 28.99 -Episode 405 ended with a reward of 25.0, the average reward is 28.98 -Episode 406 ended with a reward of 36.0, the average reward is 29.01 -Episode 407 ended with a reward of 40.0, the average reward is 29.14 -Episode 408 ended with a reward of 36.0, the average reward is 29.25 -Episode 409 ended with a reward of 32.0, the average reward is 29.21 -Episode 410 ended with a reward of 27.0, the average reward is 29.27 -Episode 411 ended with a reward of 21.0, the average reward is 29.18 -Episode 412 ended with a reward of 39.0, the average reward is 29.21 -Episode 413 ended with a reward of 34.0, the average reward is 29.27 -Episode 414 ended with a reward of 22.0, the average reward is 29.30 -Episode 415 ended with a reward of 38.0, the average reward is 29.41 -Episode 416 ended with a reward of 35.0, the average reward is 29.50 -Episode 417 ended with a reward of 31.0, the average reward is 29.49 -Episode 418 ended with a reward of 36.0, the average reward is 29.60 -Episode 419 ended with a reward of 31.0, the average reward is 29.69 -Episode 420 ended with a reward of 25.0, the average reward is 29.72 -Episode 421 ended with a reward of 36.0, the average reward is 29.81 -Episode 422 ended with a reward of 35.0, the average reward is 29.91 -Episode 423 ended with a reward of 30.0, the average reward is 29.99 -Episode 424 ended with a reward of 39.0, the average reward is 30.10 -Episode 425 ended with a reward of 30.0, the average reward is 30.08 -Episode 426 ended with a reward of 30.0, the average reward is 30.12 -Episode 427 ended 
with a reward of 38.0, the average reward is 30.23 -Episode 428 ended with a reward of 25.0, the average reward is 30.23 -Episode 429 ended with a reward of 32.0, the average reward is 30.25 -Episode 430 ended with a reward of 30.0, the average reward is 30.25 -Episode 431 ended with a reward of 31.0, the average reward is 30.29 -Episode 432 ended with a reward of 50.0, the average reward is 30.49 -Episode 433 ended with a reward of 33.0, the average reward is 30.58 -Episode 434 ended with a reward of 27.0, the average reward is 30.58 -Episode 435 ended with a reward of 27.0, the average reward is 30.46 -Episode 436 ended with a reward of 22.0, the average reward is 30.44 -Episode 437 ended with a reward of 33.0, the average reward is 30.58 -Episode 438 ended with a reward of 33.0, the average reward is 30.64 -Episode 439 ended with a reward of 36.0, the average reward is 30.75 -Episode 440 ended with a reward of 23.0, the average reward is 30.67 -Episode 441 ended with a reward of 28.0, the average reward is 30.69 -Episode 442 ended with a reward of 28.0, the average reward is 30.64 -Episode 443 ended with a reward of 33.0, the average reward is 30.67 -Episode 444 ended with a reward of 26.0, the average reward is 30.57 -Episode 445 ended with a reward of 34.0, the average reward is 30.67 -Episode 446 ended with a reward of 32.0, the average reward is 30.66 -Episode 447 ended with a reward of 27.0, the average reward is 30.68 -Episode 448 ended with a reward of 38.0, the average reward is 30.76 -Episode 449 ended with a reward of 30.0, the average reward is 30.76 -Episode 450 ended with a reward of 30.0, the average reward is 30.80 -Episode 451 ended with a reward of 25.0, the average reward is 30.68 -Episode 452 ended with a reward of 25.0, the average reward is 30.57 -Episode 453 ended with a reward of 30.0, the average reward is 30.55 -Episode 454 ended with a reward of 39.0, the average reward is 30.60 -Episode 455 ended with a reward of 32.0, the average 
reward is 30.59 -Episode 456 ended with a reward of 26.0, the average reward is 30.58 -Episode 457 ended with a reward of 33.0, the average reward is 30.54 -Episode 458 ended with a reward of 57.0, the average reward is 30.78 -Episode 459 ended with a reward of 43.0, the average reward is 30.96 -Episode 460 ended with a reward of 30.0, the average reward is 30.99 -Episode 461 ended with a reward of 28.0, the average reward is 30.89 -Episode 462 ended with a reward of 37.0, the average reward is 30.97 -Episode 463 ended with a reward of 24.0, the average reward is 31.00 -Episode 464 ended with a reward of 33.0, the average reward is 31.05 -Episode 465 ended with a reward of 34.0, the average reward is 31.12 -Episode 466 ended with a reward of 47.0, the average reward is 31.28 -Episode 467 ended with a reward of 36.0, the average reward is 31.40 -Episode 468 ended with a reward of 43.0, the average reward is 31.64 -Episode 469 ended with a reward of 43.0, the average reward is 31.79 -Episode 470 ended with a reward of 25.0, the average reward is 31.73 -Episode 471 ended with a reward of 34.0, the average reward is 31.78 -Episode 472 ended with a reward of 43.0, the average reward is 32.08 -Episode 473 ended with a reward of 18.0, the average reward is 31.93 -Episode 474 ended with a reward of 31.0, the average reward is 31.94 -Episode 475 ended with a reward of 24.0, the average reward is 31.78 -Episode 476 ended with a reward of 51.0, the average reward is 31.96 -Episode 477 ended with a reward of 41.0, the average reward is 32.09 -Episode 478 ended with a reward of 49.0, the average reward is 32.31 -Episode 479 ended with a reward of 33.0, the average reward is 32.22 -Episode 480 ended with a reward of 28.0, the average reward is 32.28 -Episode 481 ended with a reward of 36.0, the average reward is 32.40 -Episode 482 ended with a reward of 37.0, the average reward is 32.50 -Episode 483 ended with a reward of 36.0, the average reward is 32.56 -Episode 484 ended with 
a reward of 28.0, the average reward is 32.51 -Episode 485 ended with a reward of 40.0, the average reward is 32.57 -Episode 486 ended with a reward of 38.0, the average reward is 32.71 -Episode 487 ended with a reward of 33.0, the average reward is 32.71 -Episode 488 ended with a reward of 47.0, the average reward is 32.78 -Episode 489 ended with a reward of 43.0, the average reward is 32.91 -Episode 490 ended with a reward of 45.0, the average reward is 33.12 -Episode 491 ended with a reward of 38.0, the average reward is 33.16 -Episode 492 ended with a reward of 27.0, the average reward is 33.12 -Episode 493 ended with a reward of 26.0, the average reward is 33.07 -Episode 494 ended with a reward of 33.0, the average reward is 33.17 -Episode 495 ended with a reward of 38.0, the average reward is 33.16 -Episode 496 ended with a reward of 23.0, the average reward is 33.07 -Episode 497 ended with a reward of 33.0, the average reward is 33.10 -Episode 498 ended with a reward of 25.0, the average reward is 33.07 -Episode 499 ended with a reward of 37.0, the average reward is 33.04 -Episode 500 ended with a reward of 27.0, the average reward is 32.95 -++++++ We have been running for 1408128/40000000.0 frames ++++++ -Episode 501 ended with a reward of 24.0, the average reward is 32.86 -Episode 502 ended with a reward of 35.0, the average reward is 32.95 -Episode 503 ended with a reward of 37.0, the average reward is 33.01 -Episode 504 ended with a reward of 29.0, the average reward is 33.11 -Episode 505 ended with a reward of 42.0, the average reward is 33.28 -Episode 506 ended with a reward of 37.0, the average reward is 33.29 -Episode 507 ended with a reward of 39.0, the average reward is 33.28 -Episode 508 ended with a reward of 25.0, the average reward is 33.17 -Episode 509 ended with a reward of 30.0, the average reward is 33.15 -Episode 510 ended with a reward of 34.0, the average reward is 33.22 -Episode 511 ended with a reward of 33.0, the average reward is 
33.34 -Episode 512 ended with a reward of 31.0, the average reward is 33.26 -Episode 513 ended with a reward of 31.0, the average reward is 33.23 -Episode 514 ended with a reward of 30.0, the average reward is 33.31 -Episode 515 ended with a reward of 41.0, the average reward is 33.34 -Episode 516 ended with a reward of 34.0, the average reward is 33.33 -Episode 517 ended with a reward of 27.0, the average reward is 33.29 -Episode 518 ended with a reward of 24.0, the average reward is 33.17 -Episode 519 ended with a reward of 24.0, the average reward is 33.10 -Episode 520 ended with a reward of 25.0, the average reward is 33.10 -Episode 521 ended with a reward of 57.0, the average reward is 33.31 -Episode 522 ended with a reward of 37.0, the average reward is 33.33 -Episode 523 ended with a reward of 38.0, the average reward is 33.41 -Episode 524 ended with a reward of 30.0, the average reward is 33.32 -Episode 525 ended with a reward of 45.0, the average reward is 33.47 -Episode 526 ended with a reward of 37.0, the average reward is 33.54 -Episode 527 ended with a reward of 38.0, the average reward is 33.54 -Episode 528 ended with a reward of 27.0, the average reward is 33.56 -Episode 529 ended with a reward of 38.0, the average reward is 33.62 -Episode 530 ended with a reward of 27.0, the average reward is 33.59 -Episode 531 ended with a reward of 43.0, the average reward is 33.71 -Episode 532 ended with a reward of 23.0, the average reward is 33.44 -Episode 533 ended with a reward of 29.0, the average reward is 33.40 -Episode 534 ended with a reward of 30.0, the average reward is 33.43 -Episode 535 ended with a reward of 36.0, the average reward is 33.52 -Episode 536 ended with a reward of 36.0, the average reward is 33.66 -Episode 537 ended with a reward of 47.0, the average reward is 33.80 -Episode 538 ended with a reward of 42.0, the average reward is 33.89 -Episode 539 ended with a reward of 36.0, the average reward is 33.89 -Episode 540 ended with a reward 
of 39.0, the average reward is 34.05 -Episode 541 ended with a reward of 35.0, the average reward is 34.12 -Episode 542 ended with a reward of 40.0, the average reward is 34.24 -Episode 543 ended with a reward of 49.0, the average reward is 34.40 -Episode 544 ended with a reward of 39.0, the average reward is 34.53 -Episode 545 ended with a reward of 36.0, the average reward is 34.55 -Episode 546 ended with a reward of 34.0, the average reward is 34.57 -Episode 547 ended with a reward of 40.0, the average reward is 34.70 -Episode 548 ended with a reward of 33.0, the average reward is 34.65 -Episode 549 ended with a reward of 41.0, the average reward is 34.76 -Episode 550 ended with a reward of 34.0, the average reward is 34.80 -Episode 551 ended with a reward of 39.0, the average reward is 34.94 -Episode 552 ended with a reward of 35.0, the average reward is 35.04 -Episode 553 ended with a reward of 40.0, the average reward is 35.14 -Episode 554 ended with a reward of 37.0, the average reward is 35.12 -Episode 555 ended with a reward of 41.0, the average reward is 35.21 -Episode 556 ended with a reward of 27.0, the average reward is 35.22 -Episode 557 ended with a reward of 33.0, the average reward is 35.22 -Episode 558 ended with a reward of 42.0, the average reward is 35.07 -Episode 559 ended with a reward of 34.0, the average reward is 34.98 -Episode 560 ended with a reward of 34.0, the average reward is 35.02 -Episode 561 ended with a reward of 34.0, the average reward is 35.08 -Episode 562 ended with a reward of 30.0, the average reward is 35.01 -Episode 563 ended with a reward of 36.0, the average reward is 35.13 -Episode 564 ended with a reward of 34.0, the average reward is 35.14 -Episode 565 ended with a reward of 46.0, the average reward is 35.26 -Episode 566 ended with a reward of 36.0, the average reward is 35.15 -Episode 567 ended with a reward of 41.0, the average reward is 35.20 -Episode 568 ended with a reward of 21.0, the average reward is 34.98 
-Episode 569 ended with a reward of 24.0, the average reward is 34.79 -Episode 570 ended with a reward of 38.0, the average reward is 34.92 -Episode 571 ended with a reward of 33.0, the average reward is 34.91 -Episode 572 ended with a reward of 38.0, the average reward is 34.86 -Episode 573 ended with a reward of 32.0, the average reward is 35.00 -Episode 574 ended with a reward of 37.0, the average reward is 35.06 -Episode 575 ended with a reward of 33.0, the average reward is 35.15 -Episode 576 ended with a reward of 27.0, the average reward is 34.91 -Episode 577 ended with a reward of 30.0, the average reward is 34.80 -Episode 578 ended with a reward of 41.0, the average reward is 34.72 -Episode 579 ended with a reward of 36.0, the average reward is 34.75 -Episode 580 ended with a reward of 39.0, the average reward is 34.86 -Episode 581 ended with a reward of 39.0, the average reward is 34.89 -Episode 582 ended with a reward of 46.0, the average reward is 34.98 -Episode 583 ended with a reward of 33.0, the average reward is 34.95 -Episode 584 ended with a reward of 33.0, the average reward is 35.00 -Episode 585 ended with a reward of 25.0, the average reward is 34.85 -Episode 586 ended with a reward of 38.0, the average reward is 34.85 -Episode 587 ended with a reward of 45.0, the average reward is 34.97 -Episode 588 ended with a reward of 36.0, the average reward is 34.86 -Episode 589 ended with a reward of 36.0, the average reward is 34.79 -Episode 590 ended with a reward of 30.0, the average reward is 34.64 -Episode 591 ended with a reward of 26.0, the average reward is 34.52 -Episode 592 ended with a reward of 37.0, the average reward is 34.62 -Episode 593 ended with a reward of 37.0, the average reward is 34.73 -Episode 594 ended with a reward of 18.0, the average reward is 34.58 -Episode 595 ended with a reward of 39.0, the average reward is 34.59 -Episode 596 ended with a reward of 37.0, the average reward is 34.73 -Episode 597 ended with a reward of 
42.0, the average reward is 34.82 -Episode 598 ended with a reward of 33.0, the average reward is 34.90 -Episode 599 ended with a reward of 43.0, the average reward is 34.96 -Episode 600 ended with a reward of 34.0, the average reward is 35.03 -++++++ We have been running for 1691008/40000000.0 frames ++++++ -Episode 601 ended with a reward of 34.0, the average reward is 35.13 -Episode 602 ended with a reward of 43.0, the average reward is 35.21 -Episode 603 ended with a reward of 36.0, the average reward is 35.20 -Episode 604 ended with a reward of 44.0, the average reward is 35.35 -Episode 605 ended with a reward of 30.0, the average reward is 35.23 -Episode 606 ended with a reward of 36.0, the average reward is 35.22 -Episode 607 ended with a reward of 40.0, the average reward is 35.23 -Episode 608 ended with a reward of 40.0, the average reward is 35.38 -Episode 609 ended with a reward of 25.0, the average reward is 35.33 -Episode 610 ended with a reward of 36.0, the average reward is 35.35 -Episode 611 ended with a reward of 24.0, the average reward is 35.26 -Episode 612 ended with a reward of 32.0, the average reward is 35.27 -Episode 613 ended with a reward of 37.0, the average reward is 35.33 -Episode 614 ended with a reward of 40.0, the average reward is 35.43 -Episode 615 ended with a reward of 40.0, the average reward is 35.42 -Episode 616 ended with a reward of 45.0, the average reward is 35.53 -Episode 617 ended with a reward of 31.0, the average reward is 35.57 -Episode 618 ended with a reward of 57.0, the average reward is 35.90 -Episode 619 ended with a reward of 33.0, the average reward is 35.99 -Episode 620 ended with a reward of 37.0, the average reward is 36.11 -Episode 621 ended with a reward of 46.0, the average reward is 36.00 -Episode 622 ended with a reward of 24.0, the average reward is 35.87 -Episode 623 ended with a reward of 37.0, the average reward is 35.86 -Episode 624 ended with a reward of 28.0, the average reward is 35.84 -Episode 
625 ended with a reward of 33.0, the average reward is 35.72 -Episode 626 ended with a reward of 36.0, the average reward is 35.71 -Episode 627 ended with a reward of 37.0, the average reward is 35.70 -Episode 628 ended with a reward of 34.0, the average reward is 35.77 -Episode 629 ended with a reward of 36.0, the average reward is 35.75 -Episode 630 ended with a reward of 28.0, the average reward is 35.76 -Episode 631 ended with a reward of 36.0, the average reward is 35.69 -Episode 632 ended with a reward of 37.0, the average reward is 35.83 -Episode 633 ended with a reward of 45.0, the average reward is 35.99 -Episode 634 ended with a reward of 37.0, the average reward is 36.06 -Episode 635 ended with a reward of 52.0, the average reward is 36.22 -Episode 636 ended with a reward of 44.0, the average reward is 36.30 -Episode 637 ended with a reward of 37.0, the average reward is 36.20 -Episode 638 ended with a reward of 54.0, the average reward is 36.32 -Episode 639 ended with a reward of 22.0, the average reward is 36.18 -Episode 640 ended with a reward of 45.0, the average reward is 36.24 -Episode 641 ended with a reward of 40.0, the average reward is 36.29 -Episode 642 ended with a reward of 27.0, the average reward is 36.16 -Episode 643 ended with a reward of 54.0, the average reward is 36.21 -Episode 644 ended with a reward of 47.0, the average reward is 36.29 -Episode 645 ended with a reward of 33.0, the average reward is 36.26 -Episode 646 ended with a reward of 38.0, the average reward is 36.30 -Episode 647 ended with a reward of 44.0, the average reward is 36.34 -Episode 648 ended with a reward of 40.0, the average reward is 36.41 -Episode 649 ended with a reward of 42.0, the average reward is 36.42 -Episode 650 ended with a reward of 52.0, the average reward is 36.60 -Episode 651 ended with a reward of 38.0, the average reward is 36.59 -Episode 652 ended with a reward of 43.0, the average reward is 36.67 -Episode 653 ended with a reward of 34.0, the 
average reward is 36.61 -Episode 654 ended with a reward of 37.0, the average reward is 36.61 -Episode 655 ended with a reward of 39.0, the average reward is 36.59 -Episode 656 ended with a reward of 42.0, the average reward is 36.74 -Episode 657 ended with a reward of 44.0, the average reward is 36.85 -Episode 658 ended with a reward of 40.0, the average reward is 36.83 -Episode 659 ended with a reward of 46.0, the average reward is 36.95 -Episode 660 ended with a reward of 28.0, the average reward is 36.89 -Episode 661 ended with a reward of 43.0, the average reward is 36.98 -Episode 662 ended with a reward of 39.0, the average reward is 37.07 -Episode 663 ended with a reward of 47.0, the average reward is 37.18 -Episode 664 ended with a reward of 44.0, the average reward is 37.28 -Episode 665 ended with a reward of 71.0, the average reward is 37.53 -Episode 666 ended with a reward of 40.0, the average reward is 37.57 -Episode 667 ended with a reward of 33.0, the average reward is 37.49 -Episode 668 ended with a reward of 36.0, the average reward is 37.64 -Episode 669 ended with a reward of 35.0, the average reward is 37.75 -Episode 670 ended with a reward of 40.0, the average reward is 37.77 -Episode 671 ended with a reward of 53.0, the average reward is 37.97 -Episode 672 ended with a reward of 19.0, the average reward is 37.78 -Episode 673 ended with a reward of 32.0, the average reward is 37.78 -Episode 674 ended with a reward of 59.0, the average reward is 38.00 -Episode 675 ended with a reward of 39.0, the average reward is 38.06 -Episode 676 ended with a reward of 48.0, the average reward is 38.27 -Episode 677 ended with a reward of 48.0, the average reward is 38.45 -Episode 678 ended with a reward of 46.0, the average reward is 38.50 -Episode 679 ended with a reward of 54.0, the average reward is 38.68 -Episode 680 ended with a reward of 47.0, the average reward is 38.76 -Episode 681 ended with a reward of 43.0, the average reward is 38.80 -Episode 682 
ended with a reward of 45.0, the average reward is 38.79 -Episode 683 ended with a reward of 52.0, the average reward is 38.98 -Episode 684 ended with a reward of 45.0, the average reward is 39.10 -Episode 685 ended with a reward of 39.0, the average reward is 39.24 -Episode 686 ended with a reward of 50.0, the average reward is 39.36 -Episode 687 ended with a reward of 72.0, the average reward is 39.63 -Episode 688 ended with a reward of 40.0, the average reward is 39.67 -Episode 689 ended with a reward of 42.0, the average reward is 39.73 -Episode 690 ended with a reward of 48.0, the average reward is 39.91 -Episode 691 ended with a reward of 32.0, the average reward is 39.97 -Episode 692 ended with a reward of 41.0, the average reward is 40.01 -Episode 693 ended with a reward of 39.0, the average reward is 40.03 -Episode 694 ended with a reward of 44.0, the average reward is 40.29 -Episode 695 ended with a reward of 47.0, the average reward is 40.37 -Episode 696 ended with a reward of 42.0, the average reward is 40.42 -Episode 697 ended with a reward of 44.0, the average reward is 40.44 -Episode 698 ended with a reward of 34.0, the average reward is 40.45 -Episode 699 ended with a reward of 39.0, the average reward is 40.41 -Episode 700 ended with a reward of 44.0, the average reward is 40.51 -++++++ We have been running for 1974016/40000000.0 frames ++++++ -Episode 701 ended with a reward of 45.0, the average reward is 40.62 -Episode 702 ended with a reward of 55.0, the average reward is 40.74 -Episode 703 ended with a reward of 36.0, the average reward is 40.74 -Episode 704 ended with a reward of 46.0, the average reward is 40.76 -Episode 705 ended with a reward of 56.0, the average reward is 41.02 -Episode 706 ended with a reward of 88.0, the average reward is 41.54 -Episode 707 ended with a reward of 43.0, the average reward is 41.57 -Episode 708 ended with a reward of 40.0, the average reward is 41.57 -Episode 709 ended with a reward of 55.0, the average 
reward is 41.87 -Episode 710 ended with a reward of 50.0, the average reward is 42.01 -Episode 711 ended with a reward of 53.0, the average reward is 42.30 -Episode 712 ended with a reward of 52.0, the average reward is 42.50 -Episode 713 ended with a reward of 42.0, the average reward is 42.55 -Episode 714 ended with a reward of 32.0, the average reward is 42.47 -Episode 715 ended with a reward of 47.0, the average reward is 42.54 -Episode 716 ended with a reward of 48.0, the average reward is 42.57 -Episode 717 ended with a reward of 78.0, the average reward is 43.04 -Episode 718 ended with a reward of 25.0, the average reward is 42.72 -Episode 719 ended with a reward of 58.0, the average reward is 42.97 -Episode 720 ended with a reward of 56.0, the average reward is 43.16 -Episode 721 ended with a reward of 53.0, the average reward is 43.23 -Episode 722 ended with a reward of 45.0, the average reward is 43.44 -Episode 723 ended with a reward of 55.0, the average reward is 43.62 -Episode 724 ended with a reward of 50.0, the average reward is 43.84 -Episode 725 ended with a reward of 42.0, the average reward is 43.93 -Episode 726 ended with a reward of 54.0, the average reward is 44.11 -Episode 727 ended with a reward of 52.0, the average reward is 44.26 -Episode 728 ended with a reward of 56.0, the average reward is 44.48 -Episode 729 ended with a reward of 58.0, the average reward is 44.70 -Episode 730 ended with a reward of 49.0, the average reward is 44.91 -Episode 731 ended with a reward of 58.0, the average reward is 45.13 -Episode 732 ended with a reward of 36.0, the average reward is 45.12 -Episode 733 ended with a reward of 43.0, the average reward is 45.10 -Episode 734 ended with a reward of 46.0, the average reward is 45.19 -Episode 735 ended with a reward of 42.0, the average reward is 45.09 -Episode 736 ended with a reward of 55.0, the average reward is 45.20 -Episode 737 ended with a reward of 53.0, the average reward is 45.36 -Episode 738 ended with 
a reward of 57.0, the average reward is 45.39 -Episode 739 ended with a reward of 53.0, the average reward is 45.70 -Episode 740 ended with a reward of 45.0, the average reward is 45.70 -Episode 741 ended with a reward of 51.0, the average reward is 45.81 -Episode 742 ended with a reward of 54.0, the average reward is 46.08 -Episode 743 ended with a reward of 45.0, the average reward is 45.99 -Episode 744 ended with a reward of 51.0, the average reward is 46.03 -Episode 745 ended with a reward of 51.0, the average reward is 46.21 -Episode 746 ended with a reward of 41.0, the average reward is 46.24 -Episode 747 ended with a reward of 39.0, the average reward is 46.19 -Episode 748 ended with a reward of 47.0, the average reward is 46.26 -Episode 749 ended with a reward of 49.0, the average reward is 46.33 -Episode 750 ended with a reward of 42.0, the average reward is 46.23 -Episode 751 ended with a reward of 47.0, the average reward is 46.32 -Episode 752 ended with a reward of 48.0, the average reward is 46.37 -Episode 753 ended with a reward of 54.0, the average reward is 46.57 -Episode 754 ended with a reward of 49.0, the average reward is 46.69 -Episode 755 ended with a reward of 46.0, the average reward is 46.76 -Episode 756 ended with a reward of 48.0, the average reward is 46.82 -Episode 757 ended with a reward of 50.0, the average reward is 46.88 -Episode 758 ended with a reward of 49.0, the average reward is 46.97 -Episode 759 ended with a reward of 62.0, the average reward is 47.13 -Episode 760 ended with a reward of 41.0, the average reward is 47.26 -Episode 761 ended with a reward of 49.0, the average reward is 47.32 -Episode 762 ended with a reward of 57.0, the average reward is 47.50 -Episode 763 ended with a reward of 62.0, the average reward is 47.65 -Episode 764 ended with a reward of 52.0, the average reward is 47.73 -Episode 765 ended with a reward of 62.0, the average reward is 47.64 -Episode 766 ended with a reward of 56.0, the average reward is 
47.80 -Episode 767 ended with a reward of 44.0, the average reward is 47.91 -Episode 768 ended with a reward of 49.0, the average reward is 48.04 -Episode 769 ended with a reward of 52.0, the average reward is 48.21 -Episode 770 ended with a reward of 49.0, the average reward is 48.30 -Episode 771 ended with a reward of 54.0, the average reward is 48.31 -Episode 772 ended with a reward of 50.0, the average reward is 48.62 -Episode 773 ended with a reward of 73.0, the average reward is 49.03 -Episode 774 ended with a reward of 70.0, the average reward is 49.14 -Episode 775 ended with a reward of 54.0, the average reward is 49.29 -Episode 776 ended with a reward of 44.0, the average reward is 49.25 -Episode 777 ended with a reward of 77.0, the average reward is 49.54 -Episode 778 ended with a reward of 44.0, the average reward is 49.52 -Episode 779 ended with a reward of 62.0, the average reward is 49.60 -Episode 780 ended with a reward of 60.0, the average reward is 49.73 -Episode 781 ended with a reward of 48.0, the average reward is 49.78 -Episode 782 ended with a reward of 59.0, the average reward is 49.92 -Episode 783 ended with a reward of 52.0, the average reward is 49.92 -Episode 784 ended with a reward of 44.0, the average reward is 49.91 -Solved after 785 episodes. The last episode ended with a reward of 52.0, the average reward was 50.04 -+++++++++++++ TRAINING STOPPED: WE REACHED THE REWARD THRESHOLD +++++++++++++ -CPU times: user 21h 11min 56s, sys: 26min 19s, total: 21h 38min 15s -Wall time: 21h 4min 47s --
Now that we have reached the target we set, we save the model and then raise the reward threshold step by step to see how far the agent can improve before training starts to degrade.
-Note: by #train we mean the code within the try/except statement in the previous cell.
model.save("ppo_bowling.h5")
-
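The #train placeholder in the note above stands for a loop wrapped in try/except. The following is only a minimal sketch of why that wrapping lets the same loop be run again later. It assumes that the handler catches KeyboardInterrupt and that progress is kept in an object living outside the loop; neither detail is confirmed by this section, and the names train, run_episode and state are hypothetical.

```python
def train(state, run_episode, max_episodes):
    """Run episodes until `max_episodes` is reached or the user interrupts.

    `state` is a plain dict that outlives the call, so calling `train`
    again continues from the stored episode count (hypothetical layout).
    """
    try:
        while state["episode"] < max_episodes:
            reward = run_episode()  # hypothetical: plays one full episode
            state["episode"] += 1
            state["rewards"].append(reward)
    except KeyboardInterrupt:
        # Assumed handler: stopping the cell by hand keeps everything in `state`.
        print(f"Interrupted after episode {state['episode']}")
    return state


# Illustrative run with placeholder rewards (not taken from the training log).
state = {"episode": 0, "rewards": []}
train(state, run_episode=lambda: 50.0, max_episodes=3)
train(state, run_episode=lambda: 60.0, max_episodes=5)  # resumes at episode 4
print(state["episode"], state["rewards"])
```

Because the counters live in state rather than inside the function, a second call picks up where the first one stopped, which matches the way the episode numbering continues across cells in this notebook.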
Let us try to improve the average reward from 50 to 60.
- -target_reached = False
-reward_threshold = 60
-current_episode_reward = 0
-
-#train
-
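The cell above only resets the stop flag and raises reward_threshold; the stopping test itself lives in the training loop. As a rough sketch of what such a test might look like, the snippet below tracks a rolling mean of episode rewards and reports the first episode at which it reaches the threshold. The window of 100 episodes, the requirement that the window be full, and the function name first_episode_reaching are assumptions made for illustration, not details taken from the notebook.

```python
from collections import deque


def first_episode_reaching(rewards, reward_threshold, window=100):
    """Return (episode, average) for the first episode whose rolling mean
    over the last `window` rewards reaches `reward_threshold`, else None."""
    recent = deque(maxlen=window)  # oldest rewards drop out automatically
    for episode, reward in enumerate(rewards, start=1):
        recent.append(reward)
        average = sum(recent) / len(recent)
        # Assumption: only a full window counts towards the target.
        if len(recent) == window and average >= reward_threshold:
            return episode, average
    return None


# Synthetic rewards for illustration only (not taken from the training log).
synthetic = [50.0] * 100 + [70.0] * 100
print(first_episode_reaching(synthetic, reward_threshold=60))
```

A rolling mean reacts slowly by design: a single high-scoring episode moves the average by only reward/window, so a higher threshold is reached only after a sustained run of better episodes.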
Episode 786 ended with a reward of 107.0, the average reward is 50.61 -Episode 787 ended with a reward of 44.0, the average reward is 50.33 -Episode 788 ended with a reward of 52.0, the average reward is 50.45 -Episode 789 ended with a reward of 45.0, the average reward is 50.48 -Episode 790 ended with a reward of 69.0, the average reward is 50.69 -Episode 791 ended with a reward of 77.0, the average reward is 51.14 -Episode 792 ended with a reward of 69.0, the average reward is 51.42 -Episode 793 ended with a reward of 74.0, the average reward is 51.77 -Episode 794 ended with a reward of 48.0, the average reward is 51.81 -Episode 795 ended with a reward of 53.0, the average reward is 51.87 -Episode 796 ended with a reward of 52.0, the average reward is 51.97 -Episode 797 ended with a reward of 59.0, the average reward is 52.12 -Episode 798 ended with a reward of 52.0, the average reward is 52.30 -Episode 799 ended with a reward of 59.0, the average reward is 52.50 -Episode 800 ended with a reward of 48.0, the average reward is 52.54 -++++++ We have been running for 2255104/40000000.0 frames ++++++ -Episode 801 ended with a reward of 54.0, the average reward is 52.63 -Episode 802 ended with a reward of 61.0, the average reward is 52.69 -Episode 803 ended with a reward of 42.0, the average reward is 52.75 -Episode 804 ended with a reward of 62.0, the average reward is 52.91 -Episode 805 ended with a reward of 55.0, the average reward is 52.90 -Episode 806 ended with a reward of 66.0, the average reward is 52.68 -Episode 807 ended with a reward of 39.0, the average reward is 52.64 -Episode 808 ended with a reward of 56.0, the average reward is 52.80 -Episode 809 ended with a reward of 62.0, the average reward is 52.87 -Episode 810 ended with a reward of 49.0, the average reward is 52.86 -Episode 811 ended with a reward of 66.0, the average reward is 52.99 -Episode 812 ended with a reward of 53.0, the average reward is 53.00 -Episode 813 ended with a reward of 46.0, 
the average reward is 53.04 -Episode 814 ended with a reward of 66.0, the average reward is 53.38 -Episode 815 ended with a reward of 57.0, the average reward is 53.48 -Episode 816 ended with a reward of 48.0, the average reward is 53.48 -Episode 817 ended with a reward of 52.0, the average reward is 53.22 -Episode 818 ended with a reward of 68.0, the average reward is 53.65 -Episode 819 ended with a reward of 64.0, the average reward is 53.71 -Episode 820 ended with a reward of 42.0, the average reward is 53.57 -Episode 821 ended with a reward of 62.0, the average reward is 53.66 -Episode 822 ended with a reward of 56.0, the average reward is 53.77 -Episode 823 ended with a reward of 33.0, the average reward is 53.55 -Episode 824 ended with a reward of 59.0, the average reward is 53.64 -Episode 825 ended with a reward of 44.0, the average reward is 53.66 -Episode 826 ended with a reward of 55.0, the average reward is 53.67 -Episode 827 ended with a reward of 60.0, the average reward is 53.75 -Episode 828 ended with a reward of 43.0, the average reward is 53.62 -Episode 829 ended with a reward of 49.0, the average reward is 53.53 -Episode 830 ended with a reward of 74.0, the average reward is 53.78 -Episode 831 ended with a reward of 46.0, the average reward is 53.66 -Episode 832 ended with a reward of 71.0, the average reward is 54.01 -Episode 833 ended with a reward of 49.0, the average reward is 54.07 -Episode 834 ended with a reward of 49.0, the average reward is 54.10 -Episode 835 ended with a reward of 85.0, the average reward is 54.53 -Episode 836 ended with a reward of 54.0, the average reward is 54.52 -Episode 837 ended with a reward of 63.0, the average reward is 54.62 -Episode 838 ended with a reward of 65.0, the average reward is 54.70 -Episode 839 ended with a reward of 49.0, the average reward is 54.66 -Episode 840 ended with a reward of 61.0, the average reward is 54.82 -Episode 841 ended with a reward of 50.0, the average reward is 54.81 -Episode 
842 ended with a reward of 55.0, the average reward is 54.82 -Episode 843 ended with a reward of 46.0, the average reward is 54.83 -Episode 844 ended with a reward of 64.0, the average reward is 54.96 -Episode 845 ended with a reward of 33.0, the average reward is 54.78 -Episode 846 ended with a reward of 49.0, the average reward is 54.86 -Episode 847 ended with a reward of 63.0, the average reward is 55.10 -Episode 848 ended with a reward of 49.0, the average reward is 55.12 -Episode 849 ended with a reward of 56.0, the average reward is 55.19 -Episode 850 ended with a reward of 44.0, the average reward is 55.21 -Episode 851 ended with a reward of 67.0, the average reward is 55.41 -Episode 852 ended with a reward of 52.0, the average reward is 55.45 -Episode 853 ended with a reward of 53.0, the average reward is 55.44 -Episode 854 ended with a reward of 43.0, the average reward is 55.38 -Episode 855 ended with a reward of 51.0, the average reward is 55.43 -Episode 856 ended with a reward of 54.0, the average reward is 55.49 -Episode 857 ended with a reward of 47.0, the average reward is 55.46 -Episode 858 ended with a reward of 63.0, the average reward is 55.60 -Episode 859 ended with a reward of 67.0, the average reward is 55.65 -Episode 860 ended with a reward of 77.0, the average reward is 56.01 -Episode 861 ended with a reward of 48.0, the average reward is 56.00 -Episode 862 ended with a reward of 59.0, the average reward is 56.02 -Episode 863 ended with a reward of 42.0, the average reward is 55.82 -Episode 864 ended with a reward of 45.0, the average reward is 55.75 -Episode 865 ended with a reward of 53.0, the average reward is 55.66 -Episode 866 ended with a reward of 60.0, the average reward is 55.70 -Episode 867 ended with a reward of 51.0, the average reward is 55.77 -Episode 868 ended with a reward of 51.0, the average reward is 55.79 -Episode 869 ended with a reward of 54.0, the average reward is 55.81 -Episode 870 ended with a reward of 50.0, the 
average reward is 55.82 -Episode 871 ended with a reward of 64.0, the average reward is 55.92 -Episode 872 ended with a reward of 44.0, the average reward is 55.86 -Episode 873 ended with a reward of 45.0, the average reward is 55.58 -Episode 874 ended with a reward of 54.0, the average reward is 55.42 -Episode 875 ended with a reward of 61.0, the average reward is 55.49 -Episode 876 ended with a reward of 48.0, the average reward is 55.53 -Episode 877 ended with a reward of 53.0, the average reward is 55.29 -Episode 878 ended with a reward of 50.0, the average reward is 55.35 -Episode 879 ended with a reward of 50.0, the average reward is 55.23 -Episode 880 ended with a reward of 64.0, the average reward is 55.27 -Episode 881 ended with a reward of 45.0, the average reward is 55.24 -Episode 882 ended with a reward of 48.0, the average reward is 55.13 -Episode 883 ended with a reward of 71.0, the average reward is 55.32 -Episode 884 ended with a reward of 45.0, the average reward is 55.33 -Episode 885 ended with a reward of 53.0, the average reward is 55.34 -Episode 886 ended with a reward of 44.0, the average reward is 54.71 -Episode 887 ended with a reward of 60.0, the average reward is 54.87 -Episode 888 ended with a reward of 48.0, the average reward is 54.83 -Episode 889 ended with a reward of 59.0, the average reward is 54.97 -Episode 890 ended with a reward of 49.0, the average reward is 54.77 -Episode 891 ended with a reward of 50.0, the average reward is 54.50 -Episode 892 ended with a reward of 48.0, the average reward is 54.29 -Episode 893 ended with a reward of 54.0, the average reward is 54.09 -Episode 894 ended with a reward of 47.0, the average reward is 54.08 -Episode 895 ended with a reward of 40.0, the average reward is 53.95 -Episode 896 ended with a reward of 39.0, the average reward is 53.82 -Episode 897 ended with a reward of 65.0, the average reward is 53.88 -Episode 898 ended with a reward of 46.0, the average reward is 53.82 -Episode 899 
ended with a reward of 57.0, the average reward is 53.80 -Episode 900 ended with a reward of 65.0, the average reward is 53.97 -++++++ We have been running for 2539136/40000000.0 frames ++++++ -Episode 901 ended with a reward of 61.0, the average reward is 54.04 -Episode 902 ended with a reward of 58.0, the average reward is 54.01 -Episode 903 ended with a reward of 57.0, the average reward is 54.16 -Episode 904 ended with a reward of 56.0, the average reward is 54.10 -Episode 905 ended with a reward of 72.0, the average reward is 54.27 -Episode 906 ended with a reward of 45.0, the average reward is 54.06 -Episode 907 ended with a reward of 61.0, the average reward is 54.28 -Episode 908 ended with a reward of 53.0, the average reward is 54.25 -Episode 909 ended with a reward of 46.0, the average reward is 54.09 -Episode 910 ended with a reward of 55.0, the average reward is 54.15 -Episode 911 ended with a reward of 52.0, the average reward is 54.01 -Episode 912 ended with a reward of 61.0, the average reward is 54.09 -Episode 913 ended with a reward of 39.0, the average reward is 54.02 -Episode 914 ended with a reward of 46.0, the average reward is 53.82 -Episode 915 ended with a reward of 67.0, the average reward is 53.92 -Episode 916 ended with a reward of 54.0, the average reward is 53.98 -Episode 917 ended with a reward of 46.0, the average reward is 53.92 -Episode 918 ended with a reward of 71.0, the average reward is 53.95 -Episode 919 ended with a reward of 57.0, the average reward is 53.88 -Episode 920 ended with a reward of 51.0, the average reward is 53.97 -Episode 921 ended with a reward of 53.0, the average reward is 53.88 -Episode 922 ended with a reward of 54.0, the average reward is 53.86 -Episode 923 ended with a reward of 58.0, the average reward is 54.11 -Episode 924 ended with a reward of 55.0, the average reward is 54.07 -Episode 925 ended with a reward of 77.0, the average reward is 54.40 -Episode 926 ended with a reward of 52.0, the average 
reward is 54.37 -Episode 927 ended with a reward of 50.0, the average reward is 54.27 -Episode 928 ended with a reward of 52.0, the average reward is 54.36 -Episode 929 ended with a reward of 76.0, the average reward is 54.63 -Episode 930 ended with a reward of 40.0, the average reward is 54.29 -Episode 931 ended with a reward of 40.0, the average reward is 54.23 -Episode 932 ended with a reward of 43.0, the average reward is 53.95 -Episode 933 ended with a reward of 68.0, the average reward is 54.14 -Episode 934 ended with a reward of 51.0, the average reward is 54.16 -Episode 935 ended with a reward of 63.0, the average reward is 53.94 -Episode 936 ended with a reward of 53.0, the average reward is 53.93 -Episode 937 ended with a reward of 64.0, the average reward is 53.94 -Episode 938 ended with a reward of 61.0, the average reward is 53.90 -Episode 939 ended with a reward of 70.0, the average reward is 54.11 -Episode 940 ended with a reward of 48.0, the average reward is 53.98 -Episode 941 ended with a reward of 57.0, the average reward is 54.05 -Episode 942 ended with a reward of 53.0, the average reward is 54.03 -Episode 943 ended with a reward of 49.0, the average reward is 54.06 -Episode 944 ended with a reward of 55.0, the average reward is 53.97 -Episode 945 ended with a reward of 48.0, the average reward is 54.12 -Episode 946 ended with a reward of 45.0, the average reward is 54.08 -Episode 947 ended with a reward of 53.0, the average reward is 53.98 -Episode 948 ended with a reward of 48.0, the average reward is 53.97 -Episode 949 ended with a reward of 40.0, the average reward is 53.81 -Episode 950 ended with a reward of 54.0, the average reward is 53.91 -Episode 951 ended with a reward of 48.0, the average reward is 53.72 -Episode 952 ended with a reward of 55.0, the average reward is 53.75 -Episode 953 ended with a reward of 55.0, the average reward is 53.77 -Episode 954 ended with a reward of 63.0, the average reward is 53.97 -Episode 955 ended with 
a reward of 67.0, the average reward is 54.13 -Episode 956 ended with a reward of 36.0, the average reward is 53.95 -Episode 957 ended with a reward of 53.0, the average reward is 54.01 -Episode 958 ended with a reward of 36.0, the average reward is 53.74 -Episode 959 ended with a reward of 43.0, the average reward is 53.50 -Episode 960 ended with a reward of 46.0, the average reward is 53.19 -Episode 961 ended with a reward of 51.0, the average reward is 53.22 -Episode 962 ended with a reward of 71.0, the average reward is 53.34 -Episode 963 ended with a reward of 64.0, the average reward is 53.56 -Episode 964 ended with a reward of 47.0, the average reward is 53.58 -Episode 965 ended with a reward of 56.0, the average reward is 53.61 -Episode 966 ended with a reward of 49.0, the average reward is 53.50 -Episode 967 ended with a reward of 46.0, the average reward is 53.45 -Episode 968 ended with a reward of 46.0, the average reward is 53.40 -Episode 969 ended with a reward of 63.0, the average reward is 53.49 -Episode 970 ended with a reward of 43.0, the average reward is 53.42 -Episode 971 ended with a reward of 67.0, the average reward is 53.45 -Episode 972 ended with a reward of 51.0, the average reward is 53.52 -Episode 973 ended with a reward of 57.0, the average reward is 53.64 -Episode 974 ended with a reward of 56.0, the average reward is 53.66 -Episode 975 ended with a reward of 65.0, the average reward is 53.70 -Episode 976 ended with a reward of 57.0, the average reward is 53.79 -Episode 977 ended with a reward of 59.0, the average reward is 53.85 -Episode 978 ended with a reward of 59.0, the average reward is 53.94 -Episode 979 ended with a reward of 44.0, the average reward is 53.88 -Episode 980 ended with a reward of 60.0, the average reward is 53.84 -Episode 981 ended with a reward of 51.0, the average reward is 53.90 -Episode 982 ended with a reward of 62.0, the average reward is 54.04 -Episode 983 ended with a reward of 60.0, the average reward is 
53.93 -Episode 984 ended with a reward of 56.0, the average reward is 54.04 -Episode 985 ended with a reward of 55.0, the average reward is 54.06 -Episode 986 ended with a reward of 46.0, the average reward is 54.08 -Episode 987 ended with a reward of 61.0, the average reward is 54.09 -Episode 988 ended with a reward of 63.0, the average reward is 54.24 -Episode 989 ended with a reward of 58.0, the average reward is 54.23 -Episode 990 ended with a reward of 57.0, the average reward is 54.31 -Episode 991 ended with a reward of 77.0, the average reward is 54.58 -Episode 992 ended with a reward of 50.0, the average reward is 54.60 -Episode 993 ended with a reward of 70.0, the average reward is 54.76 -Episode 994 ended with a reward of 46.0, the average reward is 54.75 -Episode 995 ended with a reward of 65.0, the average reward is 55.00 -Episode 996 ended with a reward of 53.0, the average reward is 55.14 -Episode 997 ended with a reward of 70.0, the average reward is 55.19 -Episode 998 ended with a reward of 56.0, the average reward is 55.29 -Episode 999 ended with a reward of 46.0, the average reward is 55.18 -Episode 1000 ended with a reward of 60.0, the average reward is 55.13 -++++++ We have been running for 2823168/40000000.0 frames ++++++ -Episode 1001 ended with a reward of 61.0, the average reward is 55.13 -Episode 1002 ended with a reward of 52.0, the average reward is 55.07 -Episode 1003 ended with a reward of 57.0, the average reward is 55.07 -Episode 1004 ended with a reward of 49.0, the average reward is 55.00 -Episode 1005 ended with a reward of 59.0, the average reward is 54.87 -Episode 1006 ended with a reward of 52.0, the average reward is 54.94 -Episode 1007 ended with a reward of 64.0, the average reward is 54.97 -Episode 1008 ended with a reward of 60.0, the average reward is 55.04 -Episode 1009 ended with a reward of 82.0, the average reward is 55.40 -Episode 1010 ended with a reward of 51.0, the average reward is 55.36 -Episode 1011 ended with a 
reward of 56.0, the average reward is 55.40 -Episode 1012 ended with a reward of 68.0, the average reward is 55.47 -Episode 1013 ended with a reward of 52.0, the average reward is 55.60 -Episode 1014 ended with a reward of 47.0, the average reward is 55.61 -Episode 1015 ended with a reward of 76.0, the average reward is 55.70 -Episode 1016 ended with a reward of 50.0, the average reward is 55.66 -Episode 1017 ended with a reward of 55.0, the average reward is 55.75 -Episode 1018 ended with a reward of 68.0, the average reward is 55.72 -Episode 1019 ended with a reward of 49.0, the average reward is 55.64 -Episode 1020 ended with a reward of 53.0, the average reward is 55.66 -Episode 1021 ended with a reward of 47.0, the average reward is 55.60 -Episode 1022 ended with a reward of 43.0, the average reward is 55.49 -Episode 1023 ended with a reward of 45.0, the average reward is 55.36 -Episode 1024 ended with a reward of 50.0, the average reward is 55.31 -Episode 1025 ended with a reward of 50.0, the average reward is 55.04 -Episode 1026 ended with a reward of 48.0, the average reward is 55.00 -Episode 1027 ended with a reward of 58.0, the average reward is 55.08 -Episode 1028 ended with a reward of 48.0, the average reward is 55.04 -Episode 1029 ended with a reward of 51.0, the average reward is 54.79 -Episode 1030 ended with a reward of 49.0, the average reward is 54.88 -Episode 1031 ended with a reward of 68.0, the average reward is 55.16 -Episode 1032 ended with a reward of 60.0, the average reward is 55.33 -Episode 1033 ended with a reward of 51.0, the average reward is 55.16 -Episode 1034 ended with a reward of 63.0, the average reward is 55.28 -Episode 1035 ended with a reward of 58.0, the average reward is 55.23 -Episode 1036 ended with a reward of 43.0, the average reward is 55.13 -Episode 1037 ended with a reward of 55.0, the average reward is 55.04 -Episode 1038 ended with a reward of 68.0, the average reward is 55.11 -Episode 1039 ended with a reward of 
49.0, the average reward is 54.90 -Episode 1040 ended with a reward of 106.0, the average reward is 55.48 -Episode 1041 ended with a reward of 57.0, the average reward is 55.48 -Episode 1042 ended with a reward of 52.0, the average reward is 55.47 -Episode 1043 ended with a reward of 59.0, the average reward is 55.57 -Episode 1044 ended with a reward of 53.0, the average reward is 55.55 -Episode 1045 ended with a reward of 48.0, the average reward is 55.55 -Episode 1046 ended with a reward of 61.0, the average reward is 55.71 -Episode 1047 ended with a reward of 56.0, the average reward is 55.74 -Episode 1048 ended with a reward of 46.0, the average reward is 55.72 -Episode 1049 ended with a reward of 63.0, the average reward is 55.95 -Episode 1050 ended with a reward of 52.0, the average reward is 55.93 -Episode 1051 ended with a reward of 72.0, the average reward is 56.17 -Episode 1052 ended with a reward of 55.0, the average reward is 56.17 -Episode 1053 ended with a reward of 74.0, the average reward is 56.36 -Episode 1054 ended with a reward of 59.0, the average reward is 56.32 -Episode 1055 ended with a reward of 52.0, the average reward is 56.17 -Episode 1056 ended with a reward of 61.0, the average reward is 56.42 -Episode 1057 ended with a reward of 57.0, the average reward is 56.46 -Episode 1058 ended with a reward of 56.0, the average reward is 56.66 -Episode 1059 ended with a reward of 49.0, the average reward is 56.72 -Episode 1060 ended with a reward of 53.0, the average reward is 56.79 -Episode 1061 ended with a reward of 57.0, the average reward is 56.85 -Episode 1062 ended with a reward of 65.0, the average reward is 56.79 -Episode 1063 ended with a reward of 59.0, the average reward is 56.74 -Episode 1064 ended with a reward of 46.0, the average reward is 56.73 -Episode 1065 ended with a reward of 59.0, the average reward is 56.76 -Episode 1066 ended with a reward of 59.0, the average reward is 56.86 -Episode 1067 ended with a reward of 54.0, the 
average reward is 56.94 -Episode 1068 ended with a reward of 56.0, the average reward is 57.04 -Episode 1069 ended with a reward of 70.0, the average reward is 57.11 -Episode 1070 ended with a reward of 51.0, the average reward is 57.19 -Episode 1071 ended with a reward of 48.0, the average reward is 57.00 -Episode 1072 ended with a reward of 75.0, the average reward is 57.24 -Episode 1073 ended with a reward of 87.0, the average reward is 57.54 -Episode 1074 ended with a reward of 55.0, the average reward is 57.53 -Episode 1075 ended with a reward of 55.0, the average reward is 57.43 -Episode 1076 ended with a reward of 51.0, the average reward is 57.37 -Episode 1077 ended with a reward of 72.0, the average reward is 57.50 -Episode 1078 ended with a reward of 56.0, the average reward is 57.47 -Episode 1079 ended with a reward of 64.0, the average reward is 57.67 -Episode 1080 ended with a reward of 78.0, the average reward is 57.85 -Episode 1081 ended with a reward of 49.0, the average reward is 57.83 -Episode 1082 ended with a reward of 59.0, the average reward is 57.80 -Episode 1083 ended with a reward of 57.0, the average reward is 57.77 -Episode 1084 ended with a reward of 51.0, the average reward is 57.72 -Episode 1085 ended with a reward of 60.0, the average reward is 57.77 -Episode 1086 ended with a reward of 43.0, the average reward is 57.74 -Episode 1087 ended with a reward of 45.0, the average reward is 57.58 -Episode 1088 ended with a reward of 37.0, the average reward is 57.32 -Episode 1089 ended with a reward of 58.0, the average reward is 57.32 -Episode 1090 ended with a reward of 51.0, the average reward is 57.26 -Episode 1091 ended with a reward of 59.0, the average reward is 57.08 -Episode 1092 ended with a reward of 53.0, the average reward is 57.11 -Episode 1093 ended with a reward of 40.0, the average reward is 56.81 -Episode 1094 ended with a reward of 60.0, the average reward is 56.95 -Episode 1095 ended with a reward of 46.0, the average 
reward is 56.76 -Episode 1096 ended with a reward of 50.0, the average reward is 56.73 -Episode 1097 ended with a reward of 45.0, the average reward is 56.48 -Episode 1098 ended with a reward of 44.0, the average reward is 56.36 -Episode 1099 ended with a reward of 53.0, the average reward is 56.43 -Episode 1100 ended with a reward of 49.0, the average reward is 56.32 -++++++ We have been running for 3106944/40000000.0 frames ++++++ -Episode 1101 ended with a reward of 68.0, the average reward is 56.39 -Episode 1102 ended with a reward of 47.0, the average reward is 56.34 -Episode 1103 ended with a reward of 45.0, the average reward is 56.22 -Episode 1104 ended with a reward of 60.0, the average reward is 56.33 -Episode 1105 ended with a reward of 37.0, the average reward is 56.11 -Episode 1106 ended with a reward of 56.0, the average reward is 56.15 -Episode 1107 ended with a reward of 55.0, the average reward is 56.06 -Episode 1108 ended with a reward of 54.0, the average reward is 56.00 -Episode 1109 ended with a reward of 51.0, the average reward is 55.69 -Episode 1110 ended with a reward of 56.0, the average reward is 55.74 -Episode 1111 ended with a reward of 60.0, the average reward is 55.78 -Episode 1112 ended with a reward of 46.0, the average reward is 55.56 -Episode 1113 ended with a reward of 57.0, the average reward is 55.61 -Episode 1114 ended with a reward of 56.0, the average reward is 55.70 -Episode 1115 ended with a reward of 50.0, the average reward is 55.44 -Episode 1116 ended with a reward of 50.0, the average reward is 55.44 -Episode 1117 ended with a reward of 46.0, the average reward is 55.35 -Episode 1118 ended with a reward of 46.0, the average reward is 55.13 -Episode 1119 ended with a reward of 53.0, the average reward is 55.17 -Episode 1120 ended with a reward of 55.0, the average reward is 55.19 -Episode 1121 ended with a reward of 39.0, the average reward is 55.11 -Episode 1122 ended with a reward of 65.0, the average reward is 55.33 
-Episode 1123 ended with a reward of 67.0, the average reward is 55.55 -Episode 1124 ended with a reward of 55.0, the average reward is 55.60 -Episode 1125 ended with a reward of 52.0, the average reward is 55.62 -Episode 1126 ended with a reward of 64.0, the average reward is 55.78 -Episode 1127 ended with a reward of 68.0, the average reward is 55.88 -Episode 1128 ended with a reward of 49.0, the average reward is 55.89 -Episode 1129 ended with a reward of 65.0, the average reward is 56.03 -Episode 1130 ended with a reward of 62.0, the average reward is 56.16 -Episode 1131 ended with a reward of 72.0, the average reward is 56.20 -Episode 1132 ended with a reward of 42.0, the average reward is 56.02 -Episode 1133 ended with a reward of 54.0, the average reward is 56.05 -Episode 1134 ended with a reward of 54.0, the average reward is 55.96 -Episode 1135 ended with a reward of 49.0, the average reward is 55.87 -Episode 1136 ended with a reward of 60.0, the average reward is 56.04 -Episode 1137 ended with a reward of 40.0, the average reward is 55.89 -Episode 1138 ended with a reward of 54.0, the average reward is 55.75 -Episode 1139 ended with a reward of 62.0, the average reward is 55.88 -Episode 1140 ended with a reward of 62.0, the average reward is 55.44 -Episode 1141 ended with a reward of 61.0, the average reward is 55.48 -Episode 1142 ended with a reward of 64.0, the average reward is 55.60 -Episode 1143 ended with a reward of 58.0, the average reward is 55.59 -Episode 1144 ended with a reward of 53.0, the average reward is 55.59 -Episode 1145 ended with a reward of 79.0, the average reward is 55.90 -Episode 1146 ended with a reward of 66.0, the average reward is 55.95 -Episode 1147 ended with a reward of 62.0, the average reward is 56.01 -Episode 1148 ended with a reward of 55.0, the average reward is 56.10 -Episode 1149 ended with a reward of 49.0, the average reward is 55.96 -Episode 1150 ended with a reward of 55.0, the average reward is 55.99 -Episode 
1151 ended with a reward of 51.0, the average reward is 55.78 -Episode 1152 ended with a reward of 41.0, the average reward is 55.64 -Episode 1153 ended with a reward of 103.0, the average reward is 55.93 -Episode 1154 ended with a reward of 55.0, the average reward is 55.89 -Episode 1155 ended with a reward of 53.0, the average reward is 55.90 -Episode 1156 ended with a reward of 70.0, the average reward is 55.99 -Episode 1157 ended with a reward of 67.0, the average reward is 56.09 -Episode 1158 ended with a reward of 63.0, the average reward is 56.16 -Episode 1159 ended with a reward of 68.0, the average reward is 56.35 -Episode 1160 ended with a reward of 67.0, the average reward is 56.49 -Episode 1161 ended with a reward of 74.0, the average reward is 56.66 -Episode 1162 ended with a reward of 59.0, the average reward is 56.60 -Episode 1163 ended with a reward of 57.0, the average reward is 56.58 -Episode 1164 ended with a reward of 77.0, the average reward is 56.89 -Episode 1165 ended with a reward of 72.0, the average reward is 57.02 -Episode 1166 ended with a reward of 94.0, the average reward is 57.37 -Episode 1167 ended with a reward of 80.0, the average reward is 57.63 -Episode 1168 ended with a reward of 80.0, the average reward is 57.87 -Episode 1169 ended with a reward of 53.0, the average reward is 57.70 -Episode 1170 ended with a reward of 63.0, the average reward is 57.82 -Episode 1171 ended with a reward of 89.0, the average reward is 58.23 -Episode 1172 ended with a reward of 62.0, the average reward is 58.10 -Episode 1173 ended with a reward of 77.0, the average reward is 58.00 -Episode 1174 ended with a reward of 55.0, the average reward is 58.00 -Episode 1175 ended with a reward of 70.0, the average reward is 58.15 -Episode 1176 ended with a reward of 64.0, the average reward is 58.28 -Episode 1177 ended with a reward of 69.0, the average reward is 58.25 -Episode 1178 ended with a reward of 70.0, the average reward is 58.39 -Episode 1179 ended 
with a reward of 64.0, the average reward is 58.39 -Episode 1180 ended with a reward of 73.0, the average reward is 58.34 -Episode 1181 ended with a reward of 58.0, the average reward is 58.43 -Episode 1182 ended with a reward of 60.0, the average reward is 58.44 -Episode 1183 ended with a reward of 45.0, the average reward is 58.32 -Episode 1184 ended with a reward of 66.0, the average reward is 58.47 -Episode 1185 ended with a reward of 77.0, the average reward is 58.64 -Episode 1186 ended with a reward of 60.0, the average reward is 58.81 -Episode 1187 ended with a reward of 64.0, the average reward is 59.00 -Episode 1188 ended with a reward of 72.0, the average reward is 59.35 -Episode 1189 ended with a reward of 76.0, the average reward is 59.53 -Episode 1190 ended with a reward of 73.0, the average reward is 59.75 -Episode 1191 ended with a reward of 65.0, the average reward is 59.81 -Episode 1192 ended with a reward of 59.0, the average reward is 59.87 -Episode 1193 ended with a reward of 47.0, the average reward is 59.94 -Solved after 1194 episodes. The last episode ended with a reward of 73.0, the average reward was 60.07 -+++++++++++++ TRAINING STOPPED: WE REACHED THE REWARD THRESHOLD +++++++++++++ --
model.save("ppo_bowling_60.h5")
-
Let us now try to reach 65.
- -target_reached = False
-reward_threshold = 65
-current_episode_reward = 0
-
-#train
-
Episode 1195 ended with a reward of 86.0, the average reward is 60.47 -Episode 1196 ended with a reward of 91.0, the average reward is 60.88 -Episode 1197 ended with a reward of 75.0, the average reward is 61.18 -Episode 1198 ended with a reward of 67.0, the average reward is 61.41 -Episode 1199 ended with a reward of 59.0, the average reward is 61.47 -Episode 1200 ended with a reward of 61.0, the average reward is 61.59 -++++++ We have been running for 3386752/40000000.0 frames ++++++ -Episode 1201 ended with a reward of 79.0, the average reward is 61.70 -Episode 1202 ended with a reward of 83.0, the average reward is 62.06 -Episode 1203 ended with a reward of 60.0, the average reward is 62.21 -Episode 1204 ended with a reward of 74.0, the average reward is 62.35 -Episode 1205 ended with a reward of 100.0, the average reward is 62.98 -Episode 1206 ended with a reward of 59.0, the average reward is 63.01 -Episode 1207 ended with a reward of 88.0, the average reward is 63.34 -Episode 1208 ended with a reward of 79.0, the average reward is 63.59 -Episode 1209 ended with a reward of 109.0, the average reward is 64.17 -Episode 1210 ended with a reward of 80.0, the average reward is 64.41 -Episode 1211 ended with a reward of 89.0, the average reward is 64.70 -Solved after 1212 episodes. The last episode ended with a reward of 86.0, the average reward was 65.10 -+++++++++++++ TRAINING STOPPED: WE REACHED THE REWARD THRESHOLD +++++++++++++ --
model.save("ppo_bowling_65.h5")
-
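The cells in this part of the notebook repeatedly raise reward_threshold and resume training until the running average crosses it. The logged averages are consistent with a mean over the last 100 episodes: when episode 1195 scores 86.0 and episode 1095 (46.0) leaves the window, the average rises from 60.07 to 60.47. The training loop itself is not shown here, so the following is only a minimal sketch of such a stopping rule. The class name RewardTracker and the requirement that the window be full before stopping are assumptions, not taken from the notebook.

```python
from collections import deque


class RewardTracker:
    """Running average of episode rewards with a stopping threshold.

    Hypothetical helper: the notebook's own training loop may differ.
    """

    def __init__(self, reward_threshold, window=100):
        self.reward_threshold = reward_threshold
        # deque(maxlen=...) silently drops the oldest reward when full.
        self.rewards = deque(maxlen=window)

    def add(self, episode_reward):
        """Record one finished episode and return the new running average."""
        self.rewards.append(episode_reward)
        return self.average()

    def average(self):
        return sum(self.rewards) / len(self.rewards)

    def target_reached(self):
        # Assumption: only stop once the window is full, so that a few
        # lucky early episodes cannot end training prematurely.
        window_full = len(self.rewards) == self.rewards.maxlen
        return window_full and self.average() >= self.reward_threshold


tracker = RewardTracker(reward_threshold=65, window=3)
for reward in (60.0, 70.0, 70.0):
    tracker.add(reward)
print(tracker.target_reached())

# Raising the bar, as done between the training stages above.
tracker.reward_threshold = 70
print(tracker.target_reached())
```

Keeping the reward history when the threshold is raised matches what the logs show: the average continues from 60.07 rather than restarting.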
The agent is performing very well, so we can now try to reach 70.
- -target_reached = False
-reward_threshold = 70
-current_episode_reward = 0
-
-#train
-
Episode 1213 ended with a reward of 79.0, the average reward is 65.32 -Episode 1214 ended with a reward of 94.0, the average reward is 65.70 -Episode 1215 ended with a reward of 85.0, the average reward is 66.05 -Episode 1216 ended with a reward of 73.0, the average reward is 66.28 -Episode 1217 ended with a reward of 74.0, the average reward is 66.56 -Episode 1218 ended with a reward of 106.0, the average reward is 67.16 -Episode 1219 ended with a reward of 71.0, the average reward is 67.34 -Episode 1220 ended with a reward of 88.0, the average reward is 67.67 -Episode 1221 ended with a reward of 86.0, the average reward is 68.14 -Episode 1222 ended with a reward of 89.0, the average reward is 68.38 -Episode 1223 ended with a reward of 68.0, the average reward is 68.39 -Episode 1224 ended with a reward of 82.0, the average reward is 68.66 -Episode 1225 ended with a reward of 95.0, the average reward is 69.09 -Episode 1226 ended with a reward of 77.0, the average reward is 69.22 -Episode 1227 ended with a reward of 70.0, the average reward is 69.24 -Episode 1228 ended with a reward of 82.0, the average reward is 69.57 -Episode 1229 ended with a reward of 66.0, the average reward is 69.58 -Episode 1230 ended with a reward of 72.0, the average reward is 69.68 -Episode 1231 ended with a reward of 95.0, the average reward is 69.91 -Solved after 1232 episodes. The last episode ended with a reward of 57.0, the average reward was 70.06 -+++++++++++++ TRAINING STOPPED: WE REACHED THE REWARD THRESHOLD +++++++++++++ --
model.save("ppo_bowling_70.h5")
-
Training was very quick: the threshold was reached after only 20 more episodes. Let us now try to reach 75.
- -target_reached = False
-reward_threshold = 75
-current_episode_reward = 0
-
-#train
-
Episode 1233 ended with a reward of 76.0, the average reward is 70.28 -Episode 1234 ended with a reward of 76.0, the average reward is 70.50 -Episode 1235 ended with a reward of 68.0, the average reward is 70.69 -Episode 1236 ended with a reward of 80.0, the average reward is 70.89 -Episode 1237 ended with a reward of 73.0, the average reward is 71.22 -Episode 1238 ended with a reward of 58.0, the average reward is 71.26 -Episode 1239 ended with a reward of 54.0, the average reward is 71.18 -Episode 1240 ended with a reward of 62.0, the average reward is 71.18 -Episode 1241 ended with a reward of 45.0, the average reward is 71.02 -Episode 1242 ended with a reward of 89.0, the average reward is 71.27 -Episode 1243 ended with a reward of 97.0, the average reward is 71.66 -Episode 1244 ended with a reward of 78.0, the average reward is 71.91 -Episode 1245 ended with a reward of 90.0, the average reward is 72.02 -Episode 1246 ended with a reward of 86.0, the average reward is 72.22 -Episode 1247 ended with a reward of 52.0, the average reward is 72.12 -Episode 1248 ended with a reward of 100.0, the average reward is 72.57 -Episode 1249 ended with a reward of 53.0, the average reward is 72.61 -Episode 1250 ended with a reward of 59.0, the average reward is 72.65 -Episode 1251 ended with a reward of 83.0, the average reward is 72.97 -Episode 1252 ended with a reward of 106.0, the average reward is 73.62 -Episode 1253 ended with a reward of 50.0, the average reward is 73.09 -Episode 1254 ended with a reward of 50.0, the average reward is 73.04 -Episode 1255 ended with a reward of 76.0, the average reward is 73.27 -Episode 1256 ended with a reward of 75.0, the average reward is 73.32 -Episode 1257 ended with a reward of 72.0, the average reward is 73.37 -Episode 1258 ended with a reward of 77.0, the average reward is 73.51 -Episode 1259 ended with a reward of 96.0, the average reward is 73.79 -Episode 1260 ended with a reward of 93.0, the average reward is 74.05 -Episode 
1261 ended with a reward of 66.0, the average reward is 73.97 -Episode 1262 ended with a reward of 76.0, the average reward is 74.14 -Episode 1263 ended with a reward of 85.0, the average reward is 74.42 -Episode 1264 ended with a reward of 87.0, the average reward is 74.52 -Episode 1265 ended with a reward of 55.0, the average reward is 74.35 -Episode 1266 ended with a reward of 67.0, the average reward is 74.08 -Episode 1267 ended with a reward of 80.0, the average reward is 74.08 -Episode 1268 ended with a reward of 59.0, the average reward is 73.87 -Episode 1269 ended with a reward of 45.0, the average reward is 73.79 -Episode 1270 ended with a reward of 56.0, the average reward is 73.72 -Episode 1271 ended with a reward of 83.0, the average reward is 73.66 -Episode 1272 ended with a reward of 47.0, the average reward is 73.51 -Episode 1273 ended with a reward of 28.0, the average reward is 73.02 -Episode 1274 ended with a reward of 41.0, the average reward is 72.88 -Episode 1275 ended with a reward of 51.0, the average reward is 72.69 -Episode 1276 ended with a reward of 41.0, the average reward is 72.46 -Episode 1277 ended with a reward of 3.0, the average reward is 71.80 -Episode 1278 ended with a reward of 25.0, the average reward is 71.35 -Episode 1279 ended with a reward of 34.0, the average reward is 71.05 -Episode 1280 ended with a reward of 51.0, the average reward is 70.83 -Episode 1281 ended with a reward of 19.0, the average reward is 70.44 -Episode 1282 ended with a reward of 57.0, the average reward is 70.41 -Episode 1283 ended with a reward of 43.0, the average reward is 70.39 -Episode 1284 ended with a reward of 22.0, the average reward is 69.95 -Episode 1285 ended with a reward of 20.0, the average reward is 69.38 -Episode 1286 ended with a reward of 35.0, the average reward is 69.13 -Episode 1287 ended with a reward of 25.0, the average reward is 68.74 -Episode 1288 ended with a reward of 16.0, the average reward is 68.18 -Episode 1289 ended 
with a reward of 13.0, the average reward is 67.55 -Episode 1290 ended with a reward of 39.0, the average reward is 67.21 -Episode 1291 ended with a reward of 19.0, the average reward is 66.75 -Episode 1292 ended with a reward of 39.0, the average reward is 66.55 -Episode 1293 ended with a reward of 46.0, the average reward is 66.54 -Episode 1294 ended with a reward of 29.0, the average reward is 66.10 -Episode 1295 ended with a reward of 22.0, the average reward is 65.46 -Episode 1296 ended with a reward of 1.0, the average reward is 64.56 -Episode 1297 ended with a reward of 5.0, the average reward is 63.86 -Episode 1298 ended with a reward of 0.0, the average reward is 63.19 -Episode 1299 ended with a reward of 25.0, the average reward is 62.85 -Episode 1300 ended with a reward of 10.0, the average reward is 62.34 -++++++ We have been running for 3658112/40000000.0 frames ++++++ -Episode 1301 ended with a reward of 10.0, the average reward is 61.65 -Episode 1302 ended with a reward of 19.0, the average reward is 61.01 -Episode 1303 ended with a reward of 16.0, the average reward is 60.57 -Episode 1304 ended with a reward of 16.0, the average reward is 59.99 -Episode 1305 ended with a reward of 21.0, the average reward is 59.20 ----------------- TRAINING INTERRUPTED MANUALLY ---------------- --
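Unfortunately, we have hit a wall: after peaking at an average of about 74.5, the agent's performance collapsed and the average fell below 60 before we interrupted training manually. Since the training has gone on for almost two days and we had previously reached a good level of performance, we can be happy with the result. To see how well the agent performs, we load the previously saved model and make it play a few episodes.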
- -model = keras.models.load_model('ppo_bowling_70.h5')
-
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually. --
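Reloading ppo_bowling_70.h5 discards whatever the agent learned between an average of 70 and the peak of roughly 74.5, because no model was saved in between. One way to guard against this, which the notebook does not use, is to save whenever the running average improves and to stop once it falls a fixed margin below its best value. The sketch below is framework-agnostic. The names BestAverageGuard and save_fn and the 5-point margin are hypothetical choices for illustration.

```python
class BestAverageGuard:
    """Save on every new best running average; stop after a large drop.

    Hypothetical helper, not part of the original notebook.
    """

    def __init__(self, save_fn, max_drop=5.0):
        self.save_fn = save_fn      # callable invoked with the new best average
        self.max_drop = max_drop    # tolerated fall below the best average
        self.best_average = float("-inf")

    def update(self, average_reward):
        """Return True if training should stop."""
        if average_reward > self.best_average:
            self.best_average = average_reward
            self.save_fn(average_reward)
            return False
        return self.best_average - average_reward > self.max_drop


saved = []
guard = BestAverageGuard(save_fn=saved.append, max_drop=5.0)

# Running averages taken from the log of the run toward 75.
for average in (74.05, 74.52, 73.02, 69.38, 62.34):
    if guard.update(average):
        print(f"Stopping: average {average} is too far below {guard.best_average}")
        break
```

With Keras, save_fn could be something like `lambda avg: model.save("ppo_bowling_best.h5")`, where the file name is again only an example.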
def play_episode(id):
-
- # We need to have a different path for each run, or we
- # would need to pass `force = True`, but this causes
- # a few issues with the output on Colab
- env = gnwrapper.Monitor(get_environment(), f"./bowling_{id}")
- state = env.reset()
- total_reward = 0
- done = False
-
- while not done:
- t_state = tf.expand_dims(state, axis = 0)
- actor_value, _ = model(t_state)
- action_probability = np.squeeze(actor_value)
- action = np.random.choice(6, p = action_probability)
- state, reward, done, _ = env.step(action)
- total_reward += reward
-
- print("The episode ended with a total reward of ", total_reward)
- env.display()
-
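Note that play_episode samples each action from the actor's probability distribution, as during training, rather than always taking the most probable action. Both choices are common at evaluation time. The sketch below shows the two options side by side using only the standard library. The function select_action and its greedy flag are hypothetical and do not appear in the notebook.

```python
import random


def select_action(probabilities, greedy=False, rng=random):
    """Pick an action index from a sequence of action probabilities.

    Hypothetical helper illustrating stochastic vs. greedy evaluation.
    """
    actions = range(len(probabilities))
    if greedy:
        # Deterministic evaluation: take the most probable action.
        return max(actions, key=lambda a: probabilities[a])
    # Stochastic evaluation, analogous to np.random.choice(6, p=...).
    # random.choices treats the values as relative weights, so they do
    # not have to sum to exactly 1.
    return rng.choices(actions, weights=probabilities, k=1)[0]


policy_output = [0.05, 0.10, 0.60, 0.15, 0.05, 0.05]
print(select_action(policy_output, greedy=True))
print(select_action(policy_output, rng=random.Random(0)))
```

Greedy selection makes runs repeatable for a fixed environment seed, while sampling shows the behaviour of the policy that was actually trained.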
for e in range(10):
- play_episode(e)
-
The episode ended with a total reward of 61.0 --
'openaigym.video.0.1673.video000000.mp4'-
The episode ended with a total reward of 83.0 --
'openaigym.video.1.1673.video000000.mp4'-
The episode ended with a total reward of 99.0 --
'openaigym.video.2.1673.video000000.mp4'-
The episode ended with a total reward of 76.0 --
'openaigym.video.3.1673.video000000.mp4'-
The episode ended with a total reward of 80.0 --
'openaigym.video.4.1673.video000000.mp4'-
The episode ended with a total reward of 82.0 --
'openaigym.video.5.1673.video000000.mp4'-
The episode ended with a total reward of 72.0 --
'openaigym.video.6.1673.video000000.mp4'-
The episode ended with a total reward of 78.0 --
'openaigym.video.7.1673.video000000.mp4'-
The episode ended with a total reward of 111.0 --
'openaigym.video.8.1673.video000000.mp4'-
The episode ended with a total reward of 82.0 --
'openaigym.video.9.1673.video000000.mp4'-
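To summarize the ten evaluation episodes above, we can compute a few statistics from the printed rewards. The list below is copied by hand from the output; the variable names are illustrative.

```python
from statistics import mean, stdev

# Total rewards of the ten evaluation episodes printed above.
evaluation_rewards = [61.0, 83.0, 99.0, 76.0, 80.0, 82.0, 72.0, 78.0, 111.0, 82.0]

average = mean(evaluation_rewards)
spread = stdev(evaluation_rewards)  # sample standard deviation

print(f"mean: {average:.1f}")
print(f"min: {min(evaluation_rewards)}, max: {max(evaluation_rewards)}")
print(f"standard deviation: {spread:.1f}")
```

The mean is 82.4, with rewards ranging from 61.0 to 111.0. Ten episodes are a small sample, so this figure should be read as a rough indication rather than a precise estimate of the agent's performance.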
The agent plays fairly well, apart from a few throws where it does not even seem to try to hit the pins.
-In the future, it may be worth modifying the reward function so that it penalizes the agent for the pins left standing after each throw, encouraging it to aim for a strike every time. Additionally, it might be beneficial to end the episode after a strike or after the second throw. This could speed up learning and make the agent focus on each individual throw.
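The reward-shaping idea described above could look roughly like the sketch below. It is purely illustrative: it assumes that the number of pins standing before and after each throw can be obtained somehow, which this notebook does not do, and the function names and the penalty value are hypothetical.

```python
def shaped_throw_reward(pins_knocked_down, pins_standing_before, penalty_per_pin=0.1):
    """Reward for one throw: pins knocked down minus a penalty for pins left.

    Hypothetical shaping function; the pin counts are assumed to be
    available, which the plain environment reward does not provide.
    """
    pins_left = pins_standing_before - pins_knocked_down
    return pins_knocked_down - penalty_per_pin * pins_left


def short_episode_done(throw_index, pins_left):
    """End the shortened episode after a strike or after the second throw.

    throw_index is 0 for the first throw and 1 for the second.
    """
    return pins_left == 0 or throw_index >= 1


# A strike on the first throw: full reward, episode ends immediately.
print(shaped_throw_reward(10, 10), short_episode_done(0, 0))

# Seven pins on the first throw: reduced reward, one more throw allowed.
print(shaped_throw_reward(7, 10), short_episode_done(0, 3))
```

Shortening the episode in this way gives the agent feedback after at most two throws instead of after a whole game, which is the motivation stated above.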
- -This work was based on and inspired by:
-Material taken from the official documentation for OpenAI Gym, Keras, NumPy, OpenCV, etc. is linked in the code or immediately before it.
- -