
train/enjoy_husky_gibson_flagrun.py issues! #104

Open
Berk035 opened this issue Dec 10, 2019 · 4 comments

Berk035 commented Dec 10, 2019

Hello everyone,

I have been studying the husky flagrun training for a while and have run into some problems. Despite trying everything, the agent is not able to learn how to go to the cube (the target).

  • First of all, I couldn't understand the reward function, which contains only alive_score, progress, and obstacle_dist. There is no close_to_target term that drives the agent toward the target.

  • Second, the target location does not change in any file. There are only a couple of lines in _flagreposition, such as self.walk_to_target = ballxyz, and they do not seem to contribute to the reward function or the learning process.

  • Finally, there is a sentence in the paper: "We trained a perceptual and non-perceptual husky agent according to the setting in Sec. 4.1 with PPO [78] for 150 episodes (300 iterations, 150k frames)." Is the correct calculation 150k frames / 300 iterations = 500 timesteps × batch (see the quick check below)? That product of timesteps and batch size seems too low.
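
For reference, this is the arithmetic I am assuming (just my own reading of the numbers quoted above):

```python
frames = 150_000      # "150k frames" from the paper
iterations = 300      # "300 iterations"
episodes = 150        # "150 episodes"

print(frames / iterations)  # 500.0 frames collected per iteration
print(frames / episodes)    # 1000.0 frames per episode
```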

I would be grateful for any answers to these questions. Thanks.

fxia22 (Collaborator) commented Dec 10, 2019

  • Reward: alive_score is a reward term that keeps the agent from tipping over; progress is the difference of the potential function between two consecutive timesteps (a dense reward); the obstacle distance term penalizes getting too close to an obstacle. A rough sketch of this composition is given after this list.

  • The target location is changed in _flag_reposition(): a random force is applied to the red cube, throwing it somewhere in the room, and this is how the target location changes (see the second sketch after this list).

  • The policy is able to converge with a small number of environment steps because it receives ground truth localization, i.e. the agent knows where the target is and only needs to perform local planning/obstacle avoidance.
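
To illustrate the reward composition, here is a minimal sketch (helper names and constants such as alive_bonus, obstacle_margin, and obstacle_penalty are placeholders, not the exact implementation in the repo):

```python
import numpy as np

def potential(robot_xyz, target_xyz):
    # Potential = negative distance to the target, so moving closer increases it.
    return -np.linalg.norm(np.asarray(target_xyz) - np.asarray(robot_xyz))

def step_reward(robot_xyz, target_xyz, prev_potential,
                tipped_over, obstacle_dist,
                alive_bonus=1.0, obstacle_margin=0.3, obstacle_penalty=-1.0):
    # alive_score: constant bonus while upright, large penalty when tipped over
    alive = -10.0 if tipped_over else alive_bonus

    # progress: difference of the potential between two consecutive timesteps (dense reward)
    curr_potential = potential(robot_xyz, target_xyz)
    progress = curr_potential - prev_potential

    # obstacle_dist: penalize being closer than a margin to the nearest obstacle
    obstacle = obstacle_penalty if obstacle_dist < obstacle_margin else 0.0

    return alive + progress + obstacle, curr_potential
```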
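
And a rough sketch of the repositioning idea with pybullet (again, flag_body_id and force_scale are placeholders for illustration, not the real variables in the environment):

```python
import numpy as np
import pybullet as p

def flag_reposition(flag_body_id, force_scale=500.0):
    # Pick a random planar direction and add an upward component so the cube is "thrown".
    fx, fy = np.random.uniform(-1.0, 1.0, size=2)
    force = [force_scale * fx, force_scale * fy, 0.3 * force_scale]

    pos, _ = p.getBasePositionAndOrientation(flag_body_id)
    p.applyExternalForce(flag_body_id, -1, force, pos, p.WORLD_FRAME)

    # Once the physics settles over the next few simulation steps, the cube's new
    # position is read back and used as the walk-to target for the reward's potential.
```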

fxia22 (Collaborator) commented Dec 10, 2019

Can you plot your reward curve during your training process? This would be insightful! Thanks.
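
Something simple like the following would already help (a minimal sketch, assuming you log one cumulative reward per episode; the file name and format are placeholders for whatever your training script records):

```python
import matplotlib.pyplot as plt
import numpy as np

# Assumed logging format: one cumulative reward value per episode, one per line.
episode_rewards = np.loadtxt("episode_rewards.csv")

plt.plot(episode_rewards, alpha=0.3, label="per-episode reward")

# A running mean makes the trend easier to see.
window = 10
smoothed = np.convolve(episode_rewards, np.ones(window) / window, mode="valid")
plt.plot(range(window - 1, len(episode_rewards)), smoothed, label=f"{window}-episode mean")

plt.xlabel("episode")
plt.ylabel("reward")
plt.legend()
plt.show()
```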

Berk035 (Author) commented Dec 10, 2019

> Can you plot your reward curve during your training process? This would be insightful! Thanks.

Thank you for your quick response, Fei. You are awesome :)
I understand the rewards, but according to the enjoy results the agent still couldn't reach the target. I also tried training after adding self.robot.set_target_position(ball_xyz). Anyway, I will plot my results in a few minutes. Thank you.

Berk035 (Author) commented Dec 10, 2019

[Figure_1: reward curve from training]

Timesteps: 600, Episode: 20, Iterations: 250
