
Change size to 30 and agent is unable to reach the goal #2

Open
daniel-xion opened this issue Jun 16, 2022 · 5 comments



daniel-xion commented Jun 16, 2022

Hi, thanks for the code. I changed the grid size to 30 and it seems the agent is unable to learn to reach the goal:
[screenshot omitted]

It seems Q-learning is unable to handle large grids; is DQN needed to solve them?


qihongl commented Jun 16, 2022

Thanks for the data point, daniel!

I think the number of training epochs needs to be much, much larger, since the state space increased 36-fold: (30 × 30) / (5 × 5) = 36. Q-learning requires the agent, which starts out with an essentially random policy, to bump into the target many times. This "target visitation probability" drops off rapidly as the state space gets larger.
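To make that concrete, here's a quick back-of-the-envelope simulation (just a sketch, not code from this repo -- the corner start/goal, the four-move action set, and the 100-step episode cap are all assumptions on my part) that estimates how often a uniformly random policy reaches the goal within a single episode:

```python
import random

def goal_hit_prob(n, max_steps=100, n_episodes=2000, seed=0):
    """Monte Carlo estimate: how often does a uniformly random policy
    reach the goal within one episode on an n x n grid?
    Start (0, 0), goal (n-1, n-1), 100-step cap -- assumptions, not the repo's setup."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_episodes):
        x = y = 0
        for _ in range(max_steps):
            dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
            # clamp moves that would leave the grid
            x = min(max(x + dx, 0), n - 1)
            y = min(max(y + dy, 0), n - 1)
            if (x, y) == (n - 1, n - 1):
                hits += 1
                break
    return hits / n_episodes

for n in (5, 30):
    print(f"{n}x{n}: P(goal reached) ~ {goal_hit_prob(n):.3f}")
```

With these made-up settings, the 5 × 5 estimate comes out well above zero while the 30 × 30 estimate is essentially zero, so on the big grid the agent almost never receives the reward it needs to bootstrap from.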

@daniel-xion

Thanks for the prompt reply! For large grids, and to keep things scalable (i.e., a reasonable number of training episodes), do you think DQN or some other reinforcement learning method would work?


daniel-xion commented Jun 23, 2022

FYI, I tried the DQN from the following notebook and it is still not working. The agent keeps bumping into the walls (even with a grid size of only 12):
DeepReinforcementLearning/DeepReinforcementLearningInAction#38


qihongl commented Jun 24, 2022

Thanks for the data point -- that's really interesting. I'm not sure what the next simplest thing to try would be...

@daniel-xion

No problem. By the way, what do you think of using a transformer in RL to solve the maze game? For example, using the transformer to predict a sequence of actions that leads to a sequence of high rewards?
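What I have in mind is roughly return-conditioned action prediction, i.e. the Decision Transformer idea. Here's a minimal sketch of the shape of such a model (assuming PyTorch; the integer cell-id states, layer sizes, and all names are made up for illustration, not taken from either repo):

```python
import torch
import torch.nn as nn

class TinyActionTransformer(nn.Module):
    """Sketch of return-conditioned action prediction for a grid maze.
    States are integer cell ids; rtg is the desired return-to-go."""

    def __init__(self, n_states, n_actions, d_model=64, max_len=64):
        super().__init__()
        self.state_emb = nn.Embedding(n_states, d_model)
        self.rtg_proj = nn.Linear(1, d_model)          # return conditioning
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, states, rtg):
        # states: (B, T) ints; rtg: (B, T, 1) floats
        T = states.size(1)
        pos = torch.arange(T, device=states.device)
        x = self.state_emb(states) + self.rtg_proj(rtg) + self.pos_emb(pos)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                       device=states.device), diagonal=1)
        h = self.encoder(x, mask=causal)               # no peeking at the future
        return self.head(h)                            # (B, T, n_actions) logits

model = TinyActionTransformer(n_states=30 * 30, n_actions=4)
states = torch.randint(0, 900, (8, 20))                # fake logged trajectories
rtg = torch.rand(8, 20, 1)
logits = model(states, rtg)                            # (8, 20, 4)
```

Training would just be cross-entropy against the actions in logged trajectories; at rollout time you'd feed a high desired return-to-go plus the states seen so far and act on the last step's logits.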
