
[Low priority] Building off of deep q learning example #18

Open
ckcollab opened this issue Oct 14, 2015 · 2 comments

@ckcollab

Hey there! First of all: thank you so much for releasing this, documenting things, putting it on PyPI, etc., really appreciate it :)

I've been trying to get a fun "search and rescue" example working where a drone with a search radius explores a map until it finds the objective. Right now I'm having trouble getting the state input set up properly... assuming I have the rest understood. It seems like I should keep calling DQNAgent.learn over and over until I'm satisfied? I was kind of confused by the DQN example with all the socketIO stuff, and wasn't sure how that was driving the learning.

# some pseudo code
import numpy as np

height = width = 40
map = np.zeros((height, width))  # the explored map
ACTIONS = ('up', 'down', 'left', 'right')

agent = DQNAgent(height * width, len(ACTIONS))

while True:
    # snapshot the state before acting
    state = map.copy()
    action = agent.get_action(state)
    reward = 0  # not sure what to set this to on the first "learn"

    drone.do_action(ACTIONS[action])

    # get the state after the action has changed it
    next_state = map.copy()

    reward = drone.get_current_reward()

    agent.learn(state, action, reward, next_state)

However, the problem is I get:

Wrong number of dimensions: expected 2, got 3 with shape (1, 40, 40).

If you're feeling crazy, here's the actual source.

I may have this all ass-backwards; apologies if this is a silly question.

@zomux
Owner

zomux commented Oct 21, 2015

Okay, I will check it.

@ckcollab
Author

ckcollab commented Nov 8, 2015

Hey there, to summarize my direct problem: what kind of "state" should I be using? I'm trying an array with shape (40, 40), so I had to modify the example, changing:

action = self.model.compute([state])   # gives us (1, 40, 40) when we want (40, 40) state

to:

action = self.model.compute(state)  # gives us (40, 40)

And now I get:

ValueError: Shape mismatch: x has 40 cols (and 40 rows) but y has 1600 rows (and 100 cols)
Apply node that caused the error: Dot22(x, W_dense1)
Inputs types: [TensorType(float64, matrix), TensorType(float64, matrix)]
Inputs shapes: [(40, 40), (1600, 100)]
Inputs strides: [(320, 8), (800, 8)]
Inputs values: ['not shown', 'not shown']

Maybe I'm doing something else wrong, and I don't want to poke around too much in the deepy codebase, but how should I be setting up the state properly?
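
For what it's worth, here's my best guess at a fix, going by the shapes in the traceback: W_dense1 is (1600, 100), so I assume compute() actually wants a batch of flattened maps with shape (1, 1600), not a raw (40, 40) array. This is just me reshaping with numpy, not anything from the deepy API:

# guess: flatten the (40, 40) map to a 1600-vector, then wrap it in a
# list so compute() sees a batch of one sample with shape (1, 1600)
state = self.map.copy().flatten()     # (40, 40) -> (1600,)
action = self.model.compute([state])  # input shape (1, 1600)
return int(action[0].argmax())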


Full output:

$ python src/run.py deepq
Starting experiment...
  state_num = 1600
> /Users/eric/src/plithos/src/plithos/deep_q_learner.py(57)get_action()
     56                 import ipdb; ipdb.set_trace()
---> 57                 action = self.model.compute(state)
     58             return int(action[0].argmax())

ipdb> c
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
src/run.py in <module>()
     48         drone_count=args.drone_count,
     49     )
---> 50     experiment.start()
     51 
     52 

/Users/eric/src/plithos/src/plithos/simulations/dqn_single_drone.py in start(self)
     21 
     22             state = self.map.copy()
---> 23             action = self.agent.get_action(state)
     24             reward = 0
     25 

/Users/eric/src/plithos/src/plithos/deep_q_learner.pyc in get_action(self, state)
     55             with self.thread_lock:
     56                 import ipdb; ipdb.set_trace()
---> 57                 action = self.model.compute(state)
     58             return int(action[0].argmax())
     59 

/Users/eric/.virtualenvs/plithos/lib/python2.7/site-packages/deepy/networks/network.pyc in compute(self, *x)
    143         """
    144         self._compile()
--> 145         return self._compute(*x)
    146 
    147     @property

/Users/eric/.virtualenvs/plithos/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    604                         self.fn.nodes[self.fn.position_of_error],
    605                         self.fn.thunks[self.fn.position_of_error],
--> 606                         storage_map=self.fn.storage_map)
    607                 else:
    608                     # For the c linker We don't have access from

/Users/eric/.virtualenvs/plithos/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    593         t0_fn = time.time()
    594         try:
--> 595             outputs = self.fn()
    596         except Exception:
    597             if hasattr(self.fn, 'position_of_error'):

ValueError: Shape mismatch: x has 40 cols (and 40 rows) but y has 1600 rows (and 100 cols)
Apply node that caused the error: Dot22(x, W_dense1)
Inputs types: [TensorType(float64, matrix), TensorType(float64, matrix)]
Inputs shapes: [(40, 40), (1600, 100)]
Inputs strides: [(320, 8), (800, 8)]
Inputs values: ['not shown', 'not shown']

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
