PyTorch implementation of the DeepMind paper [Hybrid computing using a neural network with dynamic external memory](https://pdfs.semanticscholar.org/7635/78fa9003f6c0f735bc3250fc2116f6100463.pdf). The code is based on the TensorFlow implementation [here](https://github.com/deepmind/dnc).
TODO: finish retraining and a better writeup.
There are a few tasks set up. One is the "Air Cargo Problem" from *Artificial Intelligence: A Modern Approach* (Russell & Norvig). The original code for the problem is based on the [Udacity implementation](https://github.com/udacity/AIND-Planning), and the full description is in the problem repo.
The Air Cargo problem can be seen as structured prediction: every prediction step changes the state of the problem. The algorithms used to solve it in the book include GraphPlan and A* search of the state space, putting it in the same family as the blocks problem (SHRDLU) solved in the original paper.
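For reference, here is a minimal sketch of the state/action structure (the names are illustrative, not the repo's actual classes):

```python
from collections import namedtuple

# Illustrative encoding of the Air Cargo domain: an action is a
# (verb, arg1, arg2, arg3) tuple over typed entities.
Action = namedtuple("Action", ["verb", "ent1", "ent2", "ent3"])

# Load(C1, P1, A1):   put cargo C1 into plane P1 at airport A1.
# Fly(P1, A1, A0):    move plane P1 from airport A1 to airport A0.
# Unload(C1, P1, A0): take cargo C1 out of plane P1 at airport A0.
plan = [
    Action("Load",   "C1", "P1", "A1"),
    Action("Fly",    "P1", "A1", "A0"),
    Action("Unload", "C1", "P1", "A0"),
]

# Each predicted action transitions the state, so a whole plan is a
# sequence of structured predictions rather than a single label.
```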
Requires PyTorch (no CUDA needed):

```
conda install pytorch torchvision cuda80 -c soumith
```
Run tensorboard with the `--log 10` flag (the number is the logging frequency). Below is a screenshot taken during training. Losses are recorded separately for each entity and type, as well as for the action, for finer-grained monitoring.
To run with tensorboard: `pip install tensorboardX` (the logger) and `pip install tensorflow` (for the tensorboard web server).
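Roughly, the per-head logging amounts to the sketch below (not the repo's exact code; the head names and writer tags are assumptions):

```python
import torch.nn.functional as F
from tensorboardX import SummaryWriter

writer = SummaryWriter()

def log_losses(logits, targets, step):
    """logits/targets: dicts keyed by prediction head,
    e.g. 'action', 'type1'..'type3', 'ent1'..'ent3'."""
    total = 0.0
    for head, head_logits in logits.items():
        loss = F.cross_entropy(head_logits, targets[head])
        writer.add_scalar("loss/" + head, loss.item(), step)  # separate curve per head
        total = total + loss
    writer.add_scalar("loss/total", total.item(), step)
    return total
```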
The code implements a curriculum training schedule as in the paper: start small with the minimum-sized problem (2 entities of each kind).
```
python run.py --act plan --iters 1000 --ret_graph 1 --zero_at step --n_phases 20 --opt_at step
python run.py --act plan --iters 1000 --ret_graph 1 --zero_at step --n_phases 20 --opt_at step --save opt_zero_step
python run.py --act plan --iters 1000 --ret_graph 0 --opt_at problem --save opt_problem_plan --n_phases 20
```
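The schedule itself amounts to something like the loop below (a sketch; `train_phase`, `start_size`, and the promotion threshold are made up for illustration):

```python
def curriculum(train_phase, n_phases=20, start_size=2, promote_at=0.8):
    """Grow the problem once running accuracy clears a threshold."""
    size = start_size  # 2 cargos, 2 planes, 2 airports to begin with
    for phase in range(n_phases):
        accuracy = train_phase(n_cargos=size, n_planes=size, n_airports=size)
        if accuracy >= promote_at:
            size += 1  # promote to a bigger problem instance
```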
We humans would think about the problem in terms of actions and types, so I expected the first thing the DNC would get right to be the (Action, typeofthing1, typeofthing2, typeofthing3) 'tuple', since those must be correct in order to reliably get the instances correct. This was indeed the case, as can be seen in the 'accuracies' plots during training. By the semantics of the problem, the last 'type' is always Airplane, so it goes to 100% accuracy immediately. The next third of training bumps the types up to the 0.9-1.0 range. Only then does the loss for the entities themselves start dropping consistently. Even then, ent1 and ent3 were coupled, which in the logic of the problem...
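Measuring this is just per-slot accuracy over the predicted tuple; a sketch (slot names are assumptions):

```python
def slot_accuracies(logits, targets):
    """Accuracy for each slot of the (action, type1..3, ent1..3) tuple.
    logits: dict of [batch, n_classes] tensors; targets: dict of [batch] tensors."""
    return {
        slot: (head.argmax(dim=-1) == targets[slot]).float().mean().item()
        for slot, head in logits.items()
    }
```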
To show details of what was predicted vs. the best moves at each step, pass the `--detail` flag. You will get something like this:
```
trial 978, step 19514 trial accy: 6/7, 0.86, pass total 296/978, running avg 0.7463, loss 0.0774
best Load ['C1', 'P1', 'A1'], Fly ['P1', 'A1', 'A0']
chosen: Load ['C1', 'P1', 'A1'], guided True, prob 0.25, T? True ---loss 0.2553
best Fly ['P1', 'A1', 'A0'], Unload ['C1', 'P1', 'A0']
chosen: Fly ['P1', 'A1', 'A0'], guided True, prob 0.33, T? True ---loss 0.0784
best Unload ['C1', 'P1', 'A0'], Fly ['P0', 'A0', 'A1']
chosen: Unload ['C1', 'P1', 'A0'], guided True, prob 0.33, T? True ---loss 0.0830
best Fly ['P0', 'A0', 'A1'], Load ['C0', 'P0', 'A1']
chosen: Fly ['C0', 'P0', 'A1'], guided True, prob 0.25, T? False ---loss 0.3716
best Load ['C0', 'P0', 'A1'], Fly ['P0', 'A1', 'A0']
chosen: Load ['C0', 'P0', 'A1'], guided True, prob 0.25, T? True ---loss 0.1288
best Fly ['P0', 'A1', 'A0'], Unload ['C0', 'P0', 'A0']
chosen: Fly ['P0', 'A1', 'A0'], guided False, prob 0.25, T? True ---loss 1.1554
best Fly ['P0', 'A1', 'A0'], Unload ['C0', 'P0', 'A0']
chosen: Unload ['C0', 'P1', 'A0'], guided True, prob 0.33, T? False ---loss 0.9901
best Unload ['C0', 'P0', 'A0']
chosen: Unload ['C0', 'P0', 'A0'], guided False, prob 0.33, T? True ---loss 0.8087
best Fly ['P0', 'A1', 'A0'], Unload ['C0', 'P0', 'A0']
chosen: Unload ['C0', 'P0', 'A0'], guided True, prob 0.33, T? True ---loss 0.7677
```
The best actions are those determined by the problem heuristics (not always optimal, to save time). The chosen action is what the DNC ended up choosing. 'Guided' refers to β from the paper, 'prob' is the probability of choosing that action (out of all legal actions), and the loss is shown as well.
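The guided/free mixture is essentially the β-mixture from the paper: with probability β take the heuristic's best action, otherwise sample from the network's distribution over legal actions. A sketch (the masking and names here are assumptions, not the repo's exact code):

```python
import torch

def choose_action(action_logits, legal_mask, best_idx, beta):
    """action_logits: [n_actions] tensor; legal_mask: bool [n_actions];
    best_idx: index of the heuristic's best action; beta in [0, 1]."""
    # Renormalize over legal actions only.
    probs = torch.softmax(action_logits.masked_fill(~legal_mask, float("-inf")), dim=-1)
    if torch.rand(()) < beta:  # guided: follow the problem heuristic
        idx, guided = best_idx, True
    else:                      # free-running: sample the policy
        idx, guided = torch.multinomial(probs, 1).item(), False
    return idx, guided, probs[idx].item()
```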
Another task I figured would be interesting: give the DNC a problem (initial state and goal), make some moves, then ask where a certain cargo is (which airport is it at? is it in a plane? which plane?). This did not work too well. See the `train_qa` function in run.py.
```
python run.py --act qa --iters 1000 --n_phases 20
```
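A QA sample pairs a problem and a partial trajectory with a single query, roughly like this (purely illustrative field names, not the repo's actual encoding):

```python
# The network sees the initial state, the goal, and a few executed
# moves, then must answer a location query about a cargo.
sample = {
    "initial": ["At(C0, A0)", "At(C1, A1)", "At(P0, A0)", "At(P1, A1)"],
    "goal":    ["At(C0, A1)", "At(C1, A0)"],
    "moves":   [("Load", "C1", "P1", "A1"), ("Fly", "P1", "A1", "A0")],
    "question": "Where is C1?",
    "answer":   "P1",  # still inside the plane until it is unloaded
}
```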
In an initial pass I tested with the sequence memorization task from the deepmind repo. I have not tested it recently and I doubt it still works (see the todo list). To run it, select the task with the corresponding `--act` flag.
The DNC was tested against vanilla LSTMs. The LSTM appears to get stuck at ~40% on the air cargo problem. To train with the LSTM only, pass the `--algo lstm` flag like so:
```
python run.py --act plan --algo lstm --iters 1000 --n_phases 20
```
Training at each 'level' took ~20K steps, far more than reported in the paper. On my crappy home CPU, this meant about a day, aka forever. Since I also lost my computer and had to retrain everything, I only got through the first level of training (2 airports, 2 cargos, 2 planes) before having to submit.
There was some experimentation here, so there are a bunch of flags controlling when to optimize. In the paper, loss is computed at the end of each problem. This did not work for me, so I ended up running the optimizer after each response.
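Concretely, the difference between `--opt_at problem` and `--opt_at step` is where `zero_grad`/`step` land relative to the unrolled problem. A sketch under assumed names (not run.py's actual structure):

```python
def train_problem(model, optimizer, problem_steps, loss_fn, opt_at="step"):
    """opt_at='problem': one loss over the whole problem (as in the paper).
    opt_at='step': optimize after every response (what ended up working here)."""
    total = 0.0
    for inputs, target in problem_steps:
        loss = loss_fn(model(inputs), target)
        if opt_at == "step":
            optimizer.zero_grad()                 # --zero_at step
            loss.backward(retain_graph=True)      # recurrent state spans steps (--ret_graph 1)
            optimizer.step()
        else:
            total = total + loss                  # accumulate, optimize once per problem
    if opt_at == "problem":
        optimizer.zero_grad()
        total.backward()
        optimizer.step()
```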
To resume training from a saved model:

```
python run.py --act plan --iters 1000 --n_phases 20 --load the_saved_name_or_path --save the_new
```
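Saving and loading are presumably standard PyTorch checkpointing; a minimal sketch (the checkpoint layout is an assumption, not run.py's actual format):

```python
import torch

def save_checkpoint(model, optimizer, path):
    # Hypothetical checkpoint layout: model and optimizer state dicts.
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, path)

def load_checkpoint(model, optimizer, path):
    state = torch.load(path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
```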
To run on FloydHub, set the `--env` flag to floyd. Once up there, the script will create all the directories in /output. Tensorboard for PyTorch does not appear to work there, for reasons I do not understand.
```
floyd run --env pytorch-0.2 --tensorboard "bash setup.sh && python run.py --act dag --iters 1000 --env floyd"
```
- Upload best models
- Test the sequence memorization task (probably does not work)
- Implement with GPU
- Faster problem generator
- Fix tensorboard issues
- Gradient clipping
- Visualization of what the DNC is doing internally (per the paper)
- Penalty for bad actions when not using the beta coefficient for forcing
- Losses by prediction (fast loss)
- Run whole LSTM on input and goal state?
- Document args in argparse
- Testing on moar problems