This is an initial attempt at implementing Fast-RCNN in Tensorflow. It's still a work in progress - the ROI pooling op is pretty much finished, but the full Fast-RCNN demo is still being completed.
The challenge in doing this (as of when it was started, in May/June 2016, at least) is that Tensorflow doesn't have an implementation of the ROI pooling operation, which does much of the heavy lifting in Fast-RCNN (and Faster-RCNN): it max-pools each region proposal out of the convolutional feature map into a fixed-size grid, so that proposals of arbitrary size can be fed to the fully connected layers. Without this op, it's pretty much impossible to implement these algorithms in Tensorflow.
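To make that concrete, here's a minimal NumPy sketch of what the ROI pooling forward pass computes for a single region. It's purely illustrative and is not the code in this fork (the actual op handles batches of ROIs and has a matching gradient):

```python
import numpy as np

def roi_max_pool(feature_map, roi, pooled_h=7, pooled_w=7):
    """Max-pool one region of interest into a fixed pooled_h x pooled_w grid.

    feature_map: (H, W, C) array of convolutional features.
    roi: (x1, y1, x2, y2) box in feature-map coordinates (inclusive).
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2 + 1, x1:x2 + 1, :]
    h, w, c = region.shape
    # Split the region into a pooled_h x pooled_w grid of roughly equal bins
    # and take the channel-wise maximum inside each bin.
    row_edges = np.linspace(0, h, pooled_h + 1).astype(int)
    col_edges = np.linspace(0, w, pooled_w + 1).astype(int)
    out = np.zeros((pooled_h, pooled_w, c), dtype=feature_map.dtype)
    for i in range(pooled_h):
        for j in range(pooled_w):
            r0, r1 = row_edges[i], max(row_edges[i + 1], row_edges[i] + 1)
            c0, c1 = col_edges[j], max(col_edges[j + 1], col_edges[j] + 1)
            out[i, j, :] = region[r0:r1, c0:c1, :].max(axis=(0, 1))
    return out

# Example: a 21x30 region of a feature map pooled to a fixed 7x7 output.
fmap = np.random.rand(38, 50, 256).astype(np.float32)
pooled = roi_max_pool(fmap, roi=(5, 10, 34, 30))
print(pooled.shape)  # (7, 7, 256)
```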
In this fork of Tensorflow, I've implemented the ROI pooling op for both CPU and GPU, along with its gradient op (again for both CPU and GPU). The code for the forward op is here. The gradient op is here. The GPU implementations of both are here.
I've also created some demos and test scripts using IPython/Jupyter. An overall demo of the ROI pooling layer in operation is here. A unit test of the forward op is here, and of the gradient op here. A very quick-and-dirty performance comparison between the CPU and GPU implementations can be viewed here.
Installation is the same as for normal Tensorflow - follow the instructions for building from source, but clone this repository instead of the main Tensorflow one. Then build the ROI pooling op using the provided 'build_user_op.sh' script.
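Once built, the op should be usable from Python roughly along these lines. This is only a rough sketch: the shared-library filename, the op name, and the argument layout shown here are placeholders for illustration - see the demo notebooks for the actual interface.

```python
import numpy as np
import tensorflow as tf

# Rough usage sketch - the .so filename, op name, and argument layout below are
# placeholders for illustration; the demo notebooks show the actual interface.
roi_pooling_module = tf.load_op_library('roi_pooling_op.so')

features = tf.placeholder(tf.float32, [1, 38, 50, 256])  # conv feature map
rois = tf.placeholder(tf.int32, [None, 4])               # region proposal boxes

# One fixed-size pooled output per region proposal.
pooled = roi_pooling_module.roi_pooling(features, rois)

with tf.Session() as sess:
    out = sess.run(pooled, feed_dict={
        features: np.random.rand(1, 38, 50, 256).astype(np.float32),
        rois: np.array([[5, 10, 34, 30]], dtype=np.int32),
    })
```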
A demo of the whole Fast-RCNN network is coming soon. For now, you can see an initial version of it here. It still has some dependencies on other helper libraries of mine that I'm working to remove. The current demo is also not a full R-CNN network - I've simplified it a bit to show just the novel part (the ROI pooling layer).