This is a header-only library that implements a modified version of the GrabCut algorithm. It solves the two-class image segmentation problem (foreground/background); in other words, it detects a salient object in an RGB-D image. It follows the logic described in the DenseCut paper by Cheng et al. [1], adapting it to the GPU.
In short, the algorithm performs the following for every frame:
- Get the color and depth buffers from a depth camera.
- Assuming the salient object is in front, label the image pixels based on a simple threshold applied to the depth buffer (i.e. just like in the librealsense grabcuts example).
- Fit two Gaussian Mixture Models (GMM) onto the color frame to create the color models of background and foreground.
- Use the trained models to label the image.
- Use a Conditional Random Field model (CRF) to refine the labels.
Steps 2-5 are performed entirely on the GPU, which allowed me to run the algorithm at a steady 30 FPS.
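For orientation, here is a minimal sketch of the per-frame loop; every type and method name below is an illustrative placeholder, not the actual real-salient API.

```cpp
#include <cstdint>

// Placeholder interface mirroring the five steps above; the method bodies are
// stubs, since this is not the real-salient API.
struct Salient {
    void labelByDepth(const uint16_t*, float) {} // step 2: depth threshold
    void fitGmms(const uint8_t*) {}              // step 3: EM on the color frame
    void applyGmms() {}                          // step 4: per-pixel labeling
    void refineCrf(int) {}                       // step 5: dense CRF refinement
};

// One frame of the pipeline; steps 2-5 run on the GPU.
void processFrame(Salient& s, const uint8_t* colorRgb, const uint16_t* depth) {
    s.labelByDepth(depth, 1.5f); // assume the salient object is within 1.5 m
    s.fitGmms(colorRgb);
    s.applyGmms();
    s.refineCrf(5);              // a few mean-field iterations suffice
}
```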
The `gmm.cuh` module is a generic CUDA implementation of the GMM. It can fit M GMMs, K components each, on a single image at once. Thus, this module alone can be used for realtime M-class image segmentation. The module uses the standard EM algorithm for estimation and Cholesky decomposition for computing the covariance inverse and determinant.
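As an illustration of the Cholesky trick mentioned above, here is a sketch (not the library's actual code) of evaluating the log-density of one 3-channel pixel given a precomputed factorization sigma = L * L^T:

```cpp
#include <cmath>

// Given sigma = L * L^T for a 3x3 covariance, the Gaussian log-density needs
// only a forward substitution: solve L y = (x - mu), then
// (x-mu)^T sigma^{-1} (x-mu) = |y|^2 and log|sigma| = 2 * sum_i log(L[i][i]).
__device__ float gaussianLogDensity3(const float x[3], const float mu[3],
                                     const float L[3][3]) {
    float d[3], y[3];
    for (int i = 0; i < 3; ++i) d[i] = x[i] - mu[i];
    // forward substitution; L is lower-triangular
    y[0] = d[0] / L[0][0];
    y[1] = (d[1] - L[1][0] * y[0]) / L[1][1];
    y[2] = (d[2] - L[2][0] * y[0] - L[2][1] * y[1]) / L[2][2];
    float quad   = y[0] * y[0] + y[1] * y[1] + y[2] * y[2];
    float logDet = 2.0f * (logf(L[0][0]) + logf(L[1][1]) + logf(L[2][2]));
    return -0.5f * (quad + logDet) - 1.5f * logf(2.0f * 3.14159265f);
}
```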
The `crf.cuh` module is an adaptation of the GPU implementation of CRF by Jiahui Huang, who used Miguel Monteiro's implementation of fast Gaussian filtering. The theory for this implementation can be found in [2] and [3].
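For reference, a single mean-field iteration from [2] looks roughly like the following in the two-label Potts case. `bilateralFilter` here is only a stand-in for the permutohedral-lattice filtering of [3] (not Monteiro's actual API), and the self-contribution correction is omitted for brevity:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Stand-in for the permutohedral-lattice filter of [3]: a no-op copy so the
// sketch compiles; the real filter blurs q over (x, y, r, g, b) features.
static void bilateralFilter(const float* in, float* out, int n) {
    std::copy(in, in + n, out);
}

// One mean-field iteration of the dense CRF from [2] for two labels with a
// Potts compatibility; unary and q are stored as two planes of n floats each.
void meanFieldIteration(const float* unary, std::vector<float>& q,
                        float pottsWeight, int n) {
    std::vector<float> filtered(2 * n);
    // message passing: filter each label's marginals over the feature space
    bilateralFilter(q.data(),     filtered.data(),     n);
    bilateralFilter(q.data() + n, filtered.data() + n, n);
    for (int i = 0; i < n; ++i) {
        // Potts model: each pixel is penalized by the filtered mass of the
        // opposite label in its (spatial + color) neighborhood
        float e0 = unary[i]     + pottsWeight * filtered[n + i];
        float e1 = unary[n + i] + pottsWeight * filtered[i];
        // normalize: softmax over the two labels (shifted for stability)
        float m  = std::fmin(e0, e1);
        float p0 = std::exp(-(e0 - m)), p1 = std::exp(-(e1 - m));
        q[i]     = p0 / (p0 + p1);
        q[n + i] = p1 / (p0 + p1);
    }
}
```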
The examples make use of the core real-salient library as well as a couple of VR-related tricks. They are hardcoded to use the Intel RealSense D415 camera and its SDK to capture a color+depth video stream (`examples/vr-salient/include/cameraD415.hpp`).
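A minimal capture loop with the librealsense2 API looks roughly like this; the resolutions and formats are illustrative, not necessarily the ones the example hardcodes:

```cpp
#include <librealsense2/rs.hpp>

int main() {
    rs2::config cfg;
    cfg.enable_stream(RS2_STREAM_COLOR, 640, 480, RS2_FORMAT_RGB8, 30);
    cfg.enable_stream(RS2_STREAM_DEPTH, 640, 480, RS2_FORMAT_Z16, 30);
    rs2::pipeline pipe;
    pipe.start(cfg);
    rs2::align alignToColor(RS2_STREAM_COLOR); // map depth onto color pixels
    while (true) {
        rs2::frameset frames = alignToColor.process(pipe.wait_for_frames());
        rs2::video_frame color = frames.get_color_frame();
        rs2::depth_frame depth = frames.get_depth_frame();
        // color.get_data() / depth.get_data() feed the segmentation pipeline
    }
}
```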
VR bounds:
The examples use OpenVR to improve the initial guess of the salient object position (step 2 in the algorithm above). I attach an extra tracker to the depth camera to locate its position in VR. This allows me to find the positions of the headset and hand controllers on the image via a simple coordinate transform. This is implemented in `examples/vr-salient/include/vrbounds.hpp`.
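The transform itself is straightforward; a sketch, with simplified stand-in types rather than the actual vrbounds.hpp internals:

```cpp
// Take a device position in VR world space, move it into the depth camera's
// frame (known from the attached tracker), and project it with pinhole
// intrinsics. worldToCamera would be inverse(trackerPose * trackerToCamera).
struct Mat4 { float m[4][4]; };          // row-major rigid transform
struct Intrinsics { float fx, fy, cx, cy; };

bool projectToImage(const Mat4& worldToCamera, const Intrinsics& k,
                    const float world[3], float& u, float& v) {
    float c[3];
    for (int i = 0; i < 3; ++i)
        c[i] = worldToCamera.m[i][0] * world[0]
             + worldToCamera.m[i][1] * world[1]
             + worldToCamera.m[i][2] * world[2]
             + worldToCamera.m[i][3];
    if (c[2] <= 0.0f) return false;      // behind the camera
    u = k.fx * c[0] / c[2] + k.cx;       // pixel coordinates
    v = k.fy * c[1] / c[2] + k.cy;
    return true;
}
```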
VR depth stencil:
In addition to the tracker positions, I employ a Vulkan+OpenVR combination to render the VR chaperone bounds into a temporary buffer. This allows me to cut off from the scene all objects outside the user-defined play area. This is implemented in `examples/vr-salient/include/vulkanheadless.hpp`.
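Conceptually, the stencil test amounts to comparing the camera depth against the rendered boundary depth; a hypothetical kernel (names invented for illustration):

```cpp
#include <cstdint>

// The chaperone bounds are rendered into stencilDepth; camera pixels that lie
// behind that boundary surface are outside the play area and are forced to
// background.
__global__ void applyPlayAreaStencil(const float* stencilDepth, // meters
                                     const uint16_t* cameraDepth,
                                     uint8_t* labels,
                                     float depthScale, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float measured = cameraDepth[i] * depthScale; // raw units -> meters
    if (stencilDepth[i] > 0.0f && measured > stencilDepth[i])
        labels[i] = 0; // background
}
```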
`vr-salient` is a standalone program. In addition to the tweaks above, it uses the OpenCV highgui library, only to display the window. The VR tricks are optional in this example.
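For context, the highgui usage amounts to a couple of calls; the buffer name here is a placeholder:

```cpp
#include <cstdint>
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>

void showFrame(const uint8_t* rgb, int width, int height) {
    // wrap the existing buffer without copying; CV_8UC3 matches an RGB8 frame
    cv::Mat image(height, width, CV_8UC3, const_cast<uint8_t*>(rgb));
    cv::imshow("vr-salient", image);
    cv::waitKey(1); // pump the GUI event loop
}
```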
`saber-salient` is a dynamic library to be used in my BeatSaber plugin. It functions the same as `vr-salient`, but does not require OpenCV; VR tracking, however, is required rather than optional.
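A plausible shape for such a library boundary is a flat C interface that a C# plugin can P/Invoke; the exports below are invented for illustration and may not match the actual ones:

```cpp
#include <cstdint>

extern "C" {
    // start the camera and the GPU pipeline
    __declspec(dllexport) bool salient_init();
    // run one frame and copy out the label mask
    __declspec(dllexport) bool salient_grab(uint8_t* outLabels,
                                            int* outWidth, int* outHeight);
    // release the camera and GPU resources
    __declspec(dllexport) void salient_shutdown();
}
```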
[1] Cheng, M.-M., Prisacariu, V. A., Zheng, S., Torr, P. H. S., and Rother, C. DenseCut: Densely Connected CRFs for Realtime GrabCut. Computer Graphics Forum, 34: 193-201, 2015.
[2] Krähenbühl, Philipp, and Vladlen Koltun. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. Advances in Neural Information Processing Systems, 2011.
[3] Adams, Andrew, Jongmin Baek, and Myers Abraham Davis. Fast High-Dimensional Filtering Using the Permutohedral Lattice. Computer Graphics Forum, 29(2), 2010.