Object Detection
The previous image recognition examples output class probabilities representing the entire input image. The second deep learning capability we're highlighting in this tutorial is detecting objects and finding where in the frame those objects are located (i.e. extracting their bounding boxes). This is performed using a `detectNet`, or object detection / localization network.
The `detectNet` object accepts a 2D image as input and outputs a list of coordinates of the detected bounding boxes. To train the object detection model, a pretrained ImageNet recognition model (like GoogleNet) is first used with bounding-box coordinate labels included in the training dataset in addition to the source imagery.
The following pretrained DetectNet models are included with the tutorial:
- ped-100 (single-class pedestrian detector)
- multiped-500 (multi-class pedestrian + baggage detector)
- facenet-120 (single-class face detector)
- coco-airplane (MS COCO airplane class)
- coco-bottle (MS COCO bottle class)
- coco-chair (MS COCO chair class)
- coco-dog (MS COCO dog class)
As with the previous examples, both a console program and a camera streaming program are provided for using detectNet.
The `detectnet-console` program can be used to find objects in static images. To load one of the pretrained object detection models that come with the repo, specify the pretrained model name as the 3rd argument to `detectnet-console`:
$ ./detectnet-console dog_1.jpg output_1.jpg coco-dog
The above command will process dog_1.jpg with the pretrained DetectNet-COCO-Dog model and save the result to output_1.jpg. This is a shortcut of sorts, so you don't need to train the model yourself if you don't want to.
Below is a table of the pretrained DetectNet snapshots downloaded with the repo (located in the `data/networks` directory after running the `cmake` step) and the associated argument to `detectnet-console` used for loading each pretrained model:
| DIGITS model | CLI argument | classes |
|---|---|---|
| DetectNet-COCO-Airplane | `coco-airplane` | airplanes |
| DetectNet-COCO-Bottle | `coco-bottle` | bottles |
| DetectNet-COCO-Chair | `coco-chair` | chairs |
| DetectNet-COCO-Dog | `coco-dog` | dogs |
| ped-100 | `pednet` | pedestrians |
| multiped-500 | `multiped` | pedestrians, luggage |
| facenet-120 | `facenet` | faces |
These models all have the Python layer patch described above already applied.
Let's try running some of the other COCO models. The training data for these are all included in the dataset downloaded above. Although the DIGITS training example above was for the coco-dog model, the same procedure can be followed to train DetectNet on the other classes included in the sample COCO dataset.
$ ./detectnet-console bottle_0.jpg output_2.jpg coco-bottle
$ ./detectnet-console airplane_0.jpg output_3.jpg coco-airplane
Also included in the repo are DetectNet models pretrained to detect humans. The `pednet` and `multiped` models recognize pedestrians, while `facenet` recognizes faces (trained on FDDB). Here's an example of detecting multiple humans simultaneously in a crowded space:
$ ./detectnet-console peds-004.jpg output-4.jpg multiped
When using the `multiped` model (`PEDNET_MULTI`), for images containing luggage or baggage in addition to pedestrians, the 2nd object class is rendered with a green overlay.
$ ./detectnet-console peds-003.jpg output-3.jpg multiped
Next, we'll run object detection on a live camera stream.
Next | Running the Live Camera Detection Demo
Back | Running the Live Camera Recognition Demo
© 2016-2019 NVIDIA | Table of Contents