Container Localisation and Mass Estimation with an RGB-D Camera

This repository contains the method proposed by the Visual team in the CORSMAL challenge (Task 4) and accepted at the ICASSP 2022 conference.

The mass of the empty container is estimated from RGB-D data captured from a fixed frontal view, using a two-stage pipeline. The first stage employs a detection and segmentation network to locate the container. The second stage uses a simple and lightweight encoder to provide the actual mass estimate.

A brief description of the method:

  1. For each video, every frame is sampled, and object detection and segmentation are performed with a Mask R-CNN model pretrained on COCO.
  2. For each detected object, the average distance from the camera of the chosen view is computed from the depth map, using only the pixels that belong to the segmentation mask. The 5 nearest objects (those with the smallest average distance) are selected.
  3. The final prediction of the container mass is the average of the 5 predictions (one per selected object) produced by a lightweight CNN encoder model.
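Steps 2 and 3 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the repository's code: the function names, the `depth`/`mask` dictionary keys, and the use of nested lists instead of image tensors are all hypothetical.

```python
from statistics import mean


def mean_masked_depth(depth, mask):
    """Average the depth values at pixels where the mask is set.

    `depth` and `mask` are same-shaped nested lists (hypothetical layout).
    Returns infinity for an empty mask so that such objects sort last.
    """
    values = [
        d
        for depth_row, mask_row in zip(depth, mask)
        for d, m in zip(depth_row, mask_row)
        if m
    ]
    return mean(values) if values else float("inf")


def select_nearest(detections, k=5):
    """Keep the k detections with the smallest average masked depth."""
    ranked = sorted(
        detections,
        key=lambda det: mean_masked_depth(det["depth"], det["mask"]),
    )
    return ranked[:k]


def average_prediction(predictions):
    """Final estimate: the mean of the per-object predictions."""
    return mean(predictions)
```

In this sketch, each selected detection would be passed to the encoder, and `average_prediction` would be applied to the resulting 5 values.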

[arXiv] [CCM dataset]

Installation

Setup specifics

  • OS: Ubuntu 20.04.3 LTS
  • Kernel version: 5.11.0-46-generic
  • CPU: Intel® Core™ i9-9900 CPU @ 3.10GHz
  • Cores: 16
  • RAM: 32 GB
  • GPU: NVIDIA GeForce RTX 2080 Ti

Requirements

The main libraries and their versions are listed below:

  • python=3.8
  • pytorch=1.10.1
  • torchvision=0.11.2
  • scipy=1.7.3
  • matplotlib=3.5.1
  • torchsummary=1.5.1
  • pandas=1.3.5
  • opencv=4.5.2
  • tqdm=4.62.3

The file requirements.txt lists all libraries and their versions. To install them, use the following snippet:

# Create conda environment
conda create --name CCM python=3.8 # or conda create -n CCM python=3.8
conda activate CCM

# Install libraries
pip install torch torchvision scipy matplotlib torchsummary pandas tqdm
conda install -c conda-forge opencv

Instructions

  1. Clone the repository
  2. Install the requirements
  3. Run demo/generate_video_inference.py, passing as arguments the path to the directory of RGB videos (.mp4) and the path to the directory of depth files (.png).

Demo

demo/generate_video_inference.py runs the demo of the proposed method and creates the submission .csv file.

Running arguments

The arguments of the Python demo are:

  • path_to_video_dir: path to the directory containing RGB .mp4 videos
  • path_to_dpt_dir: path to the directory containing the depth folders, each of which holds .png images. The name of each video must match the name of its depth folder.

Both arguments are read as strings, so the paths must be quoted, e.g. "home/dataset/rgb_images".
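As an illustration of the two arguments and the naming rule, the sketch below parses them with `argparse` and lists videos that have no same-named depth folder. It is not the demo script itself: `build_parser` and `videos_without_depth` are hypothetical helpers, and marking both arguments as required is an assumption.

```python
import argparse
from pathlib import Path


def build_parser():
    """Parser for the two documented arguments (both read as strings)."""
    parser = argparse.ArgumentParser(description="Container mass demo (sketch)")
    # Assumption: both arguments are mandatory.
    parser.add_argument("--path_to_video_dir", type=str, required=True,
                        help="directory containing RGB .mp4 videos")
    parser.add_argument("--path_to_dpt_dir", type=str, required=True,
                        help="directory containing one depth folder per video")
    return parser


def videos_without_depth(video_dir, depth_dir):
    """Return the names (without extension) of .mp4 files that have no
    folder of the same name inside depth_dir."""
    depth_names = {p.name for p in Path(depth_dir).iterdir() if p.is_dir()}
    return sorted(
        p.stem for p in Path(video_dir).glob("*.mp4")
        if p.stem not in depth_names
    )
```

A check like this, run before inference, would flag naming mismatches between the two directories early instead of partway through a batch of videos.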

Running examples

# Run demo
python demo/generate_video_inference.py --path_to_video_dir <PATH_TO_VIDEO_DIR> --path_to_dpt_dir <PATH_TO_DPT_DIR>  

Data format

Input

The proposed method uses both RGB and depth images. In particular, the detection/segmentation model uses RGB images; the selection of the 5 candidates uses RGB images, masks, and depth; and the encoder model uses RGB images.

Output

The output of the final stage of the encoder is a float value in the range [0, 1]. The demo shows how to provide the output in the appropriate range.
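One common way to map a value in [0, 1] onto a physical range is min-max rescaling. The sketch below shows that idea only; whether the demo uses exactly this mapping is not stated here, and `denormalise_mass`, `min_mass`, and `max_mass` are hypothetical names with placeholder bounds that would have to come from the training data.

```python
def denormalise_mass(value, min_mass, max_mass):
    """Linearly map a normalised prediction in [0, 1] to [min_mass, max_mass].

    Values outside [0, 1] are clamped first (an assumption of this sketch).
    """
    clamped = min(max(value, 0.0), 1.0)
    return min_mass + clamped * (max_mass - min_mass)
```

With placeholder bounds of 0 and 100, an encoder output of 0.5 would map to 50 in the target unit.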

Contacts

If you have any further enquiries, questions, or comments, please contact [email protected] or [email protected]. If you would like to file a bug report or a feature request, use the GitHub issue tracker.

License

This work is licensed under the MIT License. To view a copy of this license, see LICENSE.