The DoUnseen package segments and classifies novel objects in just a few lines of code, without any training or fine-tuning.
Try it on HuggingFace
- Standalone mode
- Full segmentation pipeline (extension to Segment Anything and other zero-shot segmentation models)
To use the full segmentation pipeline, you need to install a zero-shot segmentation model; any such model can be used with DoUnseen.
We use Segment Anything 2 as an example. Note that SAM 2 requires Python >= 3.10; this is not required for DoUnseen itself. Install SAM 2 as follows:
git clone https://github.com/facebookresearch/sam2.git && cd sam2 && pip install -e .
Install DoUnseen:
pip install git+https://github.com/AnasIbrahim/image_agnostic_segmentation.git
Download the pretrained models from HuggingFace using git LFS:
cd image_agnostic_segmentation
git lfs install # if not installed
git clone https://huggingface.co/anas-gouda/dounseen models/
First, import DoUnseen and set up the classifier:
from dounseen.core import UnseenClassifier
import dounseen.utils as dounseen_utils

unseen_classifier = UnseenClassifier(
    gallery_images=None,  # can be set up later using update_gallery()
    gallery_buffered_path=None,
    augment_gallery=False,
    batch_size=80,
)
1- Standalone mode
# Load query images of a single object as PIL images
from PIL import Image
query_images = [Image.open("object_1.jpg"), Image.open("object_2.jpg"), Image.open("object_3.jpg")]
# Update the gallery with the gallery images path
gallery_path = "PATH/TO/GALLERY" # one folder per object, e.g. obj_xxx, obj_yyy, obj_zzz
unseen_classifier.update_gallery(gallery_path=gallery_path)
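# For illustration, the gallery folder might be laid out like this (hypothetical example;
# each sub-folder name is an object name and holds one or more images of that object):
#   PATH/TO/GALLERY/
#   ├── obj_xxx/  (obj_xxx_1.jpg, obj_xxx_2.jpg, ...)
#   ├── obj_yyy/
#   └── obj_zzz/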
# To find which of the query images is the same as obj_xxx from the gallery
object_name, score = unseen_classifier.find_object(query_images, obj_name="obj_xxx", method="max")
# To find a match for all query objects from the gallery
class_predictions, class_scores = unseen_classifier.classify_all_objects(query_images, threshold=0.3, multi_instance=False)
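# Minimal sketch of how the returned values might be consumed (assuming one
# prediction and one score per query image; exact return types may differ):
print("best match:", object_name, "score:", score)
for idx, (prediction, confidence) in enumerate(zip(class_predictions, class_scores)):
    print("query image", idx, "->", prediction, "score:", confidence)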
2- Full segmentation pipeline (extension to Segment-Anything)
Load SAM 2 and create the automatic mask generator.
import numpy as np
import torch
from PIL import Image
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

torch.autocast("cuda", dtype=torch.bfloat16).__enter__()
# turn on tfloat32 for Ampere GPUs (https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices)
if torch.cuda.get_device_properties(0).major >= 8:
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

# load SAM 2 from HuggingFace
sam2_mask_generator = SAM2AutomaticMaskGenerator.from_pretrained(
    'facebook/sam2-hiera-tiny',
    points_per_side=20,
    points_per_batch=20,
    pred_iou_thresh=0.7,
    stability_score_thresh=0.92,
    stability_score_offset=0.7,
    crop_n_layers=0,
    box_nms_thresh=0.7,
    multimask_output=False,
)
Load and segment the image.
rgb_img = Image.open('/PATH/TO/IMAGE.jpg')
rgb_img = np.array(rgb_img.convert("RGB"))
sam2_output = sam2_mask_generator.generate(rgb_img)
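# Sketch: inspect the raw SAM 2 proposals before handing them to DoUnseen
# (the automatic mask generator returns a list of per-mask dictionaries)
print("number of mask proposals:", len(sam2_output))
print("fields per proposal:", list(sam2_output[0].keys()))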
Prepare the SAM 2 output for DoUnseen.
sam2_masks, sam2_bboxes = dounseen_utils.reformat_sam2_output(sam2_output)
If you want to remove the background segmentation masks, you can use the BackgroundFilter class. Most of the time, using this background filter is not necessary.
from dounseen.core import BackgroundFilter
background_filter = BackgroundFilter()
sam2_masks, sam2_bboxes = background_filter.filter_background_annotations(rgb_img, sam2_masks, sam2_bboxes)
Extract the query images using the masks.
segments = dounseen_utils.get_image_segments_from_binary_masks(rgb_img, sam2_masks, sam2_bboxes)
Update the gallery with the gallery images path.
gallery_path = "PATH/TO/GALLERY" # one folder per object, e.g. obj_xxx, obj_yyy, obj_zzz
unseen_classifier.update_gallery(gallery_path=gallery_path)
To search the image for a specific object (obj_xxx) from the gallery.
matched_query, score = unseen_classifier.find_object(segments, obj_name="obj_xxx", method="max")
To find all gallery objects in the image.
class_predictions, class_scores = unseen_classifier.classify_all_objects(segments, threshold=0.3, multi_instance=False)
filtered_class_predictions, filtered_masks, filtered_bboxes = dounseen_utils.remove_unmatched_query_segments(class_predictions, sam2_masks, sam2_bboxes)
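# Sketch: sanity-check the final result by overlaying the remaining masks on the image
# (assumes the filtered masks are binary HxW NumPy arrays matching the image size)
import matplotlib.pyplot as plt
overlay = rgb_img.copy()
for mask in filtered_masks:
    overlay[mask.astype(bool)] = [255, 0, 0]  # paint matched segments red
plt.imshow(overlay)
plt.axis("off")
plt.show()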
For a full example, please refer to segment_image.py.
The unseen object segmentation model used for background filtering was trained on the NVIDIA Falling Things dataset together with our DoPose data. The DoPose dataset can be downloaded here. The dataset is saved in the BOP format and contains multi-view scenes of a storage bin (KLT Euro container) and of tabletops. It provides RGB and depth images, 6D poses for each object, segmentation masks, and COCO JSON annotations, as well as the camera transformations between different views of the same scene.
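For orientation, a BOP-format dataset is typically organized per scene roughly as follows (illustrative layout of the standard BOP structure, not an exact listing of DoPose):
000001/                   # one folder per scene
├── rgb/                  # color images, one per view
├── depth/                # depth images
├── mask/                 # full segmentation masks per object
├── mask_visib/           # visible-part masks per object
├── scene_camera.json     # per-view camera intrinsics
├── scene_gt.json         # per-view 6D object poses
└── scene_gt_info.json    # per-object visibility statistics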
The latest version of DoUnseen is largely based on our paper Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping [arXiv] [IEEE]. The model used in DoUnseen is slightly under-trained compared to the model used in the paper.
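Roughly, the centroid triplet loss replaces the individual positive and negative samples of a standard triplet loss with class centroids of the embedded object views; a sketch of the general form (our notation, see the paper for the exact formulation):

$$\mathcal{L}_{CTL} = \max\left(\lVert f(x_a) - c_p \rVert_2 - \lVert f(x_a) - c_n \rVert_2 + \alpha,\ 0\right)$$

where $f(x_a)$ is the embedding of an anchor image, $c_p$ and $c_n$ are the centroids of the embeddings of the same and of a different object, and $\alpha$ is a margin.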
@INPROCEEDINGS{10711720,
  author={Gouda, Anas and Schwarz, Max and Reining, Christopher and Behnke, Sven and Kirchheim, Alice},
  booktitle={2024 IEEE 20th International Conference on Automation Science and Engineering (CASE)},
  title={Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping},
  year={2024},
  pages={3577-3583},
  doi={10.1109/CASE59546.2024.10711720}}
A previous version of this repo was based on our original DoUnseen paper [arXiv]. The results presented in that paper were barely an improvement due to the lack of datasets at that point in time.
@misc{gouda2023dounseen,
  title={DoUnseen: Tuning-Free Class-Adaptive Object Detection of Unseen Objects for Robotic Grasping},
  author={Anas Gouda and Moritz Roidl},
  year={2023},
  eprint={2304.02833},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
Before zero-shot segmentation models like Segment Anything came out, this repository offered a similar segmentation method that segmented only household objects. That method was presented and trained using our DoPose dataset [arXiv] [IEEE].
@INPROCEEDINGS{10069586,
  author={Gouda, Anas and Ghanem, Abraham and Reining, Christopher},
  booktitle={2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)},
  title={DoPose-6D dataset for object segmentation and 6D pose estimation},
  year={2022},
  pages={477-483},
  doi={10.1109/ICMLA55696.2022.00077}}
October 2024: the repo was heavily refactored to be more modular and easier to use
- DoUnseen can be called using a few lines of code
- Uses SAM 2 for segmentation
- Easy installation using pip
- ROS support was removed
- Grasp calculation was removed
Jan 18 2024:
- New classification models were added using ViT and ResNet50 (paper to be added soon)
- Classification by calculating centroids of objects was added