From fa11def7751b6d79b48ab1509f365c1f74a092ea Mon Sep 17 00:00:00 2001 From: Tommaso Apicella Date: Sat, 26 Oct 2024 18:13:50 +0200 Subject: [PATCH] Update webpage.md --- zenodo/webpage.md | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/zenodo/webpage.md b/zenodo/webpage.md index 4a32b80..d04b024 100644 --- a/zenodo/webpage.md +++ b/zenodo/webpage.md @@ -8,19 +8,25 @@ Compared to previous works, models are trained under the same setup on two singl [[code](https://github.com/apicis/aff-seg/)] ## Release notes +26 October 2024: +* Upload weights of models trained on [UMD](https://ieeexplore.ieee.org/document/7139369) +* Model cards are in the zenodo folder of the [code repository](https://github.com/apicis/aff-seg/tree/main/zenodo/model_cards/umd) + 26 September 2024: * Upload weights of models trained on [CHOC-AFF](https://arxiv.org/abs/2308.11233) - +* Model cards are in the zenodo folder of the [code repository](https://github.com/apicis/aff-seg/tree/main/zenodo/model_cards/choc-aff) ## Available models Models trained on hand-occluded object setting using [CHOC-AFF](https://arxiv.org/abs/2308.11233): * [RN50-F](https://ieeexplore.ieee.org/document/9190733): RN50-F uses a ResNet-50 encoder with a pyramid scene parsing module to segment only the object affordances *graspable* and *contain*. * [ResNet18-UNet](https://arxiv.org/abs/1505.04597): UNet-like model that gradually down-sample feature maps in the encoder and up-sample them in the decoder, preserving the information via skip connections. * [ACANet](https://arxiv.org/abs/2308.11233): ACANet separately segments object and hand regions, using these masks to weigh the feature maps learnt in a third branch for the final affordance segmentation. We trained also a version of ACANet with ResNet-50. - + +Models trained on unoccluded object setting using [UMD](https://ieeexplore.ieee.org/document/7139369): +* [AffordanceNet](https://arxiv.org/abs/1709.07326): AffordanceNet is a two-stage method that detects the object and segments affordances. +* [CNN](https://ieeexplore.ieee.org/document/7759429): CNN is based on an encoder-decoder architecture to segment affordances. + +Models trained on both settings: * [DRNAtt](https://www.sciencedirect.com/science/article/pii/S0925231221000278): DRNAtt uses position and channel attention mechanisms in parallel after the feature extraction stage. The outputs of the attention modules are summed element-wise and the result is up-sampled through a learnable decoder. * [Mask2Former](https://arxiv.org/abs/2112.01527): Mask2Former is a recent hybrid architecture that combines an encoder-decoder convolutional neural network with a transformer decoder to decouple the classification of classes by the segmentation, tackling different types of segmentation, e.g., semantic, instance, and panoptic segmentation. Mask2Former introduced a masking operation in the cross-attention mechanism that combines the latent vectors with the features extracted from the image, ignoring the pixel positions outside the object region.