System Info
Description
I observed a significant performance difference between MaskFormer and Mask2Former when training both models on my dataset for instance segmentation. The training code is identical except for the model-specific configurations. Below, I outline my setup, preprocessing steps, results, and relevant code to help pinpoint any potential issues.
Dataset
Task: Instance segmentation.
Format: The dataset is designed as follows:
The R channel contains the semantic class labels.
The G channel contains the instance IDs for each object.
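To make the annotation layout concrete, here is a minimal sketch of a decoder for this R/G channel encoding. The function name and shapes are hypothetical, not part of the training scripts:

```python
import numpy as np

# Hypothetical decoder for the annotation format described above:
# R channel = semantic class label, G channel = instance ID.
def decode_annotation(ann):
    """ann: H x W x 3 uint8 array read from the annotation image."""
    semantic_map = ann[..., 0]  # R channel: per-pixel class labels
    instance_map = ann[..., 1]  # G channel: per-pixel instance IDs
    return semantic_map, instance_map

# Tiny example: class 1 everywhere, instance 7 in the top row.
ann = np.zeros((2, 2, 3), dtype=np.uint8)
ann[..., 0] = 1
ann[0, :, 1] = 7
sem, inst = decode_annotation(ann)
```
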
Preprocessing
For both models, I used the following preprocessing configuration. The only difference lies in the model type and the specific pre-trained weights used.
For both MaskFormer and Mask2Former, I set:
do_reduce_labels=True
ignore_index=255
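A minimal sketch (plain Python, not the library's actual implementation) of what these two settings do to a segmentation map: with do_reduce_labels=True, class 0 (background) is remapped to the ignore index and every remaining class index is shifted down by 1.

```python
# Illustration only: mimics the effect of do_reduce_labels=True with
# ignore_index=255 on a flat list of per-pixel class labels.
def reduce_labels(labels, ignore_index=255):
    # background (0) -> ignore_index; classes 1..N -> 0..N-1
    return [ignore_index if v == 0 else v - 1 for v in labels]

print(reduce_labels([0, 1, 2, 3]))  # [255, 0, 1, 2]
```

One consequence worth double-checking: after reduction, class 0 no longer means background, so the models' num_labels must equal the number of foreground classes only.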
The purpose of do_reduce_labels=True is to ensure that class indices start from 0 and increase sequentially: every class index is shifted by -1, as shown in the Hugging Face example script transformers/examples/pytorch/instance-segmentation/run_instance_segmentation.py (line 403 at commit 94af1c0). Using 255 for ignore_index ensures that pixels labeled as background are ignored during loss computation.

Results
Both models were trained for 20 epochs with the same hyperparameters (learning rate 5e-5).
Test image: [image]
Here are the results:
MaskFormer result on the test image: [image]
Mask2Former result on the test image: [image]
Observations
As you can see, the performance gap between the two models is substantial despite identical training setups and preprocessing pipelines. Mask2Former achieves significantly better mAP and other metrics, while MaskFormer struggles to produce meaningful results.

Any insights or suggestions would be greatly appreciated. Thank you!
Who can help?
@amyeroberts, @qubvel
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Relevant Code
train_maskformer.py
train_mask2former.py
maskformer.py
mask2former.py
dataset_maskformer.py
dataset_mask2former.py
requirements.txt
Expected behavior
MaskFormer and Mask2Former should achieve similar results on the same data.