
Bounding box prediction using other backbones #20

Open
carlosuc3m opened this issue Jun 19, 2024 · 1 comment

carlosuc3m commented Jun 19, 2024

Hello, first of all I would like to congratulate you on an awesome piece of work.

I am developing a series of Java-based plugins for different Java software packages (Fiji, ImageJ, Icy) that use lighter variants of SAM to improve manual annotation.

We also want to include automatic segmentation, so we thought that providing CellSAM would be the best option, because SAM's "segment everything" mode does not really work on cells.

We want to use lighter variants of SAM because we want the plugins to run on any computer, and the faster the better.

These are the models that we are using:

From what I have seen, you trained the AnchorDETR model with the SAM-B ViT encoder. SAM-B has a different number of features than the models that we use. Do you know of any way to adapt your AnchorDETR to these models? I am thinking about interpolating the output feature maps.
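
In case it helps the discussion, here is a minimal sketch of the kind of adaptation I have in mind: project the channel dimension with a 1x1 convolution and bilinearly resize the spatial dimensions so the lighter encoder's feature map matches the shape the detection head was trained on. All names and dimensions below are hypothetical placeholders, not the actual CellSAM values, and the projection would of course have to be trained or fine-tuned.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAdapter(nn.Module):
    """Map a (B, C_src, H, W) feature map to a (B, C_tgt, H_tgt, W_tgt) one."""
    def __init__(self, in_channels, out_channels, out_size):
        super().__init__()
        # 1x1 conv handles the channel mismatch; it has to be (fine-)tuned,
        # since the two encoders' feature spaces are not aligned a priori.
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.out_size = out_size

    def forward(self, feats):
        feats = self.proj(feats)
        return F.interpolate(feats, size=self.out_size,
                             mode="bilinear", align_corners=False)

# Hypothetical example: 256-channel 64x64 features from a lighter encoder
# projected to a 768-channel 64x64 map that a SAM-B-based head might expect.
adapter = FeatureAdapter(in_channels=256, out_channels=768, out_size=(64, 64))
dummy = torch.randn(1, 256, 64, 64)
print(adapter(dummy).shape)  # torch.Size([1, 768, 64, 64])
```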

If not, for how long did you train the model? And on how many GPUs?

Regards,
Carlos

carlosuc3m (Author) commented

Also, after reading the paper and the training procedure used, I wonder why you needed to train the ViT encoder for the CellFinder model. Weren't the SAM ViT feature maps good enough to feed to the decoder?
Have you tried with SAM-H?

Maybe the need to retrain it is because the SAM-B feature maps are not good enough. This is where EfficientSAM or EfficientViT-SAM becomes interesting, because their performance is quite good even compared to SAM-H, so maybe their ViT encoder could be kept frozen during the CellFinder training step.
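
For what it's worth, if the encoder really could stay frozen, the CellFinder-style training would only need to optimize the detection head. A minimal sketch of what I mean (the encoder and head below are hypothetical stand-ins, not the real CellSAM/EfficientSAM modules):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for a pretrained SAM-variant image encoder and a
# DETR-style detection head; not the actual CellSAM modules.
image_encoder = nn.Sequential(nn.Conv2d(3, 256, 3, padding=1))
detector_head = nn.Sequential(nn.Conv2d(256, 4, kernel_size=1))

def freeze(module):
    module.eval()                       # fix BatchNorm/Dropout behaviour
    for p in module.parameters():
        p.requires_grad_(False)

freeze(image_encoder)

# Only the (still trainable) head parameters are passed to the optimizer.
optimizer = torch.optim.AdamW(
    [p for p in detector_head.parameters() if p.requires_grad], lr=1e-4
)

x = torch.randn(1, 3, 64, 64)
with torch.no_grad():                   # no gradients through the frozen encoder
    feats = image_encoder(x)
boxes = detector_head(feats)            # gradients flow only through the head
```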

Sorry if any of these questions are stupid.
