
Failed in training fine-grained categorization dataset CUB-200-2011 #26

Open

JingyunLiang opened this issue Mar 31, 2018 · 1 comment

@JingyunLiang
I use the same triplet loss (batch hard mining, Euclidean distance, soft margin) on the fine-grained categorization dataset CUB-200-2011, which aims to distinguish different species of birds (200 categories, 5,994 images in total). I know fine-grained categorization is a kind of classification task, but I want to see whether it can be treated as an image retrieval problem (or person re-ID) instead.

However, when I use VGG16 (pre-trained on ImageNet) to extract image features and train the whole model with your triplet loss, it does not converge. All conv5_3 activations are negative, so they become 0 after the following ReLU layer, and in the end the network outputs the same features (from the last FC layer) for every image.

I follow your instructions but use another dataset. The loss drops at first and then stays at about 0.7, and the number of non-zero triplets never decreases.
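For concreteness, here is a minimal NumPy sketch of the loss I am using (my own paraphrase, not the repository's actual TensorFlow code). Note that if the embedding collapses so that all pairwise distances are 0, the soft-margin loss is softplus(0) = ln 2 ≈ 0.693, which matches the ~0.7 plateau I see:

```python
# Sketch of batch-hard triplet loss with a soft margin (paraphrase,
# not the repository's TensorFlow implementation).
import numpy as np

def batch_hard_soft_margin_loss(embeddings, labels):
    """embeddings: (B, D) float array; labels: (B,) int array."""
    # Pairwise Euclidean distances between all embeddings in the batch.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.sqrt(np.maximum((diff ** 2).sum(-1), 1e-12))

    same = labels[:, None] == labels[None, :]          # positive-pair mask
    pos = np.where(same, dists, -np.inf).max(axis=1)   # hardest positive per anchor
    neg = np.where(same, np.inf, dists).min(axis=1)    # hardest negative per anchor

    # Soft-margin formulation: softplus(d_pos - d_neg) = log(1 + exp(.)).
    # If the embedding collapses, d_pos == d_neg == 0 for every anchor and
    # the loss sits at log(2) ~= 0.693 -- the ~0.7 plateau described above.
    return np.logaddexp(0.0, pos - neg).mean()

# Example with a PK-style batch (4 classes x 2 images):
emb = np.random.randn(8, 128)
lbl = np.array([0, 0, 1, 1, 2, 2, 3, 3])
print(batch_hard_soft_margin_loss(emb, lbl))
```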

@lucasb-eyer (Member)

Yes, that is very possible. As I write in the readme in big, bold letters:

💥 🔥 ❗ If you train on a very different dataset, don't forget to tune the learning-rate ❗ 🔥 💥

I don't write it for fun. I have successfully trained our triplet model (with ResNet50) on CUB-200-2011 (as well as CARS196 and Stanford Online Products, the usual three), reaching state-of-the-art results, but only after adapting the learning-rate!

Actually, with Adam, even the "epsilon" can be an important hyper-parameter; this is also mentioned in the TensorFlow documentation for it. I recommend doing a grid-search for hyper-parameters where the loss doesn't get stuck. (Do not choose by evaluating everything on the test-set; that would be overfitting and cheating.) You can easily do this overnight if you do only short runs of ~2000 or so updates; see the sketch below.
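Hypothetically, such an overnight search could be driven by a loop like this (the `train_short_run` helper and the value grids are illustrative, not part of this repository):

```python
# Illustrative grid-search over learning rate and Adam epsilon using
# short runs. All names and values here are hypothetical.
import itertools

def train_short_run(lr, adam_eps, num_updates=2000):
    """Hypothetical stand-in: run `num_updates` training steps with the
    given optimizer settings and return (final_loss, active_triplet_fraction).
    Replace this with a call into your actual training loop."""
    raise NotImplementedError

learning_rates = [3e-3, 1e-3, 3e-4, 1e-4]
adam_epsilons = [1e-8, 1e-4, 1e-1]  # Adam's epsilon can matter, per the TF docs

for lr, eps in itertools.product(learning_rates, adam_epsilons):
    final_loss, active = train_short_run(lr, eps)
    # A setting is promising only if the training loss moves well below
    # log(2) ~= 0.693 (the collapsed-embedding plateau) and the fraction
    # of non-zero triplets starts to decrease. Judge on training signals,
    # not the test set, to avoid overfitting hyper-parameters.
    print(f"lr={lr:g}  eps={eps:g}  loss={final_loss:.3f}  active={active:.2%}")
```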

The problem with CUB-200-2011 and CARS196 is that they are tiny; I find it a pity that they are still used in new publications 😢
