Professor Luo, I am a beginner and I want to reproduce these models on my own dataset. My dataset is a simple image captioning dataset, and I want to extract attention (att) features for further training. How can I implement the feature extraction? Should I directly use the features extracted by Faster R-CNN, or do I need to retrain it on my own dataset (even though my dataset is not an object detection dataset)?
Hi. First of all, you do not need an object detection dataset to extract features.
Second, if you would like to train your model from scratch, you can use any feature extractor. At this point, my suggestion would be CLIP. You just need to convert the features into a format similar to the one used in the codebase, and then you can run the training.
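As a rough illustration of the "convert the features" step, here is a minimal sketch. It assumes you already have, for each image, either a spatial grid of features with shape (H, W, C), for example the last convolutional map of a CLIP ResNet image encoder, or a set of N region/patch vectors with shape (N, C). It also assumes the codebase reads one `.npy` file per image for the pooled "fc" feature and one `.npz` file per image (array stored under the key `feat`) for the "att" features. Please check data/README.md and the preprocessing scripts for the exact layout your version expects. The function names and the `feat_prefix` argument are hypothetical.

```python
import os

import numpy as np


def save_caption_features(image_id, grid_feats, feat_prefix):
    """Save one image's features as a pooled fc vector plus per-region att vectors.

    grid_feats: (H, W, C) spatial grid or (N, C) region/patch features.
    feat_prefix: hypothetical output prefix such as "data/mydata"; files are
    written to <prefix>_fc/<image_id>.npy and <prefix>_att/<image_id>.npz.
    The directory layout and the "feat" key are assumptions; verify them
    against the codebase before training.
    """
    feats = np.asarray(grid_feats, dtype=np.float32)
    if feats.ndim == 3:
        # Flatten the H x W grid into H*W attention regions of dimension C.
        feats = feats.reshape(-1, feats.shape[-1])
    if feats.ndim != 2:
        raise ValueError(f"expected (H, W, C) or (N, C) features, got {feats.shape}")

    fc_dir = feat_prefix + "_fc"
    att_dir = feat_prefix + "_att"
    os.makedirs(fc_dir, exist_ok=True)
    os.makedirs(att_dir, exist_ok=True)

    # fc feature: a single global vector per image (mean over all regions).
    # np.save appends ".npy" to the file name automatically.
    np.save(os.path.join(fc_dir, str(image_id)), feats.mean(axis=0))
    # att features: one vector per region, stored under the key "feat".
    # np.savez_compressed appends ".npz" to the file name automatically.
    np.savez_compressed(os.path.join(att_dir, str(image_id)), feat=feats)


def load_caption_features(image_id, feat_prefix):
    """Read back the (fc, att) pair written by save_caption_features."""
    fc = np.load(os.path.join(feat_prefix + "_fc", f"{image_id}.npy"))
    with np.load(os.path.join(feat_prefix + "_att", f"{image_id}.npz")) as data:
        att = data["feat"]
    return fc, att
```

For example, a 7x7x512 grid for image `42` becomes a 512-dimensional fc vector and a 49x512 att array. Mean pooling for the fc feature is just one simple choice; with CLIP you could instead store the encoder's own pooled image embedding as the fc feature.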
Third, if you would like to use the pretrained models I provide, I suggest you use the 12-in-1 one (see data/README.md), because its features are extracted with PyTorch. The bottom-up one uses Caffe, and I don't know whether that is still easy to run.