
DA-CLIP training based on ViT-L #92

Open
lzl2040 opened this issue Dec 19, 2024 · 4 comments


lzl2040 commented Dec 19, 2024

Hello, I am training on my own dataset, but I found that the largest batch size I can set is 2 on an RTX 4090 GPU. Is this normal?
Also, the log output after training for 30 epochs is as follows:

image_to_text_mean_rank: 49.3377
image_to_text_median_rank: 46.0000
image_to_text_R@1: 0.0110
image_to_text_R@5: 0.0476
image_to_text_R@10: 0.0993
text_to_image_mean_rank: 49.0875
text_to_image_median_rank: 47.0000
text_to_image_R@1: 0.0110
text_to_image_R@5: 0.0549
text_to_image_R@10: 0.1098
clip_val_loss: 0.6896
epoch: 30.0000
num_samples: 2458.0000

Is there a problem here?

Algolzw (Owner) commented Dec 19, 2024

A batch size of 2 probably won't work. Contrastive learning works better the larger the batch size is; you could test that yourself.
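For reference, when memory caps the per-step batch, gradient accumulation is a common workaround. The sketch below is generic PyTorch, not taken from the DA-CLIP training code, and the model is a stand-in. One caveat: for an InfoNCE-style contrastive loss, plain accumulation does not enlarge the in-batch negative pool (that needs techniques such as gradient caching or gathering features across GPUs); it mainly gives a larger effective batch for gradient stability.

```python
import torch
import torch.nn.functional as F

micro_batch = 2   # what fits in GPU memory per step
accum_steps = 16  # effective batch size = micro_batch * accum_steps = 32

model = torch.nn.Linear(8, 4)  # hypothetical stand-in for the CLIP model
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

opt.zero_grad()
for step in range(accum_steps):
    x = torch.randn(micro_batch, 8)
    y = torch.randint(0, 4, (micro_batch,))
    # Scale the loss so the accumulated gradient equals the average
    # over the full effective batch.
    loss = F.cross_entropy(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate across micro-batches
opt.step()           # one optimizer update per effective batch
opt.zero_grad()
```

Activation checkpointing and mixed precision are the other usual levers for fitting a larger micro-batch on 24 GB.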


lzl2040 commented Dec 20, 2024

> A batch size of 2 probably won't work. Contrastive learning works better the larger the batch size is; you could test that yourself.

But on a 24 GB GPU the largest batch size I can use is 4, while I've seen others set it to 768. Is that normal? The gap between ViT-L and ViT-B shouldn't be that large, should it?

Algolzw (Owner) commented Dec 20, 2024

CLIP also has a text encoder, so the model is somewhat larger. You could try ViT-B/32.


lzl2040 commented Dec 23, 2024

Is it normal for image_to_text_R and text_to_image_R to be only around 0.1? There are a bit over 2,400 images. That said, the results from running evaluate.py look decent, and the images and texts do match up.
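A hedged sanity check on those numbers: how "low" R@k is depends on the gallery size. Assuming the metrics were computed over the full validation set (the `num_samples: 2458` in the log above), the random-chance baseline for R@k is simply k / N, and the reported values can be compared against it:

```python
# Compare the reported image-to-text retrieval numbers from the log
# against the random-chance baseline for a gallery of N candidates.
N = 2458  # num_samples from the training log (assumed to be the gallery size)

for k, reported in [(1, 0.0110), (5, 0.0476), (10, 0.0993)]:
    chance = k / N
    print(f"R@{k}: reported {reported:.4f}, chance {chance:.4f} "
          f"({reported / chance:.1f}x above chance)")
```

Under that assumption the model ranks well above chance (and a mean rank near 49 out of ~2458 puts the correct match in roughly the top 2%), which would be consistent with evaluate.py producing sensible matches despite the modest R@k values.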
