Hi, author. I'm training on my own dataset, but I find that the largest batch size I can set is 2, and my GPU is a 4090. Is that normal? Also, the log output after training for 30 epochs is as follows:
image_to_text_mean_rank: 49.3377
image_to_text_median_rank: 46.0000
image_to_text_R@1: 0.0110
image_to_text_R@5: 0.0476
image_to_text_R@10: 0.0993
text_to_image_mean_rank: 49.0875
text_to_image_median_rank: 47.0000
text_to_image_R@1: 0.0110
text_to_image_R@5: 0.0549
text_to_image_R@10: 0.1098
clip_val_loss: 0.6896
epoch: 30.0000
num_samples: 2458.0000
Is there a problem here?
A batch size of 2 probably won't work. Contrastive learning generally benefits from as large a batch size as possible; you could test this yourself.
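For context on why batch size matters so much here: in CLIP-style training, each image is contrasted against every other text in the same batch, so a batch of 2 gives only a single in-batch negative per sample. A minimal sketch of this loss (not this repo's actual code; feature tensors are assumed to be L2-normalized):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_feats, text_feats, logit_scale):
    # image_feats, text_feats: [batch_size, dim], already L2-normalized
    logits_per_image = logit_scale * image_feats @ text_feats.t()  # [B, B]
    logits_per_text = logits_per_image.t()
    labels = torch.arange(image_feats.size(0), device=image_feats.device)
    # Each sample is contrasted against (batch_size - 1) in-batch negatives,
    # so batch_size = 2 leaves only one negative per sample.
    loss_i = F.cross_entropy(logits_per_image, labels)
    loss_t = F.cross_entropy(logits_per_text, labels)
    return (loss_i + loss_t) / 2
```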
But on 24 GB of VRAM the largest batch size I can set is 4, while I see others setting it to 768. Is that normal? The gap between ViT-L and ViT-B shouldn't be that large, should it?
CLIP also has a text encoder, so memory usage is somewhat higher. You could try ViT-B/32.
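As a side note, a common way to fit a larger in-batch size on a 24 GB card is mixed-precision training. A minimal sketch, assuming the model, dataloader, and optimizer come from the repo's own training script and that the forward pass returns normalized features plus the learned temperature (that signature is an assumption, not this repo's documented API):

```python
import torch
import torch.nn.functional as F

scaler = torch.cuda.amp.GradScaler()

for images, texts in dataloader:  # model / dataloader / optimizer are hypothetical names
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        # Assumed forward signature: normalized image/text features and logit scale.
        img_f, txt_f, logit_scale = model(images.cuda(), texts.cuda())
        logits = logit_scale * img_f @ txt_f.t()
        labels = torch.arange(logits.size(0), device=logits.device)
        loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Note that gradient accumulation is not a substitute here: the contrastive negatives still come only from the micro-batch that fits in memory, so it does not enlarge the negative pool the way a genuinely larger batch does.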
Is it normal for image_to_text_R and text_to_image_R to be only around 0.1? There are a bit over 2,400 images. However, when I run evaluate.py the results look acceptable, and the images can be matched to their texts.
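For reference, image_to_text_R@K is the fraction of images whose paired text ranks in the top K among all validation texts. A rough sketch of the computation, assuming L2-normalized feature tensors (not this repo's actual evaluation code):

```python
import torch

def recall_at_k(image_feats, text_feats, k):
    # image_feats, text_feats: [N, dim], L2-normalized; row i of each is a matched pair.
    sims = image_feats @ text_feats.t()                # [N, N] image-to-text similarities
    ranks = sims.argsort(dim=1, descending=True)       # candidate texts ordered per image
    targets = torch.arange(sims.size(0)).unsqueeze(1)  # correct text index for each image
    # Fraction of images whose matching text appears among the top-k candidates.
    return (ranks[:, :k] == targets).any(dim=1).float().mean().item()
```

With num_samples = 2458 candidate texts, a random ranking would give R@10 of roughly 10/2458 ≈ 0.004, so 0.10 is well above chance; whether it is good enough depends on the dataset.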