I would like to know the text input capacity of eva-clip-18b? To my knowledge, OpenAI's CLIP requires less than 20 tokens #165

Open
gg22mm opened this issue Aug 16, 2024 · 0 comments


gg22mm commented Aug 16, 2024

I would like to know the text input capacity of EVA-CLIP-18B. To my knowledge, OpenAI's CLIP effectively handles fewer than 20 tokens.
OpenAI's CLIP has two major shortcomings:

  1. The text input capacity is very limited. It supports at most 77 tokens of input, and according to LongCLIP's experiments its effective input length does not exceed 20 tokens (a short sketch illustrating this limit follows below).
  2. Poor performance on pure text retrieval. There are two main reasons: first, the CLIP model's training objective is to align text and images, with no specialized optimization for pure text retrieval; second, CLIP's training data consists mainly of relatively short texts, which makes it hard to generalize to broader text retrieval scenarios.

Does EVA-CLIP-18B have the same restrictions as OpenAI CLIP when used for text retrieval?

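To make point 1 concrete, here is a minimal sketch of how the 77-token hard limit shows up in practice, using the Hugging Face `transformers` tokenizer for OpenAI's CLIP. Checking EVA-CLIP-18B the same way is only an assumption on my part, since I have not verified which tokenizer or context length its released checkpoints use.

```python
# Minimal sketch: inspect OpenAI CLIP's text context length and watch a long
# caption get truncated. Requires `pip install transformers`.
from transformers import CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
print(tok.model_max_length)  # 77 for OpenAI CLIP

# A caption far longer than the limit: everything past 77 tokens is dropped.
long_caption = " ".join(["a photo of a cat"] * 50)
ids = tok(long_caption, truncation=True, max_length=tok.model_max_length)["input_ids"]
print(len(ids))  # 77
```

Whether EVA-CLIP-18B raises this context length, or is trained so that longer inputs are actually used effectively, is exactly what I would like to know.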
