[mieb] InfoSeekIT2ITRetrieval & InfoSeekIT2TRetrieval fail with BAAI/bge-visualized #1386

Open
Muennighoff opened this issue Nov 4, 2024 · 2 comments
Labels
mieb The image extension of MTEB

Comments

@Muennighoff
Contributor

Muennighoff commented Nov 4, 2024

I am running the tasks with BAAI/bge-visualized-base-base/m3 and getting errors like the one below:

ERROR:mteb.evaluation.MTEB:Error while evaluating InfoSeekIT2TRetrieval: The size of tensor a (516) must match the size of tensor b (512) at non-singleton dimension 1
Traceback (most recent call last):
  File "/data/niklas/mieb/mteb/scripts/run_mieb.py", line 82, in <module>
    results = evaluation.run(model, output_folder="/data/niklas/mieb/results-mieb-final", batch_size=1)
  File "/data/niklas/mieb/mteb/mteb/evaluation/MTEB.py", line 464, in run
    raise e
  File "/data/niklas/mieb/mteb/mteb/evaluation/MTEB.py", line 425, in run
    results, tick, tock = self._run_eval(
  File "/data/niklas/mieb/mteb/mteb/evaluation/MTEB.py", line 300, in _run_eval
    results = task.evaluate(
  File "/data/niklas/mieb/mteb/mteb/abstasks/Image/AbsTaskAny2AnyRetrieval.py", line 269, in evaluate
    scores[hf_subset] = self._evaluate_subset(
  File "/data/niklas/mieb/mteb/mteb/abstasks/Image/AbsTaskAny2AnyRetrieval.py", line 278, in _evaluate_subset
    results = retriever(corpus, queries)
  File "/data/niklas/mieb/mteb/mteb/evaluation/evaluators/Image/Any2AnyRetrievalEvaluator.py", line 290, in __call__
    return self.retriever.search(
  File "/data/niklas/mieb/mteb/mteb/evaluation/evaluators/Image/Any2AnyRetrievalEvaluator.py", line 173, in search
    sub_corpus_embeddings = self.model.get_text_embeddings(
  File "/data/niklas/mieb/mteb/mteb/models/vista_models.py", line 130, in get_text_embeddings
    batch_embeddings = self.encode(texts=batch_texts)
  File "/data/niklas/mieb/mteb/mteb/models/vista_models.py", line 121, in encode
    return self.encode_text(texts.to(self.device))
  File "/data/niklas/mieb/mteb/mteb/models/vista_models.py", line 65, in encode_text
    embedding_output = self.bge_embeddings(
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 217, in forward
    embeddings += position_embeddings
RuntimeError: The size of tensor a (516) must match the size of tensor b (512) at non-singleton dimension 1

and the one below:

/opt/conda/conda-bld/pytorch_1724789122112/work/aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [845,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1724789122112/work/aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [845,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
 31%|████████████████████▍                                              | 6110/20000 [01:38<03:43, 62.08it/s]
ERROR:mteb.evaluation.MTEB:Error while evaluating InfoSeekIT2ITRetrieval: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "/data/niklas/mieb/mteb/scripts/run_mieb.py", line 81, in <module>
    results = evaluation.run(model, output_folder="/data/niklas/mieb/results-mieb-final", batch_size=1)
  File "/data/niklas/mieb/mteb/mteb/evaluation/MTEB.py", line 464, in run
    raise e
  File "/data/niklas/mieb/mteb/mteb/evaluation/MTEB.py", line 425, in run
    results, tick, tock = self._run_eval(
  File "/data/niklas/mieb/mteb/mteb/evaluation/MTEB.py", line 300, in _run_eval
    results = task.evaluate(
  File "/data/niklas/mieb/mteb/mteb/abstasks/Image/AbsTaskAny2AnyRetrieval.py", line 269, in evaluate
    scores[hf_subset] = self._evaluate_subset(
  File "/data/niklas/mieb/mteb/mteb/abstasks/Image/AbsTaskAny2AnyRetrieval.py", line 278, in _evaluate_subset
    results = retriever(corpus, queries)
  File "/data/niklas/mieb/mteb/mteb/evaluation/evaluators/Image/Any2AnyRetrievalEvaluator.py", line 290, in __call__
    return self.retriever.search(
  File "/data/niklas/mieb/mteb/mteb/evaluation/evaluators/Image/Any2AnyRetrievalEvaluator.py", line 194, in search
    sub_corpus_embeddings = self.model.get_fused_embeddings(
  File "/data/niklas/mieb/mteb/mteb/models/vista_models.py", line 169, in get_fused_embeddings
    all_embeddings.append(batch_embeddings.cpu())
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[

I am also getting this for OVENIT2ITRetrieval; maybe it is a problem with our bge implementation.
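
Taken at face value, the two logs look like two ways of overflowing an embedding table. In the first, the sequence dimension (516) is larger than the position-embedding table (512). In the second, `srcIndex < srcSelectDimSize` is the assertion PyTorch's CUDA index-select kernel raises when an index falls outside the table being indexed. The following is a minimal sketch of a pre-flight check for both conditions in plain Python. The function name `check_embedding_inputs` and the sizes in the example are hypothetical and are not part of mteb or the model code.

```python
def check_embedding_inputs(input_ids, vocab_size, max_positions):
    """Report whether a token id sequence would overflow embedding tables.

    Hypothetical helper: `vocab_size` and `max_positions` are assumed to be
    read from the model config by the caller.
    """
    # Condition behind the first log: more positions than the table holds.
    too_long = len(input_ids) > max_positions
    # Condition behind an out-of-bounds lookup: an id outside [0, vocab_size).
    out_of_range = [
        (position, token_id)
        for position, token_id in enumerate(input_ids)
        if not 0 <= token_id < vocab_size
    ]
    return {"too_long": too_long, "out_of_range": out_of_range}


# Illustrative values only: 516 ids against a 512-slot position table.
report = check_embedding_inputs(list(range(516)), vocab_size=1000, max_positions=512)
print(report["too_long"], report["out_of_range"])
```

Whether the second failure really originates in an embedding lookup is not confirmed here. Rerunning with `CUDA_LAUNCH_BLOCKING=1`, as the log itself suggests, should give a stack trace that points at the actual failing call.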

@Muennighoff Muennighoff changed the title InfoSeekIT2ITRetrieval & InfoSeekIT2TRetrieval fail with BAAI/bge-visualized [mieb] InfoSeekIT2ITRetrieval & InfoSeekIT2TRetrieval fail with BAAI/bge-visualized Nov 4, 2024
@isaac-chung isaac-chung added the mieb The image extension of MTEB label Nov 5, 2024
@gowitheflow-1998
Contributor

Strangely, I am not able to reproduce the error on the two datasets. Could this be caused by the transformers and tokenizers versions? It looks related to inputs not being truncated to the max length.
On my end, I am able to run:

import mteb

model = mteb.get_model("BAAI/bge-visualized-m3")
# One very long input and one very short input.
model.get_text_embeddings(texts=["s" * 10000, "s" * 1])

with transformers==4.44.2 and tokenizers==0.19.1
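
To make the truncation hypothesis concrete, here is a minimal sketch of what "truncated to max length" means for a single sequence of token ids. It is plain Python and deliberately independent of any tokenizer. The helper `truncate_to_budget` is hypothetical, and keeping the final token (typically an end-of-sequence marker) is an assumption about what a caller would want, not a description of what the model code does. With Hugging Face tokenizers, the usual way to get this behaviour is to pass `truncation=True` and `max_length=...` to the tokenizer call.

```python
def truncate_to_budget(token_ids, max_positions=512):
    """Clamp a token id sequence to at most `max_positions` entries.

    Hypothetical helper: keeps the last token so that a trailing
    end-of-sequence marker survives truncation.
    """
    if max_positions < 2:
        raise ValueError("max_positions must be at least 2")
    if len(token_ids) <= max_positions:
        return list(token_ids)
    # Keep the first max_positions - 1 ids, then re-attach the final id.
    return list(token_ids[: max_positions - 1]) + [token_ids[-1]]


# A 516-id sequence, matching the length reported in the first traceback.
clamped = truncate_to_budget(list(range(516)), max_positions=512)
print(len(clamped))
```

If a sequence longer than the position table still reaches the embedding layer after a step like this, truncation is being skipped or undone somewhere between tokenization and the forward pass.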

@Muennighoff
Contributor Author

Hm, I'm running with the versions below and still getting the 2nd error with BAAI/bge-visualized-base-base:

(/env/lib/conda/gritkto4) niklas@dojo-a3-ghpc-61:/data/niklas/mieb/mteb$ pip show transformers
Name: transformers
Version: 4.46.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: [email protected]
License: Apache 2.0 License
Location: /data/env/lib/conda/gritkto4/lib/python3.10/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: FlagEmbedding, peft, salesforce-lavis, sentence-transformers
(/env/lib/conda/gritkto4) niklas@dojo-a3-ghpc-61:/data/niklas/mieb/mteb$ pip show tokenizers
Name: tokenizers
Version: 0.20.1
Summary: 
Home-page: https://github.com/huggingface/tokenizers
Author: Anthony MOI <[email protected]>
Author-email: Nicolas Patry <[email protected]>, Anthony Moi <[email protected]>
License: 
Location: /data/env/lib/conda/gritkto4/lib/python3.10/site-packages
Requires: huggingface-hub
Required-by: transformers
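
For comparing environments across machines, the same version information can be collected from Python in one call instead of running `pip show` per package. This is a small sketch using the standard library's `importlib.metadata`. The helper name `installed_versions` and the choice of packages to list are assumptions for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in the current environment.
            found[name] = None
    return found


# Packages relevant to this thread; extend the list as needed.
print(installed_versions(["transformers", "tokenizers", "torch"]))
```

Posting this dictionary from both the failing and the working environment makes version differences easy to spot.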
