Add some models and results #54

Open
wants to merge 10 commits into main

Conversation

small-starriest

Add some model names and results under our datasets.

@Samoed
Contributor

Samoed commented Nov 18, 2024

Can you sync results.py with main and remove .idea?

@small-starriest
Author

Sure! But I don't know why the pytest check failed.

@KennethEnevoldsen
Contributor

It would be great if you could specify the models and give some description of the datasets as well.

I am unsure why there are changes other than simply adding new result files (*.json).

@Samoed
Contributor

Samoed commented Nov 18, 2024

How did you run the models? mteb doesn't produce a queries field in its results and stores more data in the scores field. Also, can you rerun the models with an updated MTEB version? Your results were produced with version 1.2.0.
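As a quick local check, printing the installed mteb version (standard library only) will show whether you are still on 1.2.0:

```python
# Print the installed mteb version (the submitted result files report 1.2.0).
from importlib.metadata import version

print(version("mteb"))
```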

@small-starriest
Author

small-starriest commented Nov 18, 2024

We divide the models into instruction-tuned and non-instruction-tuned models.

Our dataset is named MAIR (Massive Instructed Retrieval Benchmark), a heterogeneous benchmark designed for evaluating instructed information retrieval (IR). It includes 126 retrieval tasks across 6 domains, collected from existing datasets, with each query annotated with detailed retrieval instructions. Compared to other IR benchmarks, MAIR extends the evaluation scope to broader IR applications, including those in RAG, code retrieval, agent-based retrieval, biomedical and legal IR, and more. It also enhances evaluation efficiency through careful data sampling and diversification.

As for rerunning the models with an updated MTEB version, can I skip that?
Besides, I still don't understand the error messages from the pytest run.

@Samoed
Contributor

Samoed commented Nov 18, 2024

From test logs

>           obj = cls.model_validate(data)
E           pydantic_core._pydantic_core.ValidationError: 3 validation errors for TaskResult
E           task_name
E             Field required [type=missing, input_value={'dataset_revision': '7d2..., 'NDCG@10': 0.71071}]}}, input_type=dict]
E               For further information visit https://errors.pydantic.dev/2.9/v/missing
E           scores
E             Value error, 'main_score' should be in scores [type=value_error, input_value={'test': [{'hf_subset': '...', 'NDCG@10': 0.71071}]}, input_type=dict]
E               For further information visit https://errors.pydantic.dev/2.9/v/value_error
E           evaluation_time
E             Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='', input_type=str]
E               For further information visit https://errors.pydantic.dev/2.9/v/float_parsing
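To reproduce this locally before pushing, one rough sketch is to load a submitted JSON file and validate it against the TaskResult model named in the traceback; the import path below is an assumption and may differ between mteb versions:

```python
# Rough sketch (assumed import path): validate a single result file locally.
# TaskResult is the pydantic model named in the traceback above; your installed
# mteb version may expose it under a different module path.
import json
import sys

from mteb import TaskResult  # assumption: adjust to your installed mteb version

# usage: python validate_result.py path/to/ACORDAR.json
with open(sys.argv[1]) as f:
    data = json.load(f)

TaskResult.model_validate(data)  # raises pydantic.ValidationError listing invalid fields
print("valid")
```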

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Nov 18, 2024

@small-starriest thanks for the information. It fails because the results objects are invalid. E.g.

     "dataset_revision": "7d24eac886a6ae6653a6b67433e1c302cb0e9ac6",
     "mteb_version": "1.2.0",
     "evaluation_time": "",
     "mteb_dataset_name": "ACORDAR",
     "mteb_mair_domain": "Web",
     "queries": {
         "NDCG@10": 0.31936
     },
     "scores": {
         "test": [
             {
                 "hf_subset": "default",
                 "NDCG@10": 0.31936
             }
         ]
     }

contains the fields "queries" and "mteb_mair_domain", which are not part of the expected results object.
It also does not contain a value for "main_score", as stated here:

Value error, 'main_score' should be in scores

evaluation_time is invalid:

evaluation_time
E             Input should be a valid number

and you are missing:

task_name
E             Field required

It looks like you might be using a custom version of MTEB or might have manually written the results object.

@small-starriest
Author

Thanks! It really helps a lot!

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Nov 19, 2024

@small-starriest, please do not fix the result objects post hoc; instead, make sure they were created using mteb (ideally using the newest version).

Feel free to start out with a small submission of one model to make sure that the format is correct before running all models.
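For reference, a minimal sketch of such a one-model, one-task run with a current mteb release; the model and task names below are placeholders, not the actual MAIR setup:

```python
# Minimal sketch: let mteb itself write the result JSON (task_name, main_score,
# evaluation_time, ...) instead of constructing it by hand.
# Model and task names here are placeholders.
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-base-v2")  # placeholder model
tasks = mteb.get_tasks(tasks=["NFCorpus"])          # placeholder retrieval task
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")      # one JSON file per task under results/
```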

@small-starriest
Author

small-starriest commented Nov 20, 2024

Thanks!
https://github.com/embeddings-benchmark/mteb/pull/1425
My collaborator has submitted the PR for the dataset; once it is merged, I believe the pytest checks here will pass.
