Add some models and results #54

Open
wants to merge 10 commits into main

Conversation

small-starriest

Add some model names and results under our datasets.

@Samoed
Contributor

Samoed commented Nov 18, 2024

Can you sync results.py with main and remove .idea?

@small-starriest
Author

Sure! But I don't know why the pytest check failed.

@KennethEnevoldsen
Contributor

It would be great if you could specify the models and give some description of the datasets as well.

I am unsure why there are changes other than simply adding new result files (*.json).

@Samoed
Contributor

Samoed commented Nov 18, 2024

How did you run the models? mteb doesn't produce a queries field in its results and stores more data in the scores field. Also, can you rerun the models with an updated MTEB version? Your results were produced with version 1.2.0.
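As a quick local check, printing the installed mteb version (standard library only) will show whether you are still on 1.2.0:

```python
# Print the installed mteb version (the submitted result files report 1.2.0).
from importlib.metadata import version

print(version("mteb"))
```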

@small-starriest
Author

small-starriest commented Nov 18, 2024

We divide the models into instruction-tuned and non-instruction-tuned models.

Our dataset is named MAIR (Massive Instructed Retrieval Benchmark), a heterogeneous benchmark designed for evaluating instructed information retrieval (IR). It includes 126 retrieval tasks across 6 domains, collected from existing datasets, with each query annotated with detailed retrieval instructions. Compared to other IR benchmarks, MAIR extends the evaluation scope to broader IR applications, including those in RAG, code retrieval, agent-based retrieval, biomedical and legal IR, and more. It also enhances evaluation efficiency through careful data sampling and diversification.

As for rerunning the models with an updated MTEB version, can I skip that?
Besides, I still don't understand the error messages from the pytest run.

@Samoed
Contributor

Samoed commented Nov 18, 2024

From test logs

>           obj = cls.model_validate(data)
E           pydantic_core._pydantic_core.ValidationError: 3 validation errors for TaskResult
E           task_name
E             Field required [type=missing, input_value={'dataset_revision': '7d2..., 'NDCG@10': 0.71071}]}}, input_type=dict]
E               For further information visit https://errors.pydantic.dev/2.9/v/missing
E           scores
E             Value error, 'main_score' should be in scores [type=value_error, input_value={'test': [{'hf_subset': '...', 'NDCG@10': 0.71071}]}, input_type=dict]
E               For further information visit https://errors.pydantic.dev/2.9/v/value_error
E           evaluation_time
E             Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='', input_type=str]
E               For further information visit https://errors.pydantic.dev/2.9/v/float_parsing
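To reproduce this locally before pushing, one rough sketch is to load a submitted JSON file and validate it against the TaskResult model named in the traceback; the import path below is an assumption and may differ between mteb versions:

```python
# Rough sketch (assumed import path): validate a single result file locally.
# TaskResult is the pydantic model named in the traceback above; your installed
# mteb version may expose it under a different module path.
import json
import sys

from mteb import TaskResult  # assumption: adjust to your installed mteb version

# usage: python validate_result.py path/to/ACORDAR.json
with open(sys.argv[1]) as f:
    data = json.load(f)

TaskResult.model_validate(data)  # raises pydantic.ValidationError listing invalid fields
print("valid")
```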

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Nov 18, 2024

@small-starriest thanks for the information. It fails because the results objects are invalid. E.g.

     "dataset_revision": "7d24eac886a6ae6653a6b67433e1c302cb0e9ac6",
     "mteb_version": "1.2.0",
     "evaluation_time": "",
     "mteb_dataset_name": "ACORDAR",
     "mteb_mair_domain": "Web",
     "queries": {
         "NDCG@10": 0.31936
     },
     "scores": {
         "test": [
             {
                 "hf_subset": "default",
                 "NDCG@10": 0.31936
             }
         ]
     }

contains the fields "queries" and "mteb_mair_domain", which are not part of the expected results object.
It also does not contain a value for "main_score", as stated here:

Value error, 'main_score' should be in scores

evaluation_time is invalid:

evaluation_time
E             Input should be a valid number

and you are missing:

task_name
E             Field required

It looks like you might be using a custom version of MTEB or might have manually written the results object.

@small-starriest
Author

Thanks! It really helps a lot!

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Nov 19, 2024

@small-starriest, please do not fix the result objects post hoc; instead, make sure they were created using mteb (ideally using the newest version).

Feel free to start out with a small submission of one model to make sure that the format is correct before running all models.
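For reference, a minimal sketch of such a one-model, one-task run with a current mteb release; the model and task names below are placeholders, not the actual MAIR setup:

```python
# Minimal sketch: let mteb itself write the result JSON (task_name, main_score,
# evaluation_time, ...) instead of constructing it by hand.
# Model and task names here are placeholders.
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-base-v2")  # placeholder model
tasks = mteb.get_tasks(tasks=["NFCorpus"])          # placeholder retrieval task
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")      # one JSON file per task under results/
```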

@small-starriest
Author

small-starriest commented Nov 20, 2024

Thanks!
https://github.com/embeddings-benchmark/mteb/pull/1425
My collaborator has submitted the PR for the dataset; once it is merged, I believe the pytest checks here will pass.
