Refactoring answer metrics (BLEU, METEOR, ROUGE, etc.) #398

Eastsidegunn · 2023-12-26T07:20:26Z

We could have a few metrics, one per metric category, each controllable with a few parameters (for example, an n-gram-based metric lets you choose n).
That would be more flexible!

In my opinion,

we should give the metrics default values (maybe defaults that follow the default values of the wrapped metric).
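As a rough illustration of the idea above, here is a minimal sketch of an n-gram metric whose n is a parameter with a default value. The class name `NgramPrecision`, the default `n=2`, and the whitespace tokenization are assumptions made for this example only; they are not RAGchain's API or the behavior of any wrapped library.

```python
from collections import Counter
from dataclasses import dataclass


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


@dataclass
class NgramPrecision:
    """Hypothetical metric: clipped n-gram precision with a configurable n."""

    n: int = 2  # assumed default; a real wrapper could mirror the wrapped library

    def score(self, prediction: str, reference: str) -> float:
        pred = Counter(ngrams(prediction.lower().split(), self.n))
        ref = Counter(ngrams(reference.lower().split(), self.n))
        total = sum(pred.values())
        if total == 0:
            # The prediction is shorter than n tokens, so it has no n-grams.
            return 0.0
        # Counter intersection clips each n-gram count to the reference count.
        overlap = sum((pred & ref).values())
        return overlap / total


default_score = NgramPrecision().score("the cat sat", "the cat ran")
unigram_score = NgramPrecision(n=1).score("the cat sat", "the cat ran")
```

A user who is happy with the default never has to think about n, while someone comparing against a benchmark that reports unigram scores can pass `n=1` explicitly.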

Any more ideas?

vkehfdl1 · 2023-12-27T04:01:31Z

It would make the metrics very flexible, but I don't think it has to be implemented in RAGchain.
There are two main reasons.

1. Each metric might have a conventional setting.
We target the RAG workflow and research on it, so there must be conventional hyperparameters or setups for each metric, because ideally every benchmark should be run under the same settings. So it is better that we suggest a conventional setup for each metric.
2. Users can calculate scores with their own metrics after getting their results.
We return the whole pd.DataFrame containing the question, answer, gt answer, etc. Users can easily calculate scores with their own metrics from this DataFrame. So if someone wants to score their results with a new metric, they can do that easily. (Maybe we can write a guide for that later. I did this once with the Rare F1 metric.)
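To make the second point concrete, here is a minimal sketch of scoring results with a user-defined metric. It works on a list of row dictionaries, which is what `df.to_dict("records")` returns for a pandas DataFrame. The column names `answer` and `gt_answer` are assumptions for illustration, and the metric is a plain token-level F1, not the Rare F1 mentioned above.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a prediction and a single reference."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


def score_rows(rows, metric, answer_key="answer", gt_key="gt_answer"):
    """Average a metric over rows; each row maps column names to values."""
    scores = [metric(row[answer_key], row[gt_key]) for row in rows]
    return sum(scores) / len(scores) if scores else 0.0


# Stand-in for df.to_dict("records"); the column names are hypothetical.
rows = [
    {"question": "Capital of France?", "answer": "Paris", "gt_answer": "Paris"},
    {"question": "Color of the sky?", "answer": "light blue", "gt_answer": "blue"},
]
mean_f1 = score_rows(rows, token_f1)
```

Because the metric is just a function of two strings, swapping in a different one only means passing a different callable to `score_rows`.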

Plus, I think it would make our evaluator too complicated to use. Sometimes a framework should restrict flexibility for the sake of ease of use.

Eastsidegunn · 2023-12-31T12:16:27Z

OK.

I think adding an EM (exact match) metric is one clearly defined step
(my conclusion from looking through many benchmarks).
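For reference, an exact match metric could look roughly like the sketch below. The normalization steps (lowercasing, stripping punctuation and English articles, collapsing whitespace) follow a convention commonly seen in QA benchmarks such as SQuAD, but the exact steps here are an assumption and would need to be checked against whichever benchmark is being reproduced.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> float:
    """Return 1.0 if the normalized prediction equals any normalized reference."""
    pred = normalize(prediction)
    return float(any(pred == normalize(ref) for ref in references))


hit = exact_match("The Eiffel Tower.", ["eiffel tower"])
miss = exact_match("Paris", ["London", "Berlin"])
```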

Actually, I can't identify a setup that I could suggest as the conventional one.
The BLEU and ROUGE scores wrap the official (maybe? at least the commonly used) libraries,
and they have many variations depending on n or on perspective.
This problem clearly needs to be solved by our evaluator.

In my view,
the evaluator should expose the following functions:

add a custom normalizer and tokenizer
add a metric function to the existing metric list
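The two extension points listed above could be sketched as follows. Everything here is hypothetical: `AnswerEvaluator`, `add_metric`, and the callable signatures are invented for illustration and do not describe RAGchain's existing evaluator.

```python
from typing import Callable, Dict, List

Normalizer = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
# A metric receives the tokenized prediction and the tokenized reference.
Metric = Callable[[List[str], List[str]], float]


class AnswerEvaluator:
    """Hypothetical evaluator with a pluggable normalizer, tokenizer, and metrics."""

    def __init__(self, normalizer: Normalizer = str.lower,
                 tokenizer: Tokenizer = str.split):
        self.normalizer = normalizer
        self.tokenizer = tokenizer
        self.metrics: Dict[str, Metric] = {}

    def add_metric(self, name: str, fn: Metric) -> None:
        """Register a metric function under a name."""
        self.metrics[name] = fn

    def evaluate(self, prediction: str, reference: str) -> Dict[str, float]:
        """Normalize and tokenize both strings, then run every registered metric."""
        pred = self.tokenizer(self.normalizer(prediction))
        ref = self.tokenizer(self.normalizer(reference))
        return {name: fn(pred, ref) for name, fn in self.metrics.items()}


def token_recall(pred: List[str], ref: List[str]) -> float:
    """Fraction of reference tokens that also appear in the prediction."""
    if not ref:
        return 0.0
    return sum(tok in pred for tok in ref) / len(ref)


# Custom tokenizer that treats commas as separators.
evaluator = AnswerEvaluator(tokenizer=lambda s: s.replace(",", " ").split())
evaluator.add_metric("exact_match", lambda p, r: float(p == r))
evaluator.add_metric("token_recall", token_recall)
result = evaluator.evaluate("Paris, France", "paris")
```

With a design like this, the built-in metric list could stay small and conventional, while non-standard metrics are registered by the user at runtime.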

How about a metric_expanded-version?
Like metric.py, metric_expanded-version would be a collection of various metrics that are not officially(?) accepted.
The name is tentative.

Eastsidegunn self-assigned this Dec 26, 2023
