Scores and probability calcuations #15

namdw · 2024-07-09T16:05:41Z

Line 39 in e524519

prb[i][j] = 1 / (1 + np.exp(score[j] - score[i]))

From my understanding of the code, the score list here is the output from the blender.rank(*, return_scores=True) which should output the average relative score of the response in the index being better than other responses. Please correct me if wrong.

For example, given three responses, {y1, y2, y3}, the first element of the scores output by the blender model (s1, s2, 3) is, s1 = P(y1 > y2) + P(y1 > y3), disregarding the constant coefficient and P is general preference score function, not probability. [references from blender code and their paper]

Thus, subtracting two scores, i.e., s1 - s2, is also dependent on the third response y3 as well, which seems a bit different from what is described in the paper.

In summary, I feel it is more appropriate to use the score output from the blender with just two responses (although, I don't think this would make a significant difference in the performance), e.g.,

score = blender.rank([x], [[yj, yi]], return_scores=True)[0, 0]
prb[i][j] = 1 / (1 + np.exp(score))

(sorry for the badly coded example)

The text was updated successfully, but these errors were encountered:

kaykyr · 2024-07-09T16:39:02Z

I don't understand the calcs yet, I don't know why, during my tests, the calculated probabilities are all coming as 0.5. Is this right?

kaykyr · 2024-07-10T14:23:27Z

I don't understand the calcs yet, I don't know why, during my tests, the calculated probabilities are all coming as 0.5. Is this right?

Sorry. It was my bad.

xukp20 · 2024-08-24T06:23:49Z

I have the same problem as @namdw.

It seems that LLM Blender uses its pair-wise RM to create the relative score table for all pairs (i, j) based on the logits of choosing "i" and "j" as the output. Since SPPO's code sets the "return_scores" param as True, LLM Blender will continue to do the Agg using "max_logits" strategy by default, which gives a single score for each candidate indicating the average relative score of that candidate being better than other responses.

So the returned score list of "rank" is related to the prompt, the chosen candidate, and all the candidates in the list, instead of the prompt and a pair of candidates which is presented as $s(y, y'; x)$ in the paper.

I think the code from @namdw goes the right way, but to be aligned with the paper, should we also put $e^{score(i,j)}$ instead of $1$ on the top when calculating $prb(i, j)$?

MeckyWu · 2024-09-18T20:04:30Z

Hi @namdw @xukp20

I'm sorry for not getting back to you sooner. You are right. The correct implementation should be to calculate the win-rate matrix first and then take the mean over the row. The current implementation is taking the mean over the score matrix (the logistic of it is the win rate) and then feeding that into a BT model.

We will fix this and re-run experiments asap. The performance difference is expected to be negligible as the numerical difference in the win rate between these two methods is expected to be rather small.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Scores and probability calcuations #15

Scores and probability calcuations #15

namdw commented Jul 9, 2024

kaykyr commented Jul 9, 2024

kaykyr commented Jul 10, 2024

xukp20 commented Aug 24, 2024 •

edited

Loading

MeckyWu commented Sep 18, 2024

Scores and probability calcuations #15

Scores and probability calcuations #15

Comments

namdw commented Jul 9, 2024

kaykyr commented Jul 9, 2024

kaykyr commented Jul 10, 2024

xukp20 commented Aug 24, 2024 • edited Loading

MeckyWu commented Sep 18, 2024

xukp20 commented Aug 24, 2024 •

edited

Loading