Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding temperature scaling on Joiner logits: #789

Merged
merged 2 commits into from
Apr 26, 2024

Conversation

KarelVesely84
Copy link
Contributor

@KarelVesely84 KarelVesely84 commented Apr 18, 2024

  • T is configurable

  • so far best result NCE 0.122 (still not so high)

    • the BPE scores were rescaled with 0.2 (but then also incorrect words get high confidence, visually reasonable histograms are for 0.5 scale)
    • BPE->WORD score merging done by min(.) function (tried also prob-product, and also arithmetic, geometric, harmonic mean)
  • without temperature scaling (i.e. scale 1.0), the best NCE was 0.032 (here product merging was the best)

Results seem consistent with: https://arxiv.org/abs/2110.15222

Everything tuned on a very small set of 100 sentences with 813 words and 10.2% WER, a Czech model.

I also experimented with blank posteriors mixed into the BPE confidences, but no NCE improvement found, so not pushing that.

Temperature scling added also to the Greedy search confidences.

- T hard-coded to 2.0
- so far best result NCE 0.122 (still not so high)
    - the BPE scores were rescaled with 0.2 (but then also incorrect words
      get high confidence, visually reasonable histograms are for 0.5 scale)
    - BPE->WORD score merging done by min(.) function
      (tried also prob-product, and also arithmetic, geometric, harmonic mean)

- without temperature scaling (i.e. scale 1.0), the best NCE was 0.032 (here product merging was best)

Results seem consistent with: https://arxiv.org/abs/2110.15222

Everything tuned on a very-small set of 100 sentences with 813 words and 10.2% WER, a Czech model.

I also experimented with blank posteriors mixed into the BPE confidences,
but no NCE improvement found, so not pushing that.

Temperature scling added also to the Greedy search confidences.

// copy raw logits, apply temperature-scaling (for confidences)
int32_t p_logit_items = vocab_size * num_hyps;
std::vector<float> logit_with_temperature(p_logit_items);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change p_logit in-place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it cannot be done in-place

the idea is to apply temperature only for computation of confidences,
the decoding continues to use the original values

this is why the logit values are copied to a new buffer

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation.

@csukuangfj
Copy link
Collaborator

Thanks!

T hard-coded to 2.0 (is it okay ?)

Could you make it configurable and give it a default value 1.0 (like what we are doing for blank penalty)?
If it is 1.0, then the loop is skipped.

@KarelVesely84
Copy link
Contributor Author

Thanks!

T hard-coded to 2.0 (is it okay ?)

Could you make it configurable and give it a default value 1.0 (like what we are doing for blank penalty)? If it is 1.0, then the loop is skipped.

okay, working on it

i am not sure about the default 1.0, for 1.0 the confidences have worse quality than for 2.0,
and, as it is implemented, setting it to 2.0 will not affect the decoding at all, just the confidence
values will be different, to proposing to set it to 2.0
(same value appears in the Google article for NN-posterios confidences)

@csukuangfj
Copy link
Collaborator

1.0 is for backward compatibility.

as it is implemented, setting it to 2.0 will not affect the decoding at all,

In that case, 2.0 is fine with me.

@KarelVesely84
Copy link
Contributor Author

okay, the T parameter is now configurable

@KarelVesely84
Copy link
Contributor Author

KarelVesely84 commented Apr 22, 2024

there seems to be some problem with the workload tests, many fail with the 503 error pointing to huggingface URL

otherwise, it should be ready for a code reivew (tested with a local client, and it works as expected)

@KarelVesely84 KarelVesely84 force-pushed the confidence_temperature_scaling branch from ce6e5b5 to 8b57f73 Compare April 23, 2024 08:03
@KarelVesely84
Copy link
Contributor Author

i fixed an error in the android build

@KarelVesely84
Copy link
Contributor Author

the tests look OK, seeing only unrelated errors

@csukuangfj
Copy link
Collaborator

Thank you for your contribution!

@csukuangfj csukuangfj merged commit 2e45d32 into k2-fsa:master Apr 26, 2024
186 of 224 checks passed
XiaYucca pushed a commit to XiaYucca/sherpa-onnx that referenced this pull request Jan 9, 2025
* Adding temperature scaling on Joiner logits:

- T hard-coded to 2.0
- so far best result NCE 0.122 (still not so high)
    - the BPE scores were rescaled with 0.2 (but then also incorrect words
      get high confidence, visually reasonable histograms are for 0.5 scale)
    - BPE->WORD score merging done by min(.) function
      (tried also prob-product, and also arithmetic, geometric, harmonic mean)

- without temperature scaling (i.e. scale 1.0), the best NCE was 0.032 (here product merging was best)

Results seem consistent with: https://arxiv.org/abs/2110.15222

Everything tuned on a very-small set of 100 sentences with 813 words and 10.2% WER, a Czech model.

I also experimented with blank posteriors mixed into the BPE confidences,
but no NCE improvement found, so not pushing that.

Temperature scling added also to the Greedy search confidences.

* making `temperature_scale` configurable from outside
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants