Fix parameters being updated to nan during reward model training. #69

llauraa23 · 2023-09-25T04:01:05Z

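The language model is loaded in torch.float16. The Adam optimizer adds epsilon to the denominator of its update to avoid division by zero. Note that torch.float16 cannot represent positive values below about 6e-8: a smaller epsilon underflows to 0 and no longer protects the denominator, which is how parameters end up as nan. Do not set epsilon below 6e-8.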
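
A rough illustration of the underflow, not the code from this PR. It emulates float16 rounding with the standard-library `struct` module (format `"e"`) instead of torch so that it runs anywhere. The helpers `to_float16` and `adam_denominator_fp16` are hypothetical names, and the denominator is a simplified `sqrt(v_hat) + eps` that assumes every intermediate value is held in float16.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Smallest positive float16 value (a subnormal), roughly 5.96e-8.
SMALLEST_FP16 = 2.0 ** -24


def adam_denominator_fp16(v_hat: float, eps: float) -> float:
    """Simplified Adam denominator sqrt(v_hat) + eps, rounded to float16 at each step."""
    root = to_float16(to_float16(v_hat) ** 0.5)
    return to_float16(root + to_float16(eps))


# An epsilon of 1e-8 is lost entirely once it is stored as float16.
print(to_float16(1e-8))  # 0.0

# An epsilon of 1e-7 survives as a small non-zero float16 value.
print(to_float16(1e-7) > 0.0)  # True

# With a zero second-moment estimate, the denominator is only as large as eps.
print(adam_denominator_fp16(0.0, 1e-8))        # 0.0, so the update would divide by zero
print(adam_denominator_fp16(0.0, 1e-7) > 0.0)  # True
```

In a torch training script the same idea would mean passing an `eps` that float16 can represent when constructing the optimizer, or keeping the optimizer state in float32; which of the two applies here depends on how rw_finetuning.py sets things up.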

…ng. rw_finetuning.py Language model is loaded in torch.float16. Adam optimizer adds epsilon to avoid zero denominator. Note, torch.float16 will round any number smaller than 6e-8 to 0. Do not change epsilon to smaller than 6e-8.

CambioML · 2023-09-25T04:01:22Z

LGTM! 👍

CambioML merged commit 9b64d82 into CambioML:main Sep 25, 2023
1 check passed
