Thanks for reaching out. Our team has a fix for this and it will be available in the upcoming releases. We will update this issue once the fix is released.
In the `ParallelEmbedding` layer, when sharding across the vocabulary, the output is masked at the very end of the operation. It seems that the masking is done by multiplying by a hard-coded `float` mask, which causes the actual `float16`/`bfloat16` output to be upcast to `float32`. A correct implementation would multiply by a mask of the same dtype as the intended output.
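For illustration, here is a minimal PyTorch sketch (not the actual `ParallelEmbedding` code; the tensor names and shapes are made up) showing how multiplying a `bfloat16` tensor by a `float32` mask promotes the result to `float32`, and how casting the mask to the output's dtype keeps the original precision:

```python
import torch

# Hypothetical stand-ins for the sharded embedding output and the mask of
# tokens that fall outside this shard's vocabulary range.
output = torch.randn(4, 8, dtype=torch.bfloat16)
input_mask = torch.tensor([False, True, False, False])

# Problematic pattern: a hard-coded float mask defaults to float32,
# so PyTorch's type promotion upcasts the product to float32.
upcast = output * (~input_mask).unsqueeze(-1).float()
print(upcast.dtype)  # torch.float32

# Fix: build the mask in the same dtype as the output.
masked = output * (~input_mask).unsqueeze(-1).to(output.dtype)
print(masked.dtype)  # torch.bfloat16
```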