Generalize float precision conversion #1261

Merged: 6 commits, Nov 22, 2023

emricksinisonos (Collaborator) commented Nov 14, 2023

Description

Conversion from f32 to f16 was already supported. This PR adds support for f16 to f32 conversion and makes the converter implementation more generic.

It also breaks the high-level tract API by replacing the half() API with:

  • f32_to_f16(): equivalent to half(), f32 -> f16 conversion
  • f16_to_f32(): f16 -> f32 conversion
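For readers unfamiliar with what the two directions involve at the scalar level, here is a minimal, standard-library-only sketch of IEEE 754 binary16 <-> binary32 conversion on raw bit patterns. It is not tract's implementation and does not use tract's API: the helper names `f32_to_f16_bits` and `f16_bits_to_f32` are hypothetical, chosen for this illustration only, and the narrowing direction assumes round-to-nearest-even.

```rust
/// Hypothetical helper (not part of tract): narrow an f32 to f16 bits,
/// rounding to nearest, ties to even.
fn f32_to_f16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    let exp = ((bits >> 23) & 0xff) as i32;
    let mant = bits & 0x007f_ffff;

    if exp == 0xff {
        // Infinity stays infinity; any NaN becomes a quiet NaN.
        return if mant == 0 { sign | 0x7c00 } else { sign | 0x7e00 };
    }
    let e = exp - 127 + 15; // re-bias the exponent for f16
    if e >= 0x1f {
        return sign | 0x7c00; // too large for f16: overflow to infinity
    }
    if e <= 0 {
        if e < -10 {
            return sign; // below half the smallest subnormal: signed zero
        }
        // Result is an f16 subnormal: shift the 24-bit significand down.
        let m = mant | 0x0080_0000; // restore the implicit leading 1
        let shift = (14 - e) as u32; // in 14..=24
        let half = m >> shift;
        let rem = m & ((1 << shift) - 1);
        let halfway = 1u32 << (shift - 1);
        let round_up = rem > halfway || (rem == halfway && (half & 1) == 1);
        return sign | (half + round_up as u32) as u16;
    }
    // Normal f16: keep the top 10 mantissa bits, round on the 13 dropped bits.
    // A carry out of the mantissa correctly bumps the exponent.
    let half = ((e as u32) << 10) | (mant >> 13);
    let rem = mant & 0x1fff;
    let round_up = rem > 0x1000 || (rem == 0x1000 && (half & 1) == 1);
    sign | (half + round_up as u32) as u16
}

/// Hypothetical helper (not part of tract): widen f16 bits to f32.
/// This direction is exact, since every f16 value is representable in f32.
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h as u32) & 0x8000) << 16;
    let exp = ((h >> 10) & 0x1f) as u32;
    let mant = (h & 0x03ff) as u32;
    let bits = match (exp, mant) {
        (0, 0) => sign, // signed zero
        (0, _) => {
            // Subnormal f16: normalize into an f32 normal number.
            let mut e: u32 = 113; // 127 - 15 + 1
            let mut m = mant;
            while m & 0x0400 == 0 {
                m <<= 1;
                e -= 1;
            }
            sign | (e << 23) | ((m & 0x03ff) << 13)
        }
        (0x1f, 0) => sign | 0x7f80_0000,                // infinity
        (0x1f, _) => sign | 0x7f80_0000 | (mant << 13), // NaN, payload kept
        _ => sign | ((exp + 112) << 23) | (mant << 13), // normal number
    };
    f32::from_bits(bits)
}
```

With these helpers, `f32_to_f16_bits(0.1)` yields `0x2e66`, and widening that back gives `0.0999755859375` rather than `0.1`: the f32 -> f16 direction is lossy, while f16 -> f32 is exact.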

emricksinisonos force-pushed the task/generalize-float-precision-conversion branch from f2f68e6 to f3a434d on November 14, 2023 10:08
emricksinisonos marked this pull request as ready for review on November 14, 2023 11:25
emricksinisonos requested a review from kali on November 21, 2023 16:59
kali (Collaborator) commented Nov 22, 2023

We should keep --half-floats as an alias() for f32-to-f16

kali merged commit 43b1463 into main on Nov 22, 2023
43 checks passed