Support fp8_e4m3/fp8_e5m2
Narsil committed Nov 17, 2023
1 parent 96061e9 commit c70ba7e
Showing 2 changed files with 13 additions and 3 deletions.
8 changes: 5 additions & 3 deletions README.md
@@ -99,7 +99,9 @@ Notes:
from traditional tensor libraries perspective (torch, tensorflow, numpy, ..).
- 0-rank Tensors (tensors with shape `[]`) are allowed, they are merely a scalar.
- The byte buffer needs to be entirely indexed, and cannot contain holes. This prevents
the creation of polyglot files.
- Endianness: Little-endian.
- Order: 'C' or row-major.

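The no-holes constraint above can be checked mechanically. As a rough illustration (a hypothetical helper, not code from the safetensors crate), assume each tensor contributes a half-open `(start, end)` byte range into the data buffer:

```rust
/// Returns true if the byte ranges exactly tile `buf_len` bytes:
/// no holes, no overlaps, nothing past the end of the buffer.
fn covers_entirely(mut ranges: Vec<(usize, usize)>, buf_len: usize) -> bool {
    ranges.sort();
    let mut cursor = 0;
    for (start, end) in ranges {
        // Any gap (start > cursor), overlap (start < cursor), or
        // inverted range invalidates the buffer.
        if start != cursor || end < start {
            return false;
        }
        cursor = end;
    }
    cursor == buf_len
}

fn main() {
    assert!(covers_entirely(vec![(4, 10), (0, 4)], 10)); // exact tiling
    assert!(!covers_entirely(vec![(0, 4), (5, 10)], 10)); // hole at byte 4
    assert!(!covers_entirely(vec![(0, 4), (3, 10)], 10)); // overlap
    println!("ok");
}
```

Rejecting any layout that fails this check is what rules out polyglot files: no byte in the buffer can hide content that is not claimed by some tensor.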

### Yet another format?
@@ -113,7 +115,7 @@ formats.
Let's take a look at alternatives and why this format is deemed interesting.
This is my very personal and probably biased view:

| Format | Safe | Zero-copy | Lazy loading | No file size limit | Layout control | Flexibility | Bfloat16
| Format | Safe | Zero-copy | Lazy loading | No file size limit | Layout control | Flexibility | Bfloat16/Fp8
| ----------------------- | --- | --- | --- | --- | --- | --- | --- |
| pickle (PyTorch) |||| 🗸 || 🗸 | 🗸 |
| H5 (Tensorflow) | 🗸 || 🗸 | 🗸 | ~ | ~ ||
@@ -133,7 +135,7 @@ some tensors in it without scanning the whole file (distributed setting) ?
- Layout control: Lazy loading is not necessarily enough: if the information about tensors is spread out across your file, then even though it is lazily accessible you might have to read most of the file to reach the available tensors (incurring many DISK -> RAM copies). Controlling the layout to keep fast access to single tensors is important.
- No file size limit: Is there a limit to the file size?
- Flexibility: Can I save custom code in the format and be able to use it later with zero extra code ? (~ means we can store more than pure tensors, but no custom code)
- Bfloat16: Does the format support native bfloat16 (meaning no weird workarounds are
- Bfloat16/Fp8: Does the format support native bfloat16/fp8 (meaning no weird workarounds are
necessary)? This is becoming increasingly important in the ML world.


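For readers unfamiliar with the two fp8 variants the table now tracks: F8_E4M3 packs 1 sign, 4 exponent (bias 7), and 3 mantissa bits; F8_E5M2 packs 1 sign, 5 exponent (bias 15), and 2 mantissa bits. A hedged Rust sketch of decoding them to `f32`, following the Micikevicius et al. paper linked from the Rust doc comments (illustrative only, not part of this commit):

```rust
/// Decode an F8_E5M2 byte (1 sign / 5 exponent / 2 mantissa, bias 15).
fn f8_e5m2_to_f32(b: u8) -> f32 {
    let sign = if b & 0x80 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((b >> 2) & 0x1f) as i32;
    let man = (b & 0x03) as f32;
    match exp {
        0 => sign * (man / 4.0) * 2f32.powi(-14), // subnormal
        0x1f if man == 0.0 => sign * f32::INFINITY,
        0x1f => f32::NAN,
        _ => sign * (1.0 + man / 4.0) * 2f32.powi(exp - 15),
    }
}

/// Decode an F8_E4M3 byte (1 sign / 4 exponent / 3 mantissa, bias 7).
/// E4M3 trades infinities away for range: only S.1111.111 is NaN.
fn f8_e4m3_to_f32(b: u8) -> f32 {
    let sign = if b & 0x80 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((b >> 3) & 0x0f) as i32;
    let man = (b & 0x07) as f32;
    match (exp, man as u8) {
        (0, _) => sign * (man / 8.0) * 2f32.powi(-6), // subnormal
        (0x0f, 7) => f32::NAN,
        _ => sign * (1.0 + man / 8.0) * 2f32.powi(exp - 7),
    }
}

fn main() {
    assert_eq!(f8_e5m2_to_f32(0x3c), 1.0); // 0b0_01111_00
    assert_eq!(f8_e4m3_to_f32(0x38), 1.0); // 0b0_0111_000
    assert!(f8_e4m3_to_f32(0x7f).is_nan());
    println!("ok");
}
```

Since safetensors only declares the dtype and stores raw bytes, decoding like this is the consumer's job; the format itself just needs the 1-byte-per-element bookkeeping the Rust change below provides.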
8 changes: 8 additions & 0 deletions safetensors/src/tensor.rs
@@ -641,6 +641,12 @@ pub enum Dtype {
U8,
/// Signed byte
I8,
/// FP8 (E5M2) <https://arxiv.org/pdf/2209.05433.pdf>
#[allow(non_camel_case_types)]
F8_E5M2,
/// FP8 (E4M3) <https://arxiv.org/pdf/2209.05433.pdf>
#[allow(non_camel_case_types)]
F8_E4M3,
/// Signed integer (16-bit)
I16,
/// Unsigned integer (16-bit)
@@ -670,6 +676,8 @@ impl Dtype {
Dtype::BOOL => 1,
Dtype::U8 => 1,
Dtype::I8 => 1,
Dtype::F8_E5M2 => 1,
Dtype::F8_E4M3 => 1,
Dtype::I16 => 2,
Dtype::U16 => 2,
Dtype::I32 => 4,

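The new `size()` arms mean an fp8 tensor costs exactly one byte per element. A small standalone sketch of how that feeds into buffer sizing (re-declaring a toy subset of the enum rather than depending on the safetensors crate):

```rust
// Toy subset mirroring the Dtype::size() mapping from the diff above;
// F8_E5M2/F8_E4M3 are the new one-byte-per-element variants.
#[allow(non_camel_case_types, dead_code)]
#[derive(Clone, Copy, Debug)]
enum Dtype {
    F8_E5M2,
    F8_E4M3,
    I16,
    F32,
}

impl Dtype {
    fn size(&self) -> usize {
        match self {
            Dtype::F8_E5M2 | Dtype::F8_E4M3 => 1,
            Dtype::I16 => 2,
            Dtype::F32 => 4,
        }
    }
}

/// Number of bytes a tensor of `shape` and `dtype` occupies in the buffer.
fn nbytes(shape: &[usize], dtype: Dtype) -> usize {
    shape.iter().product::<usize>() * dtype.size()
}

fn main() {
    // A [2, 3] fp8 tensor needs 6 bytes; the same shape in f32 needs 24.
    assert_eq!(nbytes(&[2, 3], Dtype::F8_E4M3), 6);
    assert_eq!(nbytes(&[2, 3], Dtype::F32), 24);
    println!("ok");
}
```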