compile support, PNGinfo, fixes, fixes, quality improvements.
matatonic committed Sep 9, 2024
1 parent 3e77c5a commit e564df5
Showing 10 changed files with 192 additions and 375 deletions.
Binary file added 1725903030-dall-e-3-1536x1536-hd-1.png
75 changes: 47 additions & 28 deletions README.md
@@ -19,10 +19,12 @@ An OpenAI API compatible image generation server for the FLUX.1 family of models
- **Standalone Image Generation**: Uses your Nvidia GPU for image generation, doesn't use ComfyUI, SwarmUI or any other backend
- **Lora Support**: Support for multiple loras with individual scaling weights (strength)
- **Torch Compile Support**: Faster image generations with `torch.compile` (up to 20% faster in my tests, maybe more or less for other setups).
- [ ] **Easy to setup and use**: Maybe?
- [ ] **Upscaler Support** (planned)
- **PNG Metadata**: Save images with generation parameters.
- [ ] **BNB NF4 Quantization** (planned)
- [ ] **Fast Quant Loading** (planned)
- [ ] **Upscaler Support** (planned)
- [ ] **GGUF Loading** (planned)
- [ ] **Easy to setup and use**: Maybe?


## Quickstart
@@ -107,6 +109,8 @@ You can use the OpenAI python client to interact with the API. A sample applicat
```shell
pip install -U openai
python generate.py -m dall-e-3 -s "1024x256" -f new_logo.png "A banner style logo for the website of the OpenedAI Images Flux, an OpenAI API Image generator server which uses the Black Forest Labs FLUX.1 model."
# Or simply:
python generate.py "An astronaut in the jungle"
```

See the OpenAI Images Guide and API Documentation for more ways to use the API.
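
For a quick check without `generate.py`, here is a minimal sketch using the client directly. It assumes the server is running locally on the default port 5005, as in the sample client; `sk-ip` is a dummy key.

```python
# Minimal sketch: request one image and save it.
# Assumes the default local server address used by generate.py.
import base64
import openai

client = openai.Client(base_url='http://localhost:5005/v1', api_key='sk-ip')
response = client.images.generate(
    model='dall-e-3',
    prompt='An astronaut in the jungle',
    size='1024x1024',
    response_format='b64_json',
)
with open('astronaut.png', 'wb') as f:
    f.write(base64.b64decode(response.data[0].b64_json))
```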
@@ -119,54 +123,66 @@ There is a more detailed configuration guide in the [CONFIG.md](CONFIG.md).

## Pre-Configured models

> FP8 is the only available quantization so far, but more will come soon!
> There are other options available for low GPU VRAM support
Additional models are also available by default; there are options for all types of GPU setups.

* Only one model can be loaded at a time, and models are loaded on demand.

By default, the following models are configured (require ~40GB VRAM, bfloat16, <1s/step):
By default, the following models are configured (require ~40GB VRAM, bfloat16):

- `schnell`: `flux.1-schnell.json` FLUX.1 Schnell (official) (4 steps, ~3s)
- `dev`: FLUX.1 Dev (without enhancement) (25/50 steps, ~15-30s)
- `merged`: `sayakpaul-flux.1-merged.json` Dev+Schnell merged, 12 steps
- `schnell`: `flux.1-schnell.json` FLUX.1 Schnell (official) (4 steps)
- `dev`: FLUX.1 Dev (without enhancement) (25/50 steps)
- `merged`: `sayakpaul-flux.1-merged.json` Dev+Schnell merged (12 steps)
- `dall-e-2` is set to use `schnell`
- `dall-e-3` is set to use `dev`, with prompt enhancement if an OpenAI chat API is available.

Additional FP8 quantized models (require 24GB VRAM and can be slow to load, `+enable_vae_slicing`, `+enable_vae_tiling`, ~3+s/step):
Additional FP8 quantized models (require 24GB VRAM and can be slow to load, `+enable_vae_slicing`, `+enable_vae_tiling`):

- `schnell-fp8`: `kijai-flux.1-schnell-fp8.json` Schnell with FP8 quantization, 4 steps (10-15s)
- `dev-fp8`: `kijai-flux.1-dev-fp8.json` Dev with FP8 quantization, 25/50 steps
- `merged-fp8`: `drbaph-flux.1-merged-fp8.json` Dev+Schnell merged, FP8 quantization, 12 steps by default
- `merged-fp8-4step`: `drbaph-flux.1-merged-fp8-4step.json` Dev+Schnell merged, FP8 quantization, 4 steps
- `schnell-fp8`: `kijai-flux.1-schnell-fp8.json` Schnell with FP8 quantization (4 steps)
- `merged-fp8-4step`: `drbaph-flux.1-merged-fp8-4step.json` Dev+Schnell merged, FP8 quantization (4 steps)
- `merged-fp8`: `drbaph-flux.1-merged-fp8.json` Dev+Schnell merged, FP8 quantization (12 steps)
- `dev-fp8`: `kijai-flux.1-dev-fp8.json` Dev with FP8 quantization (25/50 steps)

Additional FP8 models (require 16GB VRAM and can be slow to load, `+enable_model_cpu_offload`, ~5+s/step):
Additional FP8 models (require 16GB VRAM and can be slow to load, `+enable_model_cpu_offload`):

- `schnell-fp8-16GB`: `kijai-flux.1-schnell-fp8-16GB.json` Schnell, 4 steps (~15-30s)
- `dev-fp8-16GB`: `kijai-flux.1-dev-fp8-16GB.json` Dev with FP8 quantization, 25/50 steps (slightly better)
- `merged-fp8-4step-16GB`: `drbaph-flux.1-merged-fp8-4step-16GB.json` Dev+Schnell merged, 4 steps
- `merged-fp8-16GB`: `drbaph-flux.1-merged-fp8-16GB.json` Dev+Schnell merged, 12 steps by default
- `schnell-fp8-16GB`: `kijai-flux.1-schnell-fp8-16GB.json` Schnell (4 steps)
- `dev-fp8-16GB`: `kijai-flux.1-dev-fp8-16GB.json` Dev with FP8 quantization (25/50 steps)
- `merged-fp8-4step-16GB`: `drbaph-flux.1-merged-fp8-4step-16GB.json` Dev+Schnell merged (4 steps)
- `merged-fp8-16GB`: `drbaph-flux.1-merged-fp8-16GB.json` Dev+Schnell merged (12 steps)

Additional NF4 models (require 12GB VRAM):

- sayakpaul-dev-nf4-12GB: soon ...
- sayakpaul-dev-nf4-compile-12GB: soon ...

Low VRAM options (<4GB VRAM, 34GB RAM, `+enable_sequential_cpu_offload`, float16 instead of bfloat16, 8-15+s/step):
Low VRAM options (<4GB VRAM, 34GB RAM, `+enable_sequential_cpu_offload`, float16 instead of bfloat16):

- `schnell-low`: `flux.1-schnell-low.json` Schnell FP16, (30-60s per image)
- `dev-low`: `flux.1-dev-low.json` Dev FP16, at least a few minutes per image
- `merged-low`: `sayakpaul-flux.1-merged-low.json` Dev+Schnell FP16 merged, 12 steps by default
- `schnell-low`: `flux.1-schnell-low.json` Schnell FP16 (4 steps)
- `merged-low`: `sayakpaul-flux.1-merged-low.json` Dev+Schnell FP16 merged (12 steps)
- `dev-low`: `flux.1-dev-low.json` Dev FP16 (25/50 steps)

And more, check out the `config/lib` folder for more examples, including lora options.
There are `-compile` variants of many models as well. Be advised that the first couple of images from a compiled model will be very slow to generate: the server must load, and perhaps quantize and compile, the model, and generation is then dynamically optimized over the next few runs, so the first image may take 10 minutes or more to prepare. Most models can generate dozens of images in that time, so only use compiled models if you know what you're doing.
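
For example, a compiled model is requested like any other, only the name differs: `python generate.py -m merged-compile "An astronaut in the jungle"` (a hypothetical invocation; the speedup only shows after the warm-up generations described above).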

And there is more, including `int8` quants: check out the `config/lib` folder for further examples, lora options among them.

> Timings are casually measured at 1024x1024 standard on an Nvidia A100 and may vary wildly from your system.
> \*) The name of the generator file is used to determine whether a model is already loaded. If you edit a generator config in a way which requires reloading the model (such as changing `pipeline` or `options`), it won't reload automatically. `config.json` and `generation_kwargs` are always loaded on each API call.
> Requesting an image generation with a special model called `unload` will unload the current model, freeing up its RAM and VRAM resources.
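
As a sketch, unloading looks like any other generation request (same client setup as the earlier example; the prompt value is presumably ignored, and no usable image comes back):

```python
# Hedged sketch: the special model name `unload` frees the RAM and
# VRAM of the currently loaded model; the response is not a real image.
import openai

client = openai.Client(base_url='http://localhost:5005/v1', api_key='sk-ip')
client.images.generate(model='unload', prompt='unload')
```
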
## Performance

Performance plots for A100 (80GB) and 4090 (24GB), batch size = 1. Click Details to expand.
<details>

![alt text](processing_time_A100.png)

*) `dall-e-3` in this plot is `FLUX.1 Dev, enhanced`, not OpenAI `dall-e-3`

![alt text](processing_time_4090.png)

</details>


## Server Usage

@@ -191,7 +207,7 @@ options:

#### "The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens:"

* Long prompt encoding into CLIP is not yet supported (not working), all is not lost however, there are 2 encoders and the T5 encoder (text_encoder_2) supports up to 240 (?) tokens. No fix yet.
* Long prompt encoding into CLIP is not yet supported (not working). All is not lost, however: there are two encoders, and the T5 encoder (`text_encoder_2`) supports up to 256 tokens. No fix yet.

#### "that cleft chin woman", "everyone is too beautiful"

@@ -217,4 +233,7 @@ Additional Model formats and merges created by:

- OpenedAI Images Flux is released under the [GNU Affero General Public License v3.0](https://choosealicense.com/licenses/agpl-3.0/)
- [FLUX.1 \[dev\] Non-Commercial License.](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
- [FLUX.1 \[schnell\] is released under the Apache-2.0 license; the model can be used for personal, scientific, and commercial purposes.](https://choosealicense.com/licenses/apache-2.0/)


![alt text](1725903030-dall-e-3-1536x1536-hd-1.png)
23 changes: 12 additions & 11 deletions full_tests.sh
@@ -1,31 +1,32 @@
#!/bin/sh

prompt="An astronaut in the jungle, blue-orange colour scheme"
CSV=perf.csv
G=$1 # A100 or other
folder="$1"
G="$2" # A100 or other
prompt="cool street art of an astronaut in a jungle, stencil and spray paint style, the flag says 'FLUX.1', on a door in a graffiti covered alley scene with used spray paint cans littering the ground, a sign by the door says 'Open' in faded, yellowed, black and white plastic, the secret door to a great club, the other writing says 'matatonic', 'flux.1 2024', 'black forest labs', 'openedai' and 'uNStaBLe' in unique styles"
CSV="$folder/perf.csv"

T40="schnell merged dev"
T40="dall-e-3 schnell merged dev"
T40C="schnell-compile merged-compile dev-compile"
T24="schnell-fp8 merged-fp8-4step merged-fp8 dev-fp8 dev-fp8-e5m2"
T24C="merged-fp8-4step-compile dev-fp8-e5m2-compile"
T16="schnell-fp8-16GB merged-fp8-4step-16GB merged-fp8-16GB dev-fp8-16GB dev-fp8-e5m2-16GB"
T24="schnell-fp8 merged-fp8-4step merged-fp8 dev-fp8"
T24C="schnell-fp8-compile merged-fp8-4step-compile merged-fp8-compile dev-fp8-compile"
T16="schnell-fp8-16GB merged-fp8-4step-16GB merged-fp8-16GB dev-fp8-16GB"
T12="" #"sayakpaul-dev-nf4-12GB"
T4="schnell-low merged-low dev-low"

if [ "$G" = "A100" ]; then
for i in $T40 ; do
./test_images.py -p -t test/$i/$G "$prompt" -n 1 -m $i --csv $CSV -T $G
./test_images.py -p -t $folder/$i/$G "$prompt" -n 1 -m "$i" --csv "$CSV" -T $G
done

for i in $T40C ; do
./test_images.py -p -t test/$i/$G "$prompt" -n 3 -m $i --csv $CSV -T $G
./test_images.py -p -t "$folder/$i/$G" "$prompt" -n 3 -m "$i" --csv "$CSV" -T $G
done
fi

for i in $T24 $T16 $T12 $T4 ; do
./test_images.py -p -t test/$i/$G "$prompt" -n 1 -m $i --csv $CSV -T $G
./test_images.py -p -t "$folder/$i/$G" "$prompt" -n 1 -m "$i" --csv "$CSV" -T $G
done

for i in $T24C ; do
./test_images.py -p -t test/$i/$G "$prompt" -n 3 -m $i --csv $CSV -T $G
./test_images.py -p -t "$folder/$i/$G" "$prompt" -n 3 -m "$i" --csv "$CSV" -T $G
done
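
As the updated invocations show, the script now takes an output folder and a GPU tag as positional arguments, e.g. `./full_tests.sh test A100` (a hypothetical invocation; the 40GB `T40`/`T40C` sets only run when the tag is `A100`).
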
86 changes: 46 additions & 40 deletions generate.py
@@ -8,6 +8,7 @@
import openai
from PIL import Image


def parse_args(argv):
# prompt, model, size, filename
parser = argparse.ArgumentParser(description='Generate an image from a prompt using OpenAI\'s DALL-E API.')
@@ -22,54 +23,59 @@ def parse_args(argv):
parser.add_argument('-E', '--no-enhancement', action='store_true', help='Do not enhance the prompt.')
parser.add_argument('-S', '--no-show', action='store_true', help='Do not display the image.')
parser.add_argument('-V', '--no-save', action='store_true', help='Do not save the image, view only')
parser.add_argument('-B', '--bulk', action='store_true', help='Process prompts from file, one per line.')

return parser.parse_args(argv)

if __name__ == '__main__':
args = parse_args(sys.argv[1:])

if args.no_enhancement:
args.prompt = "I NEED to test how the tool works with extremely simple prompts. DO NOT add any detail, just use it AS-IS:" + args.prompt

client = openai.Client(base_url='http://localhost:5005/v1', api_key='sk-ip')

def generation_round():

response = client.images.generate(
prompt=args.prompt,
response_format='b64_json',
model=args.model,
size=args.size,
n=int(args.batch),
quality=args.quality,
)
if args.bulk:
all_prompts = [ line.strip() for line in open(args.prompt, 'r') if len(line.strip()) > 0 and line.strip()[0] != '#']
else:
all_prompts = [ args.prompt ]

for prompt in all_prompts:
if args.no_enhancement:
prompt = "I NEED to test how the tool works with extremely simple prompts. DO NOT add any detail, just use it AS-IS:" + prompt

for n, img in enumerate(response.data):
image = Image.open(io.BytesIO(base64.b64decode(img.b64_json)))
def generation_round():
response = client.images.generate(
prompt=prompt,
response_format='b64_json',
model=args.model,
size=args.size,
n=int(args.batch),
quality=args.quality,
)

if not args.no_save:
if args.filename:
filename = args.filename
if int(args.batch) > 1:
filename = f"{filename.split('.png')[0]}-{n}.png"
else:
f_args = dict(
short_prompt=args.prompt[:20],
prompt=args.prompt,
n=n,
model=args.model,
size=args.size,
quality=args.quality,
created=response.created,
)

filename = args.auto_name_format.format(**f_args).replace('/','_')

image.save(filename, format="PNG")
print(f'Saved: {filename}')

if not args.no_show:
image.show()
for n, img in enumerate(response.data):
if not args.no_save:
if args.filename:
filename = args.filename
if int(args.batch) > 1:
filename = f"{filename.split('.png')[0]}-{n}.png"
else:
f_args = dict(
short_prompt=args.prompt[:20],
prompt=args.prompt,
n=n,
model=args.model,
size=args.size,
quality=args.quality,
created=response.created,
)

filename = args.auto_name_format.format(**f_args).replace('/','_')

with open(filename, 'wb') as f:
f.write(base64.b64decode(img.b64_json))
print(f'Saved: {filename}')

if not args.no_show:
Image.open(filename).show()

for i in range(0, args.rounds):
generation_round()
for i in range(0, args.rounds):
generation_round()
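
With the new `-B`/`--bulk` flag, the positional prompt argument is treated as a file of prompts, one per line, skipping blank lines and `#` comments; for example, `python generate.py -B -m dev prompts.txt`, where `prompts.txt` is a hypothetical prompt list.
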
38 changes: 26 additions & 12 deletions images.py
@@ -7,17 +7,17 @@
import os
import sys
import time
from loguru import logger

import torch
from PIL import Image, PngImagePlugin
from diffusers import FluxTransformer2DModel, FluxPipeline
from loguru import logger
from pydantic import BaseModel
from transformers import T5EncoderModel, CLIPTextModel
import optimum.quanto

import uvicorn
from typing import Optional
from pydantic import BaseModel
import openai
import optimum.quanto
import torch
import uvicorn

import openedai

Expand Down Expand Up @@ -263,7 +263,7 @@ async def generate_images(pipe, **generation_kwargs) -> list:

generation_kwargs['generator'] = torch.Generator("cpu").manual_seed(seed)

return pipe(**generation_kwargs).images
return pipe(**generation_kwargs).images, seed


async def enhance_prompt(prompt: str, **enhancer) -> str:
@@ -286,7 +286,7 @@ async def enhance_prompt(prompt: str, **enhancer) -> str:
@app.post("/v1/images/generations")
async def generations(request: GenerationsRequest):
resp = {
'created': int(time.time()),
'created': int(time.time() * 1000),
'data': []
}

@@ -315,17 +315,31 @@ async def generations(request: GenerationsRequest):

try:
pipe = await ready_model(generator_name, model_config)
images = await generate_images(pipe, **generation_kwargs)
images, seed = await generate_images(pipe, **generation_kwargs)

if images:
for img in images:
# TODO: cache images, add get method for cache fetch
def make_pngmetadata():
# not sure how flux does it, but this is how SD did it.
# a closeup portrait of a playful maid, undercut hair, apron, amazing body, pronounced feminine feature, busty, kitchen, [ash blonde | ginger | pink hair], freckles, flirting with camera.Negative prompt: (deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, (mutated hands and fingers:1.4), disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation. tattoo.
# Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 6.5, Seed: 1804518985, Size: 768x1024, Model hash: 9aba26abdf, Model: Deliberate, ENSD: 31337
k = generation_kwargs
parameters = f"{k['prompt']}{'.' if k['prompt'][-1] != '.' else ''}Steps: {k['num_inference_steps']}, Sampler: Euler, CFG Scale: {k['guidance_scale']}, Seed: {seed}, Size: {k['width']}x{k['height']}, Model: {request.model}" # batch?
pngmetadata = PngImagePlugin.PngInfo()
pngmetadata.add_text('Parameters', parameters)
return pngmetadata

pnginfo = make_pngmetadata()

if args.log_level == 'DEBUG':
img.save("config/debug.png")
img.save("config/debug.png", pnginfo=pnginfo)

img_bytes = io.BytesIO()
img.save(img_bytes, format='PNG')
img.save(img_bytes, format='PNG', pnginfo=pnginfo)
b64_json = base64.b64encode(img_bytes.getvalue()).decode('utf-8')
img_bytes.close()


if request.response_format == 'b64_json':
img_dat = {'b64_json': b64_json}
else:
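
To verify the new PNG metadata on a saved image, a minimal sketch with Pillow (assumes `output.png` is a hypothetical file produced by this server; the `Parameters` key matches `make_pngmetadata` above):

```python
# Sketch: read back the tEXt chunk written via PngImagePlugin.PngInfo.
from PIL import Image

img = Image.open('output.png')  # hypothetical generated image
print(img.text.get('Parameters'))
```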