compile support, PNGinfo, fixes, fixes, quality improvements.
matatonic committed Sep 9, 2024
1 parent 3e77c5a commit e564df5
Showing 10 changed files with 192 additions and 375 deletions.
Binary file added 1725903030-dall-e-3-1536x1536-hd-1.png
75 changes: 47 additions & 28 deletions README.md
@@ -19,10 +19,12 @@ An OpenAI API compatible image generation server for the FLUX.1 family of models
- **Standalone Image Generation**: Uses your Nvidia GPU for image generation, doesn't use ComfyUI, SwarmUI or any other backend
- **Lora Support**: Support for multiple loras with individual scaling weights (strength)
- **Torch Compile Support**: Faster image generations with `torch.compile` (up to 20% faster in my tests, maybe more or less for other setups).
- [ ] **Easy to setup and use**: Maybe?
- [ ] **Upscaler Support** (planned)
- **PNG Metadata**: Save images with generation parameters.
- [ ] **BNB NF4 Quantization** (planned)
- [ ] **Fast Quant Loading** (planned)
- [ ] **Upscaler Support** (planned)
- [ ] **GGUF Loading** (planned)
- [ ] **Easy to setup and use**: Maybe?


## Quickstart
@@ -107,6 +109,8 @@ You can use the OpenAI python client to interact with the API. A sample applicat
```shell
pip install -U openai
python generate.py -m dall-e-3 -s "1024x256" -f new_logo.png "A banner style logo for the website of the OpenedAI Images Flux, an OpenAI API Image generator server which uses the Black Forest Labs FLUX.1 model."
# Or simply:
python generate.py "An astronaut in the jungle"
```

See the OpenAI Images Guide and API Documentation for more ways to use the API.
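
For a quick check without `generate.py`, here is a minimal sketch using the client directly. It assumes the server is running locally on the default port 5005, as in the sample client; `sk-ip` is a dummy key.

```python
# Minimal sketch: request one image and save it.
# Assumes the default local server address used by generate.py.
import base64
import openai

client = openai.Client(base_url='http://localhost:5005/v1', api_key='sk-ip')
response = client.images.generate(
    model='dall-e-3',
    prompt='An astronaut in the jungle',
    size='1024x1024',
    response_format='b64_json',
)
with open('astronaut.png', 'wb') as f:
    f.write(base64.b64decode(response.data[0].b64_json))
```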
@@ -119,54 +123,66 @@ There is a more detailed configuration guide in the [CONFIG.md](CONFIG.md).

## Pre-Configured models

> FP8 is the only available quantization so far, but more will come soon!
> There are other options available for low GPU VRAM support
Additional models are also available by default; there are options for all types of GPU setups.

* Only one model can be loaded at a time, and models are loaded on demand.

By default, the following models are configured (require ~40GB VRAM, bfloat16, <1s/step):
By default, the following models are configured (require ~40GB VRAM, bfloat16):

- `schnell`: `flux.1-schnell.json` FLUX.1 Schnell (official) (4 steps, ~3s)
- `dev`: FLUX.1 Dev (without enhancement) (25/50 steps, ~15-30s)
- `merged`: `sayakpaul-flux.1-merged.json` Dev+Schnell merged, 12 steps
- `schnell`: `flux.1-schnell.json` FLUX.1 Schnell (official) (4 steps)
- `dev`: FLUX.1 Dev (without enhancement) (25/50 steps)
- `merged`: `sayakpaul-flux.1-merged.json` Dev+Schnell merged (12 steps)
- `dall-e-2` is set to use `schnell`
- `dall-e-3` is set to use `dev`, with prompt enhancement if an OpenAI chat API is available.

Additional FP8 quantized models (require 24GB VRAM and can be slow to load, `+enable_vae_slicing`, `+enable_vae_tiling`, ~3+s/step):
Additional FP8 quantized models (require 24GB VRAM and can be slow to load, `+enable_vae_slicing`, `+enable_vae_tiling`):

- `schnell-fp8`: `kijai-flux.1-schnell-fp8.json` Schnell with FP8 quantization, 4 steps (10-15s)
- `dev-fp8`: `kijai-flux.1-dev-fp8.json` Dev with FP8 quantization, 25/50 steps
- `merged-fp8`: `drbaph-flux.1-merged-fp8.json` Dev+Schnell merged, FP8 quantization, 12 steps by default
- `merged-fp8-4step`: `drbaph-flux.1-merged-fp8-4step.json` Dev+Schnell merged, FP8 quantization, 4 steps
- `schnell-fp8`: `kijai-flux.1-schnell-fp8.json` Schnell with FP8 quantization (4 steps)
- `merged-fp8-4step`: `drbaph-flux.1-merged-fp8-4step.json` Dev+Schnell merged, FP8 quantization (4 steps)
- `merged-fp8`: `drbaph-flux.1-merged-fp8.json` Dev+Schnell merged, FP8 quantization (12 steps)
- `dev-fp8`: `kijai-flux.1-dev-fp8.json` Dev with FP8 quantization (25/50 steps)

Additional FP8 models (require 16GB VRAM and can be slow to load, `+enable_model_cpu_offload`, ~5+s/step):
Additional FP8 models (require 16GB VRAM and can be slow to load, `+enable_model_cpu_offload`):

- `schnell-fp8-16GB`: `kijai-flux.1-schnell-fp8-16GB.json` Schnell, 4 steps (~15-30s)
- `dev-fp8-16GB`: `kijai-flux.1-dev-fp8-16GB.json` Dev with FP8 quantization, 25/50 steps (slightly better)
- `merged-fp8-4step-16GB`: `drbaph-flux.1-merged-fp8-4step-16GB.json` Dev+Schnell merged, 4 steps
- `merged-fp8-16GB`: `drbaph-flux.1-merged-fp8-16GB.json` Dev+Schnell merged, 12 steps by default
- `schnell-fp8-16GB`: `kijai-flux.1-schnell-fp8-16GB.json` Schnell (4 steps)
- `dev-fp8-16GB`: `kijai-flux.1-dev-fp8-16GB.json` Dev with FP8 quantization (25/50 steps)
- `merged-fp8-4step-16GB`: `drbaph-flux.1-merged-fp8-4step-16GB.json` Dev+Schnell merged (4 steps)
- `merged-fp8-16GB`: `drbaph-flux.1-merged-fp8-16GB.json` Dev+Schnell merged (12 steps)

Additional NF4 models (require 12GB VRAM):

- sayakpaul-dev-nf4-12GB: soon ...
- sayakpaul-dev-nf4-compile-12GB: soon ...

Low VRAM options (<4GB VRAM, 34GB RAM, `+enable_sequential_cpu_offload`, float16 instead of bfloat16, 8-15+s/step):
Low VRAM options (<4GB VRAM, 34GB RAM, `+enable_sequential_cpu_offload`, float16 instead of bfloat16):

- `schnell-low`: `flux.1-schnell-low.json` Schnell FP16, (30-60s per image)
- `dev-low`: `flux.1-dev-low.json` Dev FP16, at least a few minutes per image
- `merged-low`: `sayakpaul-flux.1-merged-low.json` Dev+Schnell FP16 merged, 12 steps by default
- `schnell-low`: `flux.1-schnell-low.json` Schnell FP16 (4 steps)
- `merged-low`: `sayakpaul-flux.1-merged-low.json` Dev+Schnell FP16 merged (12 steps)
- `dev-low`: `flux.1-dev-low.json` Dev FP16 (25/50 steps)

And more, check out the `config/lib` folder for more examples, including lora options.
There are `-compile` variants of many models as well. Be advised that the first couple of images from a compiled model will be very slow to generate: the server must load, and perhaps quantize and compile, the model, and generation is then dynamically optimized over the next few runs, so the first image may take 10 minutes or more to prepare. Most models can generate dozens of images in that time, so only use compiled models if you know what you're doing.
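
For example, a compiled model is requested like any other, only the name differs: `python generate.py -m merged-compile "An astronaut in the jungle"` (a hypothetical invocation; the speedup only shows after the warm-up generations described above).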

And there is more, including `int8` quants: check out the `config/lib` folder for further examples, lora options among them.

> Timings are casually measured at 1024x1024 standard on an Nvidia A100 and may vary wildly from your system.
> \*) The name of the generator file is used to determine whether a model is already loaded. If you edit a generator config in a way which requires reloading the model (such as changing `pipeline` or `options`), it won't reload automatically. `config.json` and `generation_kwargs` are always loaded on each API call.
> Requesting an image generation with a special model called `unload` will unload the current model, freeing up its RAM and VRAM resources.
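
As a sketch, unloading looks like any other generation request (same client setup as the earlier example; the prompt value is presumably ignored, and no usable image comes back):

```python
# Hedged sketch: the special model name `unload` frees the RAM and
# VRAM of the currently loaded model; the response is not a real image.
import openai

client = openai.Client(base_url='http://localhost:5005/v1', api_key='sk-ip')
client.images.generate(model='unload', prompt='unload')
```
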
## Performance

Performance plots for A100 (80GB) and 4090 (24GB), batch size = 1. Click Details to expand.
<details>

![alt text](processing_time_A100.png)

*) `dall-e-3` in this plot is `FLUX.1 Dev, enhanced`, not OpenAI `dall-e-3`

![alt text](processing_time_4090.png)

</details>


## Server Usage

@@ -191,7 +207,7 @@ options:

#### "The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens:"

* Long prompt encoding into CLIP is not yet supported (not working), all is not lost however, there are 2 encoders and the T5 encoder (text_encoder_2) supports up to 240 (?) tokens. No fix yet.
* Long prompt encoding into CLIP is not yet supported (not working). All is not lost, however: there are two encoders, and the T5 encoder (`text_encoder_2`) supports up to 256 tokens. No fix yet.

#### "that cleft chin woman", "everyone is too beautiful"

@@ -217,4 +233,7 @@ Additional Model formats and merges created by:

- OpenedAI Images Flux is released under the [GNU Affero General Public License v3.0](https://choosealicense.com/licenses/agpl-3.0/)
- [FLUX.1 \[dev\] Non-Commercial License.](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
- [FLUX.1 \[schnell\] is released under the Apache-2.0 license; the model can be used for personal, scientific, and commercial purposes.](https://choosealicense.com/licenses/apache-2.0/)


![alt text](1725903030-dall-e-3-1536x1536-hd-1.png)
23 changes: 12 additions & 11 deletions full_tests.sh
@@ -1,31 +1,32 @@
#!/bin/sh

prompt="An astronaut in the jungle, blue-orange colour scheme"
CSV=perf.csv
G=$1 # A100 or other
folder="$1"
G="$2" # A100 or other
prompt="cool street art of an astronaut in a jungle, stencil and spray paint style, the flag says 'FLUX.1', on a door in a graffiti covered alley scene with used spray paint cans littering the ground, a sign by the door says 'Open' in faded, yellowed, black and white plastic, the secret door to a great club, the other writing says 'matatonic', 'flux.1 2024', 'black forest labs', 'openedai' and 'uNStaBLe' in unique styles"
CSV="$folder/perf.csv"

T40="schnell merged dev"
T40="dall-e-3 schnell merged dev"
T40C="schnell-compile merged-compile dev-compile"
T24="schnell-fp8 merged-fp8-4step merged-fp8 dev-fp8 dev-fp8-e5m2"
T24C="merged-fp8-4step-compile dev-fp8-e5m2-compile"
T16="schnell-fp8-16GB merged-fp8-4step-16GB merged-fp8-16GB dev-fp8-16GB dev-fp8-e5m2-16GB"
T24="schnell-fp8 merged-fp8-4step merged-fp8 dev-fp8"
T24C="schnell-fp8-compile merged-fp8-4step-compile merged-fp8-compile dev-fp8-compile"
T16="schnell-fp8-16GB merged-fp8-4step-16GB merged-fp8-16GB dev-fp8-16GB"
T12="" #"sayakpaul-dev-nf4-12GB"
T4="schnell-low merged-low dev-low"

if [ "$G" = "A100" ]; then
for i in $T40 ; do
./test_images.py -p -t test/$i/$G "$prompt" -n 1 -m $i --csv $CSV -T $G
./test_images.py -p -t $folder/$i/$G "$prompt" -n 1 -m "$i" --csv "$CSV" -T $G
done

for i in $T40C ; do
./test_images.py -p -t test/$i/$G "$prompt" -n 3 -m $i --csv $CSV -T $G
./test_images.py -p -t "$folder/$i/$G" "$prompt" -n 3 -m "$i" --csv "$CSV" -T $G
done
fi

for i in $T24 $T16 $T12 $T4 ; do
./test_images.py -p -t test/$i/$G "$prompt" -n 1 -m $i --csv $CSV -T $G
./test_images.py -p -t "$folder/$i/$G" "$prompt" -n 1 -m "$i" --csv "$CSV" -T $G
done

for i in $T24C ; do
./test_images.py -p -t test/$i/$G "$prompt" -n 3 -m $i --csv $CSV -T $G
./test_images.py -p -t "$folder/$i/$G" "$prompt" -n 3 -m "$i" --csv "$CSV" -T $G
done
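
As the updated invocations show, the script now takes an output folder and a GPU tag as positional arguments, e.g. `./full_tests.sh test A100` (a hypothetical invocation; the 40GB `T40`/`T40C` sets only run when the tag is `A100`).
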
86 changes: 46 additions & 40 deletions generate.py
@@ -8,6 +8,7 @@
import openai
from PIL import Image


def parse_args(argv):
# prompt, model, size, filename
parser = argparse.ArgumentParser(description='Generate an image from a prompt using OpenAI\'s DALL-E API.')
@@ -22,54 +23,59 @@ def parse_args(argv):
parser.add_argument('-E', '--no-enhancement', action='store_true', help='Do not enhance the prompt.')
parser.add_argument('-S', '--no-show', action='store_true', help='Do not display the image.')
parser.add_argument('-V', '--no-save', action='store_true', help='Do not save the image, view only')
parser.add_argument('-B', '--bulk', action='store_true', help='Process prompts from file, one per line.')

return parser.parse_args(argv)

if __name__ == '__main__':
args = parse_args(sys.argv[1:])

if args.no_enhancement:
args.prompt = "I NEED to test how the tool works with extremely simple prompts. DO NOT add any detail, just use it AS-IS:" + args.prompt

client = openai.Client(base_url='http://localhost:5005/v1', api_key='sk-ip')

def generation_round():

response = client.images.generate(
prompt=args.prompt,
response_format='b64_json',
model=args.model,
size=args.size,
n=int(args.batch),
quality=args.quality,
)
if args.bulk:
all_prompts = [ line.strip() for line in open(args.prompt, 'r') if len(line.strip()) > 0 and line.strip()[0] != '#']
else:
all_prompts = [ args.prompt ]

for prompt in all_prompts:
if args.no_enhancement:
prompt = "I NEED to test how the tool works with extremely simple prompts. DO NOT add any detail, just use it AS-IS:" + prompt

for n, img in enumerate(response.data):
image = Image.open(io.BytesIO(base64.b64decode(img.b64_json)))
def generation_round():
response = client.images.generate(
prompt=prompt,
response_format='b64_json',
model=args.model,
size=args.size,
n=int(args.batch),
quality=args.quality,
)

if not args.no_save:
if args.filename:
filename = args.filename
if int(args.batch) > 1:
filename = f"{filename.split('.png')[0]}-{n}.png"
else:
f_args = dict(
short_prompt=args.prompt[:20],
prompt=args.prompt,
n=n,
model=args.model,
size=args.size,
quality=args.quality,
created=response.created,
)

filename = args.auto_name_format.format(**f_args).replace('/','_')

image.save(filename, format="PNG")
print(f'Saved: {filename}')

if not args.no_show:
image.show()
for n, img in enumerate(response.data):
if not args.no_save:
if args.filename:
filename = args.filename
if int(args.batch) > 1:
filename = f"{filename.split('.png')[0]}-{n}.png"
else:
f_args = dict(
short_prompt=args.prompt[:20],
prompt=args.prompt,
n=n,
model=args.model,
size=args.size,
quality=args.quality,
created=response.created,
)

filename = args.auto_name_format.format(**f_args).replace('/','_')

with open(filename, 'wb') as f:
f.write(base64.b64decode(img.b64_json))
print(f'Saved: {filename}')

if not args.no_show:
Image.open(filename).show()

for i in range(0, args.rounds):
generation_round()
for i in range(0, args.rounds):
generation_round()
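
With the new `-B`/`--bulk` flag, the positional prompt argument is treated as a file of prompts, one per line, skipping blank lines and `#` comments; for example, `python generate.py -B -m dev prompts.txt`, where `prompts.txt` is a hypothetical prompt list.
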
38 changes: 26 additions & 12 deletions images.py
@@ -7,17 +7,17 @@
import os
import sys
import time
from loguru import logger

import torch
from PIL import Image, PngImagePlugin
from diffusers import FluxTransformer2DModel, FluxPipeline
from loguru import logger
from pydantic import BaseModel
from transformers import T5EncoderModel, CLIPTextModel
import optimum.quanto

import uvicorn
from typing import Optional
from pydantic import BaseModel
import openai
import optimum.quanto
import torch
import uvicorn

import openedai

Expand Down Expand Up @@ -263,7 +263,7 @@ async def generate_images(pipe, **generation_kwargs) -> list:

generation_kwargs['generator'] = torch.Generator("cpu").manual_seed(seed)

return pipe(**generation_kwargs).images
return pipe(**generation_kwargs).images, seed


async def enhance_prompt(prompt: str, **enhancer) -> str:
@@ -286,7 +286,7 @@ async def enhance_prompt(prompt: str, **enhancer) -> str:
@app.post("/v1/images/generations")
async def generations(request: GenerationsRequest):
resp = {
'created': int(time.time()),
'created': int(time.time() * 1000),
'data': []
}

@@ -315,17 +315,31 @@ async def generations(request: GenerationsRequest):

try:
pipe = await ready_model(generator_name, model_config)
images = await generate_images(pipe, **generation_kwargs)
images, seed = await generate_images(pipe, **generation_kwargs)

if images:
for img in images:
# TODO: cache images, add get method for cache fetch
def make_pngmetadata():
# not sure how flux does it, but this is how SD did it.
# a closeup portrait of a playful maid, undercut hair, apron, amazing body, pronounced feminine feature, busty, kitchen, [ash blonde | ginger | pink hair], freckles, flirting with camera.Negative prompt: (deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, (mutated hands and fingers:1.4), disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation. tattoo.
# Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 6.5, Seed: 1804518985, Size: 768x1024, Model hash: 9aba26abdf, Model: Deliberate, ENSD: 31337
k = generation_kwargs
parameters = f"{k['prompt']}{'.' if k['prompt'][-1] != '.' else ''}Steps: {k['num_inference_steps']}, Sampler: Euler, CFG Scale: {k['guidance_scale']}, Seed: {seed}, Size: {k['width']}x{k['height']}, Model: {request.model}" # batch?
pngmetadata = PngImagePlugin.PngInfo()
pngmetadata.add_text('Parameters', parameters)
return pngmetadata

pnginfo = make_pngmetadata()

if args.log_level == 'DEBUG':
img.save("config/debug.png")
img.save("config/debug.png", pnginfo=pnginfo)

img_bytes = io.BytesIO()
img.save(img_bytes, format='PNG')
img.save(img_bytes, format='PNG', pnginfo=pnginfo)
b64_json = base64.b64encode(img_bytes.getvalue()).decode('utf-8')
img_bytes.close()


if request.response_format == 'b64_json':
img_dat = {'b64_json': b64_json}
else:
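
To verify the new PNG metadata on a saved image, a minimal sketch with Pillow (assumes `output.png` is a hypothetical file produced by this server; the `Parameters` key matches `make_pngmetadata` above):

```python
# Sketch: read back the tEXt chunk written via PngImagePlugin.PngInfo.
from PIL import Image

img = Image.open('output.png')  # hypothetical generated image
print(img.text.get('Parameters'))
```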