Skip to content

Latest commit

 

History

History
56 lines (38 loc) · 2.41 KB

inference.md

File metadata and controls

56 lines (38 loc) · 2.41 KB

Overview

We generate step-by-step since each phase loads billion-scale model. Note:

  • LOGLEVEL=(ERROR|WARNING|INFO|DEBUG) will control the log level.

Instruction

K-shot LLM

We use LLMs for chat and do in-context learning by retrieving a few samples, instead of a fine-tuned DesignLLM in COLE.

poetry run python -m opencole.inference_launcher.k_shot_llm --input_hfds <INPUT_HFDS> --output_dir <OUTPUT_DIR>

If you want to test your own set of inputs, please make a csv file as follows. Each row should contain Intension (make sure to use double quatation) and id (just for identifying each sample). Please use --input_intention_csv <INPUT_INTENTION_CSV> to pass it.

id,Intension
00001,"Design an animated digital poster for a fantasy concert where Joe Hisaishi performs on a grand piano on a moonlit beach. The scene should be magical, with bioluminescent waves and a starry sky. The poster should invite viewers to an exclusive virtual event."

Note: if you have access to cpu, installing faiss-cpu might enable faster ANN search.

Text-to-image

We directly generate a single image from text, unlike the original COLE ((i) text -> background + (ii) (text, background) -> foreground).

poetry run python -m opencole.inference_launcher.text_to_image --pretrained_model_name_or_path <PRETRAINED_MODEL_NAME_OR_PATH> --detail_dir <DETAIL_DIR> --output_dir <OUTPUT_DIR>

Note: if you only have UNet weights, which happens in fine-tuning SDXL on your own, please use --finetuned_unet_dir <UNET_DIR> --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0.

TypographyLMM

poetry run python -m opencole.inference_launcher.typographylmm --pretrained_model_name_or_path <PRETRAINED_MODEL_NAME_OR_PATH> --image_dir <IMAGE_DIR> --detail_dir <DETAIL_DIR> --output_dir <OUTPUT_DIR>

Rendering

poetry run python -m opencole.inference_launcher.render --image_dir <IMAGE_DIR> --typography_dir <TYPOGRAPHY_DIR> --output_dir <OUTPUT_DIR>

Other Baselines

As the original COLE tested, text-to-image models with GPT-4 augmented prompts are strong baselines. Three models (two are from HF and one is API(AzureOpenAI)) can be used by just switching --tester.

poetry run python -m opencole.inference_launcher.gpt4_augmented_baselines --tester (sdxl|sd3|deepfloyd|dalle3|flux1|auraflow|pixartsigma) --output_dir <OUTPUT_DIR>