diff --git a/latent.ipynb b/latent.ipynb index 1cd3038..7e56ba8 100644 --- a/latent.ipynb +++ b/latent.ipynb @@ -377,11 +377,11 @@ " print(latent_diffusion_model)\n", " model = instantiate_from_config(config.model)\n", " if(latent_diffusion_model != \"finetuned\"):\n", - " sd = torch.load(ckpt, map_location=\"cuda\")[\"state_dict\"]\n", + " sd = torch.load(ckpt, map_location=device)[\"state_dict\"]\n", " m, u = model.load_state_dict(sd, strict = False)\n", " \n", " if(latent_diffusion_model == \"finetuned\"): \n", - " sd = torch.load(f\"{model_path}/txt2img-f8-large-jack000-finetuned-fp16.ckpt\",map_location=\"cuda\")\n", + " sd = torch.load(f\"{model_path}/txt2img-f8-large-jack000-finetuned-fp16.ckpt\",map_location=device)\n", " m, u = model.load_state_dict(sd, strict = False)\n", " #model.model = model.model.half().eval().to(device)\n", " \n", @@ -410,7 +410,7 @@ " print(\"unexpected keys:\")\n", " print(u)\n", "\n", - " model.requires_grad_(False).half().eval().to('cuda')\n", + " model.requires_grad_(False).half().eval().to(device)\n", " return model\n", "\n", "config = OmegaConf.load(\"./latent-diffusion/configs/latent-diffusion/txt2img-1p4B-eval.yaml\") # TODO: Optionally download from same location as ckpt and chnage this logic\n", diff --git a/latent.ipynb.bak b/latent.ipynb.bak new file mode 100644 index 0000000..1cd3038 --- /dev/null +++ b/latent.ipynb.bak @@ -0,0 +1,1695 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "view-in-github" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NUmmV5ZvrPbP" + }, + "source": [ + "# Latent Majesty Diffusion v1.6\n", + "#### Formerly known as Princess Generator\n", + "##### Access our [Majestic Guide](https://multimodal.art/majesty-diffusion) (_under construction_), our [GitHub](https://github.com/multimodalart/majesty-diffusion), join our community on [Discord](https://discord.gg/yNBtQBEDfZ) or reach out via [@multimodalart on Twitter](https://twitter.com/multimodalart))\n", + "\\\n", + " \n", + "---\n", + "\\\n", + "\n", + "\n", + "#### CLIP Guided Latent Diffusion by [dango233](https://github.com/Dango233/) and [apolinario (@multimodalart)](https://twitter.com/multimodalart). \n", + "The LAION-400M-trained model and the modified inference code are from [CompVis Latent Diffusion](https://github.com/CompVis/latent-diffusion). The guided-diffusion method is modified by Dango233 based on [Katherine Crowson](https://twitter.com/RiversHaveWings)'s guided diffusion notebook. multimodalart savable settings, MMC and assembled the Colab. Check the complete list on our GitHub. 
Some functions and methods are from various code masters (nsheppard, DanielRussRuss and others)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WOAs3ZvLlktt" + }, + "source": [ + "## Changelog\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "p15Fm1AjloLa" + }, + "outputs": [], + "source": [ + "#@markdown Release: 1.2 (prior versions were Princess Generator and you can check [GitHub out for that](https://github.com/multimodalart/majesty-diffusion/))\n", + "\n", + "#@markdown Changelog: 1.3 - better upscaler (learn how to use it on our [Majestic Guide](https://multimodal.art/majesty-diffusion))\n", + "\n", + "#@markdown Changelog: 1.4 - better defaults, added OpenCLIP ViT-L/14 LAION-400M, fix CLOOB, adds modified dynamic thresholding, removes latent upscaler (was broken), adds RGB upscaler \n", + "\n", + "#@markdown Changelog 1.5 - even better defaults, better dynamic thresholidng, fixes range scale, adds var and mean scales, adds the possibility of blurring cuts\n", + "\n", + "#@markdown Changelog 1.6 - ViT-L conditioning for latenet diffusion, adds noising and scaling during advanced scheduling phases, fixes linear ETA, adss LAION models" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uWLsDt7wkZfU" + }, + "source": [ + "## Save model and outputs on Google Drive? " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "aJF6wP2zkWE_" + }, + "outputs": [], + "source": [ + "#@markdown Enable saving outputs to Google Drive to save your creations at AI/models\n", + "save_outputs_to_google_drive = True #@param {type:\"boolean\"}\n", + "#@markdown Enable saving models to Google Drive to avoid downloading the model every Colab instance\n", + "save_models_to_google_drive = True #@param {type:\"boolean\"}\n", + "\n", + "if save_outputs_to_google_drive or save_models_to_google_drive:\n", + " from google.colab import drive\n", + " try:\n", + " drive.mount('/content/gdrive')\n", + " except:\n", + " save_outputs_to_google_drive = False\n", + " save_models_to_google_drive = False\n", + "\n", + "model_path = \"/content/gdrive/MyDrive/AI/models\" if save_models_to_google_drive else \"/content/\"\n", + "outputs_path = \"/content/gdrive/MyDrive/AI/latent_majesty_diffusion\" if save_outputs_to_google_drive else \"/content/outputs\"\n", + "!mkdir -p $model_path\n", + "!mkdir -p $outputs_path\n", + "print(f\"Model will be stored at {model_path}\")\n", + "print(f\"Outputs will be saved to {outputs_path}\")\n", + "\n", + "#If you want to run it locally change it to true\n", + "is_local = False\n", + "skip_installs = False\n", + "if(is_local):\n", + " model_path = \"/choose/your/local/model/path\"\n", + " outputs_path = \"/choose/your/local/outputs/path\"\n", + " skip_installs = True" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "5Fxt-5TaYBs2" + }, + "outputs": [], + "source": [ + "#@title Model settings\n", + "#@markdown The `original` model is the model trained by CompVis in the LAION-400M dataset\n", + "#@markdown
The `finetuned` model is a finetune of the `original` model [by Jack000](https://github.com/Jack000/glid-3-xl) that generates fewer watermarks but is a bit worse at text synthesis. Colab Free does not have enough RAM to run the finetuned model (for now)\n", + "#@markdown
The `ongo` and `erlich` models are models [fine-tuned by LAION](https://github.com/LAION-AI/ldm-finetune)on art (ongo) and erlich (logos) \n", + "latent_diffusion_model = 'finetuned' #@param [\"original\", \"finetuned\", \"ongo (fine tuned in paintings)\", \"erlich (fine tuned in logos)\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xEVSOJ4f0B21" + }, + "source": [ + "# Setup stuff" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "NHgUAp48qwoG" + }, + "outputs": [], + "source": [ + "#@title Installation\n", + "if(not skip_installs):\n", + " import subprocess\n", + " nvidiasmi_output = subprocess.run(['nvidia-smi'], stdout=subprocess.PIPE).stdout.decode('utf-8')\n", + " cards_requiring_downgrade = [\"Tesla T4\", \"V100\"]\n", + " #if any(cardstr in nvidiasmi_output for cardstr in cards_requiring_downgrade):\n", + " # downgrade_pytorch_result = subprocess.run(['pip', 'install', 'torch==1.10.2', 'torchvision==0.11.3', '-q'], stdout=subprocess.PIPE).stdout.decode('utf-8')\n", + " import sys\n", + " sys.path.append(\".\")\n", + " !git clone https://github.com/multimodalart/latent-diffusion --branch 1.6\n", + " !git clone https://github.com/CompVis/taming-transformers\n", + " !git clone https://github.com/TencentARC/GFPGAN\n", + " !git lfs clone https://huggingface.co/datasets/multimodalart/latent-majesty-diffusion-settings\n", + " !git lfs clone https://github.com/LAION-AI/aesthetic-predictor\n", + " !pip install -e ./taming-transformers\n", + " !pip install omegaconf>=2.0.0 pytorch-lightning>=1.0.8 torch-fidelity einops\n", + " !pip install transformers\n", + " !pip install dotmap\n", + " !pip install resize-right\n", + " !pip install piq\n", + " !pip install lpips\n", + " !pip install basicsr\n", + " !pip install facexlib\n", + " !pip install realesrgan\n", + "\n", + " sys.path.append('./taming-transformers')\n", + " from taming.models import vqgan\n", + " from subprocess import Popen, PIPE\n", + " try:\n", + " import mmc\n", + " except:\n", + " # install mmc\n", + " !git clone https://github.com/apolinario/Multi-Modal-Comparators --branch gradient_checkpointing\n", + " !pip install poetry\n", + " !cd Multi-Modal-Comparators; poetry build\n", + " !cd Multi-Modal-Comparators; pip install dist/mmc*.whl\n", + " \n", + " # optional final step:\n", + " #poe napm_installs\n", + " !python Multi-Modal-Comparators/src/mmc/napm_installs/__init__.py\n", + " # suppress mmc warmup outputs\n", + " import mmc.loaders" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fNqCqQDoyZmq" + }, + "source": [ + "Now, download the checkpoint (~5.7 GB). This will usually take 3-6 minutes." 
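The download cell that follows only fetches each checkpoint when it is not already present under `model_path`, which is what makes the Google Drive caching above pay off across Colab sessions. A minimal pure-Python sketch of that check-then-download pattern (the `fetch_checkpoint` helper and the use of `urllib` are illustrative assumptions; the notebook itself tests with `os.path.isfile` and shells out to `wget`):

```python
import os
import urllib.request

def fetch_checkpoint(url, dest_path):
    """Hypothetical helper: download url to dest_path only if it is not already cached."""
    if os.path.isfile(dest_path):
        print(f"Using cached checkpoint at {dest_path}")
        return dest_path
    dest_dir = os.path.dirname(dest_path)
    if dest_dir:
        os.makedirs(dest_dir, exist_ok=True)
    print(f"Downloading {url} ...")
    urllib.request.urlretrieve(url, dest_path)
    return dest_path

# Example with the main LAION-400M checkpoint (~5.7 GB):
# fetch_checkpoint(
#     "https://ommer-lab.com/files/latent-diffusion/nitro/txt2img-f8-large/model.ckpt",
#     f"{model_path}/latent_diffusion_txt2img_f8_large.ckpt",
# )
```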
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "cNHvQBhzyXCI" + }, + "outputs": [], + "source": [ + "#@title Download models\n", + "import os\n", + "if os.path.isfile(f\"{model_path}/latent_diffusion_txt2img_f8_large.ckpt\"):\n", + " print(\"Using Latent Diffusion model saved from Google Drive\")\n", + "else: \n", + " !wget -O $model_path/latent_diffusion_txt2img_f8_large.ckpt https://ommer-lab.com/files/latent-diffusion/nitro/txt2img-f8-large/model.ckpt --no-check-certificate\n", + "\n", + "if os.path.isfile(f\"{model_path}/txt2img-f8-large-jack000-finetuned-fp16.ckpt\"):\n", + " print(\"Using Latent Diffusion finetuned model saved from Google Drive\")\n", + "else: \n", + " !wget -O $model_path/txt2img-f8-large-jack000-finetuned-fp16.ckpt https://huggingface.co/multimodalart/compvis-latent-diffusion-text2img-large/resolve/main/txt2img-f8-large-jack000-finetuned-fp16.ckpt --no-check-certificate\n", + "\n", + "if(latent_diffusion_model == 'ongo (fine tuned in art)'):\n", + " if os.path.isfile(f\"{model_path}/ongo.pt\"):\n", + " print(\"Using ongo model saved from Google Drive\")\n", + " else:\n", + " !wget -O $model_path/ongo.pt https://huggingface.co/laion/ongo/resolve/main/ongo.pt\n", + "\n", + "if(latent_diffusion_model == 'erlich (fine tuned in logos)'):\n", + " if os.path.isfile(f\"{model_path}/erlich.pt\"):\n", + " print(\"Using ongo model saved from Google Drive\")\n", + " else:\n", + " !wget -O $model_path/erlich.pt https://huggingface.co/laion/erlich/resolve/main/model/ema_0.9999_120000.pt\n", + "\n", + "if os.path.isfile(f\"{model_path}/ava_vit_l_14_336_linear.pth\"):\n", + " print(\"Using ViT-L/14@336px aesthetic model from Google Drive\")\n", + "else:\n", + " !wget -O $model_path/ava_vit_l_14_336_linear.pth https://multimodal.art/models/ava_vit_l_14_336_linear.pth\n", + "\n", + "if os.path.isfile(f\"{model_path}/sa_0_4_vit_l_14_linear.pth\"):\n", + " print(\"Using ViT-L/14 aesthetic model from Google Drive\")\n", + "else:\n", + " !wget -O $model_path/sa_0_4_vit_l_14_linear.pth https://multimodal.art/models/sa_0_4_vit_l_14_linear.pth\n", + "\n", + "if os.path.isfile(f\"{model_path}/ava_vit_l_14_linear.pth\"):\n", + " print(\"Using ViT-L/14 aesthetic model from Google Drive\")\n", + "else:\n", + " !wget -O $model_path/ava_vit_l_14_linear.pth https://multimodal.art/models/ava_vit_l_14_linear.pth\n", + "\n", + "if os.path.isfile(f\"{model_path}/ava_vit_b_16_linear.pth\"):\n", + " print(\"Using ViT-B/16 aesthetic model from Google Drive\")\n", + "else:\n", + " !wget -O $model_path/ava_vit_b_16_linear.pth http://batbot.tv/ai/models/v-diffusion/ava_vit_b_16_linear.pth\n", + "if os.path.isfile(f\"{model_path}/sa_0_4_vit_b_16_linear.pth\"):\n", + " print(\"Using ViT-B/16 sa aesthetic model already saved\")\n", + "else:\n", + " !wget -O $model_path/sa_0_4_vit_b_16_linear.pth https://multimodal.art/models/sa_0_4_vit_b_16_linear.pth\n", + "if os.path.isfile(f\"{model_path}/sa_0_4_vit_b_32_linear.pth\"):\n", + " print(\"Using ViT-B/32 aesthetic model from Google Drive\")\n", + "else:\n", + " !wget -O $model_path/sa_0_4_vit_b_32_linear.pth https://multimodal.art/models/sa_0_4_vit_b_32_linear.pth\n", + "if os.path.isfile(f\"{model_path}/openimages_512x_png_embed224.npz\"):\n", + " print(\"Using openimages png from Google Drive\")\n", + "else:\n", + " !wget -O $model_path/openimages_512x_png_embed224.npz 
https://github.com/nshepperd/jax-guided-diffusion/raw/8437b4d390fcc6b57b89cedcbaf1629993c09d03/data/openimages_512x_png_embed224.npz\n", + "if os.path.isfile(f\"{model_path}/imagenet_512x_jpg_embed224.npz\"):\n", + " print(\"Using imagenet antijpeg from Google Drive\")\n", + "else:\n", + " !wget -O $model_path/imagenet_512x_jpg_embed224.npz https://github.com/nshepperd/jax-guided-diffusion/raw/8437b4d390fcc6b57b89cedcbaf1629993c09d03/data/imagenet_512x_jpg_embed224.npz\n", + "if os.path.isfile(f\"{model_path}/GFPGANv1.3.pth\"):\n", + " print(\"Using GFPGAN v1.3 from Google Drive\")\n", + "else:\n", + " !wget -O $model_path/GFPGANv1.3.pth https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth\n", + "!cp $model_path/GFPGANv1.3.pth GFPGAN/experiments/pretrained_models/GFPGANv1.3.pth\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ThxmCePqt1mt" + }, + "source": [ + "Let's also check what type of GPU we've got." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "jbL2zJ7Pt7Jl" + }, + "outputs": [], + "source": [ + "!nvidia-smi" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "BPnyd-XUKbfE" + }, + "outputs": [], + "source": [ + "#@title Import stuff\n", + "import argparse, os, sys, glob\n", + "import torch\n", + "import numpy as np\n", + "from omegaconf import OmegaConf\n", + "from PIL import Image\n", + "from tqdm.auto import tqdm, trange\n", + "tqdm_auto_model = __import__(\"tqdm.auto\", fromlist=[None]) \n", + "sys.modules['tqdm'] = tqdm_auto_model\n", + "from einops import rearrange\n", + "from torchvision.utils import make_grid\n", + "import transformers\n", + "import gc\n", + "sys.path.append('./latent-diffusion')\n", + "from ldm.util import instantiate_from_config\n", + "from ldm.models.diffusion.ddim import DDIMSampler\n", + "from ldm.models.diffusion.plms import PLMSSampler\n", + "from ldm.modules.diffusionmodules.util import noise_like, make_ddim_sampling_parameters\n", + "import tensorflow as tf\n", + "from dotmap import DotMap\n", + "import ipywidgets as widgets\n", + "from math import pi\n", + "\n", + "from subprocess import Popen, PIPE\n", + "\n", + "from dataclasses import dataclass\n", + "from functools import partial\n", + "import gc\n", + "import io\n", + "import math\n", + "import sys\n", + "import random\n", + "from piq import brisque\n", + "from itertools import product\n", + "from IPython import display\n", + "import lpips\n", + "from PIL import Image, ImageOps\n", + "import requests\n", + "import torch\n", + "from torch import nn\n", + "from torch.nn import functional as F\n", + "from torchvision import models\n", + "from torchvision import transforms\n", + "from torchvision import transforms as T\n", + "from torchvision.transforms import functional as TF\n", + "from numpy import nan\n", + "from threading import Thread\n", + "import time\n", + "import re\n", + "import base64\n", + "\n", + "#sys.path.append('../CLIP')\n", + "#Resizeright for better gradient when resizing\n", + "#sys.path.append('../ResizeRight/')\n", + "#sys.path.append('../cloob-training/')\n", + "\n", + "from resize_right import resize\n", + "\n", + "import clip\n", + "#from cloob_training import model_pt, pretrained\n", + "\n", + "#pretrained.list_configs()\n", + "from torch.utils.tensorboard import SummaryWriter\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "twG4nxYCrI8F" + }, + "outputs": [], + "source": [ + "#@title Load 
the model\n", + "torch.backends.cudnn.benchmark = True\n", + "device = torch.device(\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n", + "def load_model_from_config(config, ckpt, verbose=False, latent_diffusion_model=\"original\"):\n", + " print(f\"Loading model from {ckpt}\")\n", + " print(latent_diffusion_model)\n", + " model = instantiate_from_config(config.model)\n", + " if(latent_diffusion_model != \"finetuned\"):\n", + " sd = torch.load(ckpt, map_location=\"cuda\")[\"state_dict\"]\n", + " m, u = model.load_state_dict(sd, strict = False)\n", + " \n", + " if(latent_diffusion_model == \"finetuned\"): \n", + " sd = torch.load(f\"{model_path}/txt2img-f8-large-jack000-finetuned-fp16.ckpt\",map_location=\"cuda\")\n", + " m, u = model.load_state_dict(sd, strict = False)\n", + " #model.model = model.model.half().eval().to(device)\n", + " \n", + " if(latent_diffusion_model == \"ongo (fine tuned in art)\"):\n", + " del sd \n", + " sd_finetuned = torch.load(f\"{model_path}/ongo.pt\")\n", + " sd_finetuned[\"input_blocks.0.0.weight\"] = sd_finetuned[\"input_blocks.0.0.weight\"][:,0:4,:,:]\n", + " model.model.diffusion_model.load_state_dict(sd_finetuned, strict=False)\n", + " del sd_finetuned\n", + " torch.cuda.empty_cache()\n", + " gc.collect()\n", + "\n", + " if(latent_diffusion_model == \"erlich (fine tuned in logos)\"):\n", + " del sd \n", + " sd_finetuned = torch.load(f\"{model_path}/erlich.pt\")\n", + " sd_finetuned[\"input_blocks.0.0.weight\"] = sd_finetuned[\"input_blocks.0.0.weight\"][:,0:4,:,:]\n", + " model.model.diffusion_model.load_state_dict(sd_finetuned, strict=False)\n", + " del sd_finetuned\n", + " torch.cuda.empty_cache()\n", + " gc.collect()\n", + "\n", + " if len(m) > 0 and verbose:\n", + " print(\"missing keys:\")\n", + " print(m)\n", + " if len(u) > 0 and verbose:\n", + " print(\"unexpected keys:\")\n", + " print(u)\n", + "\n", + " model.requires_grad_(False).half().eval().to('cuda')\n", + " return model\n", + "\n", + "config = OmegaConf.load(\"./latent-diffusion/configs/latent-diffusion/txt2img-1p4B-eval.yaml\") # TODO: Optionally download from same location as ckpt and chnage this logic\n", + "model = load_model_from_config(config, f\"{model_path}/latent_diffusion_txt2img_f8_large.ckpt\",False, latent_diffusion_model) # TODO: check path\n", + "model = model.half().eval().to(device)\n", + "#if(latent_diffusion_model == \"finetuned\"):\n", + "# model.model = model.model.half().eval().to(device)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "HY_7vvnPThzS" + }, + "outputs": [], + "source": [ + "#@title Load necessary functions\n", + "def set_custom_schedules(schedule):\n", + " custom_schedules = []\n", + " for schedule_item in schedule:\n", + " if(isinstance(schedule_item,list)):\n", + " custom_schedules.append(np.arange(*schedule_item))\n", + " else:\n", + " custom_schedules.append(schedule_item)\n", + " \n", + " return custom_schedules\n", + "\n", + "def parse_prompt(prompt):\n", + " if prompt.startswith('http://') or prompt.startswith('https://') or prompt.startswith(\"E:\") or prompt.startswith(\"C:\") or prompt.startswith(\"D:\"):\n", + " vals = prompt.rsplit(':', 2)\n", + " vals = [vals[0] + ':' + vals[1], *vals[2:]]\n", + " else:\n", + " vals = prompt.rsplit(':', 1)\n", + " vals = vals + ['', '1'][len(vals):]\n", + " return vals[0], float(vals[1])\n", + "\n", + "class MakeCutouts(nn.Module):\n", + " def __init__(self, cut_size,\n", + " Overview=4, \n", + " WholeCrop = 0, WC_Allowance = 10, 
WC_Grey_P=0.2,\n", + " InnerCrop = 0, IC_Size_Pow=0.5, IC_Grey_P = 0.2,\n", + " cut_blur_n = 0\n", + " ):\n", + " super().__init__()\n", + " self.cut_size = cut_size\n", + " self.Overview = Overview\n", + " self.WholeCrop= WholeCrop\n", + " self.WC_Allowance = WC_Allowance\n", + " self.WC_Grey_P = WC_Grey_P\n", + " self.InnerCrop = InnerCrop\n", + " self.IC_Size_Pow = IC_Size_Pow\n", + " self.IC_Grey_P = IC_Grey_P\n", + " self.cut_blur_n = cut_blur_n\n", + " self.augs = T.Compose([\n", + " #T.RandomHorizontalFlip(p=0.5),\n", + " T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n", + " T.RandomAffine(degrees=0, \n", + " translate=(0.05, 0.05), \n", + " #scale=(0.9,0.95),\n", + " fill=-1, interpolation = T.InterpolationMode.BILINEAR, ),\n", + " T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n", + " #T.RandomPerspective(p=1, interpolation = T.InterpolationMode.BILINEAR, fill=-1,distortion_scale=0.2),\n", + " T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n", + " T.RandomGrayscale(p=0.1),\n", + " T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n", + " T.ColorJitter(brightness=0.05, contrast=0.05, saturation=0.05),\n", + " ])\n", + "\n", + " def forward(self, input):\n", + " gray = transforms.Grayscale(3)\n", + " sideY, sideX = input.shape[2:4]\n", + " max_size = min(sideX, sideY)\n", + " min_size = min(sideX, sideY, self.cut_size)\n", + " l_size = max(sideX, sideY)\n", + " output_shape = [input.shape[0],3,self.cut_size,self.cut_size] \n", + " output_shape_2 = [input.shape[0],3,self.cut_size+2,self.cut_size+2]\n", + " pad_input = F.pad(input,((sideY-max_size)//2+round(max_size*0.055),(sideY-max_size)//2+round(max_size*0.055),(sideX-max_size)//2+round(max_size*0.055),(sideX-max_size)//2+round(max_size*0.055)), **padargs)\n", + " cutouts_list = []\n", + " \n", + " if self.Overview>0:\n", + " cutouts = []\n", + " cutout = resize(pad_input, out_shape=output_shape, antialiasing=True)\n", + " output_shape_all = list(output_shape)\n", + " output_shape_all[0]=self.Overview*input.shape[0]\n", + " pad_input = pad_input.repeat(input.shape[0],1,1,1)\n", + " cutout = resize(pad_input, out_shape=output_shape_all)\n", + " if aug: cutout=self.augs(cutout)\n", + " if self.cut_blur_n > 0: cutout[0:self.cut_blur_n,:,:,:] = TF.gaussian_blur(cutout[0:self.cut_blur_n,:,:,:],cut_blur_kernel)\n", + " cutouts_list.append(cutout)\n", + " \n", + " if self.InnerCrop >0:\n", + " cutouts=[]\n", + " for i in range(self.InnerCrop):\n", + " size = int(torch.rand([])**self.IC_Size_Pow * (max_size - min_size) + min_size)\n", + " offsetx = torch.randint(0, sideX - size + 1, ())\n", + " offsety = torch.randint(0, sideY - size + 1, ())\n", + " cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]\n", + " if i <= int(self.IC_Grey_P * self.InnerCrop):\n", + " cutout = gray(cutout)\n", + " cutout = resize(cutout, out_shape=output_shape)\n", + " cutouts.append(cutout)\n", + " if cutout_debug:\n", + " TF.to_pil_image(cutouts[-1].add(1).div(2).clamp(0, 1).squeeze(0)).save(\"content/diff/cutouts/cutout_InnerCrop.jpg\",quality=99)\n", + " cutouts_tensor = torch.cat(cutouts)\n", + " cutouts=[]\n", + " cutouts_list.append(cutouts_tensor)\n", + " cutouts=torch.cat(cutouts_list)\n", + " return cutouts\n", + "\n", + "def spherical_dist_loss(x, y):\n", + " x = F.normalize(x, dim=-1)\n", + " y = F.normalize(y, dim=-1)\n", + " return (x - y).norm(dim=-1).div(2).arcsin().pow(2).mul(2)\n", + "\n", + "def tv_loss(input):\n", + " \"\"\"L2 total variation loss, as in Mahendran et al.\"\"\"\n", + " input = F.pad(input, 
(0, 1, 0, 1), 'replicate')\n", + " x_diff = input[..., :-1, 1:] - input[..., :-1, :-1]\n", + " y_diff = input[..., 1:, :-1] - input[..., :-1, :-1]\n", + " return (x_diff**2 + y_diff**2).mean([1, 2, 3])\n", + "\n", + "#def range_loss(input, range_min, range_max):\n", + "# return ((input - input.clamp(range_min,range_max)).abs()*10).pow(2).mean([1, 2, 3])\n", + "def range_loss(input, range_min, range_max):\n", + " return ((input - input.clamp(range_min,range_max)).abs()).mean([1, 2, 3])\n", + "\n", + "\n", + "def symmetric_loss(x):\n", + " w = x.shape[3]\n", + " diff = (x - torch.flip(x,[3])).square().mean().sqrt()/(x.shape[2]*x.shape[3]/1e4)\n", + " return(diff)\n", + "\n", + "def fetch(url_or_path):\n", + " \"\"\"Fetches a file from an HTTP or HTTPS url, or opens the local file.\"\"\"\n", + " if str(url_or_path).startswith('http://') or str(url_or_path).startswith('https://'):\n", + " r = requests.get(url_or_path)\n", + " r.raise_for_status()\n", + " fd = io.BytesIO()\n", + " fd.write(r.content)\n", + " fd.seek(0)\n", + " return fd\n", + " return open(url_or_path, 'rb')\n", + "\n", + "\n", + "def to_pil_image(x):\n", + " \"\"\"Converts from a tensor to a PIL image.\"\"\"\n", + " if x.ndim == 4:\n", + " assert x.shape[0] == 1\n", + " x = x[0]\n", + " if x.shape[0] == 1:\n", + " x = x[0]\n", + " return TF.to_pil_image((x.clamp(-1, 1) + 1) / 2)\n", + "\n", + "def base64_to_image(base64_str, image_path=None):\n", + " base64_data = re.sub('^data:image/.+;base64,', '', base64_str)\n", + " binary_data = base64.b64decode(base64_data)\n", + " img_data = io.BytesIO(binary_data)\n", + " img = Image.open(img_data)\n", + " if image_path:\n", + " img.save(image_path)\n", + " return img\n", + "\n", + "normalize = transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],\n", + " std=[0.26862954, 0.26130258, 0.27577711])\n", + "\n", + "def centralized_grad(x, use_gc=True, gc_conv_only=False):\n", + " if use_gc:\n", + " if gc_conv_only:\n", + " if len(list(x.size())) > 3:\n", + " x.add_(-x.mean(dim=tuple(range(1, len(list(x.size())))), keepdim=True))\n", + " else:\n", + " if len(list(x.size())) > 1:\n", + " x.add_(-x.mean(dim=tuple(range(1, len(list(x.size())))), keepdim=True))\n", + " return x\n", + "\n", + "def cond_fn(x, t):\n", + " global cur_step\n", + " cur_step += 1\n", + " t=1000-t\n", + " t=t[0]\n", + " x = x.detach()\n", + " with torch.enable_grad():\n", + " global clamp_start_, clamp_max \n", + " x = x.requires_grad_()\n", + " x_in = model.decode_first_stage(x)\n", + " display_handler(x_in,t,1,False)\n", + " n = x_in.shape[0]\n", + " clip_guidance_scale = clip_guidance_index[t]\n", + " make_cutouts = {}\n", + " #rx_in_grad = torch.zeros_like(x_in)\n", + " for i in clip_list:\n", + " make_cutouts[i] = MakeCutouts(clip_size[i][0] if type(clip_size[i]) is tuple else clip_size[i],\n", + " Overview= cut_overview[t], \n", + " InnerCrop = cut_innercut[t], \n", + " IC_Size_Pow=cut_ic_pow, IC_Grey_P = cut_icgray_p[t],\n", + " cut_blur_n = cut_blur_n[t]\n", + " )\n", + " cutn = cut_overview[t]+cut_innercut[t]\n", + " for j in range(cutn_batches):\n", + " losses=0\n", + " for i in clip_list:\n", + " clip_in = clip_normalize[i](make_cutouts[i](x_in.add(1).div(2)).to(\"cuda\"))\n", + " image_embeds = clip_model[i].encode_image(clip_in).float().unsqueeze(0).expand([target_embeds[i].shape[0],-1,-1])\n", + " target_embeds_temp = target_embeds[i]\n", + " if i == 'ViT-B-32--openai' and experimental_aesthetic_embeddings:\n", + " aesthetic_embedding = 
torch.from_numpy(np.load(f'aesthetic-predictor/vit_b_32_embeddings/rating{experimental_aesthetic_embeddings_score}.npy')).to(device) \n", + " aesthetic_query = target_embeds_temp + aesthetic_embedding * experimental_aesthetic_embeddings_weight\n", + " target_embeds_temp = (aesthetic_query) / torch.linalg.norm(aesthetic_query)\n", + " if i == 'ViT-L-14--openai' and experimental_aesthetic_embeddings:\n", + " aesthetic_embedding = torch.from_numpy(np.load(f'aesthetic-predictor/vit_l_14_embeddings/rating{experimental_aesthetic_embeddings_score}.npy')).to(device) \n", + " aesthetic_query = target_embeds_temp + aesthetic_embedding * experimental_aesthetic_embeddings_weight\n", + " target_embeds_temp = (aesthetic_query) / torch.linalg.norm(aesthetic_query)\n", + " target_embeds_temp = target_embeds_temp.unsqueeze(1).expand([-1,cutn*n,-1]) \n", + " dists = spherical_dist_loss(image_embeds, target_embeds_temp)\n", + " dists = dists.mean(1).mul(weights[i].squeeze()).mean()\n", + " losses+=dists*clip_guidance_scale #* (2 if i in [\"ViT-L-14-336--openai\", \"RN50x64--openai\", \"ViT-B-32--laion2b_e16\"] else (.4 if \"cloob\" in i else 1))\n", + " if i == \"ViT-L-14-336--openai\" and aes_scale !=0:\n", + " aes_loss = (aesthetic_model_336(F.normalize(image_embeds, dim=-1))).mean() \n", + " losses -= aes_loss * aes_scale \n", + " if i == \"ViT-L-14--openai\" and aes_scale !=0:\n", + " aes_loss = (aesthetic_model_224(F.normalize(image_embeds, dim=-1))).mean() \n", + " losses -= aes_loss * aes_scale \n", + " if i == \"ViT-B-16--openai\" and aes_scale !=0:\n", + " aes_loss = (aesthetic_model_16(F.normalize(image_embeds, dim=-1))).mean() \n", + " losses -= aes_loss * aes_scale \n", + " if i == \"ViT-B-32--openai\" and aes_scale !=0:\n", + " aes_loss = (aesthetic_model_32(F.normalize(image_embeds, dim=-1))).mean()\n", + " losses -= aes_loss * aes_scale\n", + " #x_in_grad += torch.autograd.grad(losses, x_in)[0] / cutn_batches / len(clip_list)\n", + " #losses += dists\n", + " #losses = losses / len(clip_list) \n", + " #gc.collect()\n", + " \n", + " loss = losses\n", + " #del losses\n", + " if symmetric_loss_scale != 0: loss += symmetric_loss(x_in) * symmetric_loss_scale\n", + " if init_image is not None and init_scale:\n", + " lpips_loss = (lpips_model(x_in, init) * init_scale).squeeze().mean()\n", + " #print(lpips_loss)\n", + " loss += lpips_loss\n", + " range_scale= range_index[t]\n", + " range_losses = range_loss(x_in,RGB_min,RGB_max).sum() * range_scale\n", + " loss += range_losses\n", + " #loss_grad = torch.autograd.grad(loss, x_in, )[0]\n", + " #x_in_grad += loss_grad\n", + " #grad = -torch.autograd.grad(x_in, x, x_in_grad)[0]\n", + " loss.backward()\n", + " grad = -x.grad\n", + " grad = torch.nan_to_num(grad, nan=0.0, posinf=0, neginf=0)\n", + " if grad_center: grad = centralized_grad(grad, use_gc=True, gc_conv_only=False)\n", + " mag = grad.square().mean().sqrt()\n", + " if mag==0 or torch.isnan(mag):\n", + " print(\"ERROR\")\n", + " print(t)\n", + " return(grad)\n", + " if t>=0:\n", + " if active_function == \"softsign\":\n", + " grad = F.softsign(grad*grad_scale/mag)\n", + " if active_function == \"tanh\":\n", + " grad = (grad/mag*grad_scale).tanh()\n", + " if active_function==\"clamp\":\n", + " grad = grad.clamp(-mag*grad_scale*2,mag*grad_scale*2)\n", + " if grad.abs().max()>0:\n", + " grad=grad/grad.abs().max()*opt.mag_mul\n", + " magnitude = grad.square().mean().sqrt()\n", + " else:\n", + " return(grad)\n", + " clamp_max = clamp_index_variation[t]\n", + " #print(magnitude, end = \"\\r\")\n", + " 
grad = grad* magnitude.clamp(max= clamp_max) /magnitude#0.2\n", + " grad = grad.detach()\n", + " grad = grad_fn(grad,t)\n", + " x = x.detach()\n", + " x = x.requires_grad_()\n", + " var = x.var()\n", + " var_scale = var_index[t]\n", + " var_losses = (var.pow(2).clamp(min = var_range)- 1) * var_scale \n", + " mean_scale = mean_index[t]\n", + " mean_losses = (x.mean().abs() - mean_range).abs().clamp(min = 0)*mean_scale\n", + " tv_losses = tv_loss(x).sum() * tv_scales[0] +\\\n", + " tv_loss(F.interpolate(x, scale_factor= 1/2)).sum()* tv_scales[1] + \\\n", + " tv_loss(F.interpolate(x, scale_factor = 1/4)).sum()* tv_scales[2] + \\\n", + " tv_loss(F.interpolate(x, scale_factor = 1/8)).sum()* tv_scales[3] \n", + " adjust_losses = tv_losses + var_losses + mean_losses\n", + " adjust_losses.backward()\n", + " grad -= x.grad\n", + " #print(grad.abs().mean(), x.grad.abs().mean(), end = \"\\r\")\n", + " return grad\n", + "\n", + "def null_fn(x_in):\n", + " return(torch.zeros_like(x_in))\n", + "\n", + "def display_handler(x,i,cadance = 5, decode = True):\n", + " global progress, image_grid, writer, img_tensor, im\n", + " img_tensor = x\n", + " if i%cadance==0:\n", + " if decode: \n", + " x = model.decode_first_stage(x)\n", + " grid = make_grid(torch.clamp((x+1.0)/2.0, min=0.0, max=1.0),round(x.shape[0]**0.5+0.2))\n", + " grid = 255. * rearrange(grid, 'c h w -> h w c').detach().cpu().numpy()\n", + " image_grid = grid.copy(order = \"C\") \n", + " with io.BytesIO() as output:\n", + " im = Image.fromarray(grid.astype(np.uint8))\n", + " im.save(output, format = \"PNG\")\n", + " progress.value = output.getvalue()\n", + " if generate_video:\n", + " im.save(p.stdin, 'PNG')\n", + "\n", + "def grad_fn(x,t):\n", + " if t <= 500 and grad_blur: x = TF.gaussian_blur(x, 2*round(int(max(grad_blur-t/150, 1)))-1, 1.5)\n", + " return x\n", + "\n", + "def cond_clamp(image,t): \n", + " t = 1000-t[0]\n", + " if t<= max(punish_steps, compress_steps):\n", + " s = torch.quantile(\n", + " rearrange(image, 'b ... 
-> b (...)').abs(),\n", + " threshold_percentile,\n", + " dim = -1\n", + " )\n", + " s = s.view(-1, *((1,) * (image.ndim - 1)))\n", + " ths = s.clamp(min = threshold)\n", + " im_max = image.clamp(min = ths) - image.clamp(min = ths, max = ths)\n", + " im_min = image.clamp(max = -ths, min = -ths) - image.clamp(max = -ths)\n", + " if t<=punish_steps:\n", + " image = image.clamp(min = -ths, max = ths)+(im_max-im_min) * punish_factor #((im_max-im_min)*punish_factor).tanh()/punish_factor \n", + " if t<= compress_steps:\n", + " image = image / (ths/threshold)**compress_factor\n", + " image += noise_like(image.shape,device,False) * ((ths/threshold)**compress_factor - 1)\n", + " return(image)\n", + " \n", + "def make_schedule(t_start, t_end, step_size=1):\n", + " schedule = []\n", + " par_schedule = []\n", + " t = t_start\n", + " while t > t_end:\n", + " schedule.append(t)\n", + " t -= step_size\n", + " schedule.append(t_end)\n", + " return np.array(schedule)\n", + "\n", + "lpips_model = lpips.LPIPS(net='vgg').to(device)\n", + "\n", + "def list_mul_to_array(list_mul):\n", + " i = 0\n", + " mul_count = 0\n", + " mul_string = ''\n", + " full_list = list_mul\n", + " full_list_len = len(full_list)\n", + " for item in full_list:\n", + " if(i == 0):\n", + " last_item = item\n", + " if(item == last_item):\n", + " mul_count+=1\n", + " if(item != last_item or full_list_len == i+1):\n", + " mul_string = mul_string + f' [{last_item}]*{mul_count} +'\n", + " mul_count=1\n", + " last_item = item\n", + " i+=1\n", + " return(mul_string[1:-2])\n", + "\n", + "def generate_settings_file(add_prompts=False, add_dimensions=False):\n", + " \n", + " if(add_prompts):\n", + " prompts = f'''\n", + " clip_prompts = {clip_prompts}\n", + " latent_prompts = {latent_prompts}\n", + " latent_negatives = {latent_negatives}\n", + " image_prompts = {image_prompts}\n", + " '''\n", + " else:\n", + " prompts = ''\n", + "\n", + " if(add_dimensions):\n", + " dimensions = f'''width = {width}\n", + " height = {height}\n", + " '''\n", + " else:\n", + " dimensions = ''\n", + " settings = f'''\n", + " #This settings file can be loaded back to Latent Majesty Diffusion. If you like your setting consider sharing it to the settings library at https://github.com/multimodalart/MajestyDiffusion\n", + " [model]\n", + " latent_diffusion_model = {latent_diffusion_model}\n", + " \n", + " [clip_list]\n", + " perceptors = {clip_load_list}\n", + " \n", + " [basic_settings]\n", + " #Perceptor things\n", + " {prompts}\n", + " {dimensions}\n", + " latent_diffusion_guidance_scale = {latent_diffusion_guidance_scale}\n", + " clip_guidance_scale = {clip_guidance_scale}\n", + " aesthetic_loss_scale = {aesthetic_loss_scale}\n", + " augment_cuts={augment_cuts}\n", + "\n", + " #Init image settings\n", + " starting_timestep = {starting_timestep}\n", + " init_scale = {init_scale} \n", + " init_brightness = {init_brightness}\n", + " \n", + " [advanced_settings]\n", + " #Add CLIP Guidance and all the flavors or just run normal Latent Diffusion\n", + " use_cond_fn = {use_cond_fn}\n", + "\n", + " #Custom schedules for cuts. 
Check out the schedules documentation here\n", + " custom_schedule_setting = {custom_schedule_setting}\n", + "\n", + " #Cut settings\n", + " clamp_index = {clamp_index}\n", + " cut_overview = {list_mul_to_array(cut_overview)}\n", + " cut_innercut = {list_mul_to_array(cut_innercut)}\n", + " cut_blur_n = {list_mul_to_array(cut_blur_n)}\n", + " cut_blur_kernel = {cut_blur_kernel}\n", + " cut_ic_pow = {cut_ic_pow}\n", + " cut_icgray_p = {list_mul_to_array(cut_icgray_p)}\n", + " cutn_batches = {cutn_batches}\n", + " range_index = {list_mul_to_array(range_index)}\n", + " active_function = \"{active_function}\"\n", + " ths_method= \"{ths_method}\"\n", + " tv_scales = {list_mul_to_array(tv_scales)}\n", + "\n", + " #If you uncomment this line you can schedule the CLIP guidance across the steps. Otherwise the clip_guidance_scale will be used\n", + " clip_guidance_schedule = {list_mul_to_array(clip_guidance_index)}\n", + " \n", + " #Apply symmetric loss (force simmetry to your results)\n", + " symmetric_loss_scale = {symmetric_loss_scale} \n", + "\n", + " #Latent Diffusion Advanced Settings\n", + " #Use when latent upscale to correct satuation problem\n", + " scale_div = {scale_div}\n", + " #Magnify grad before clamping by how many times\n", + " opt_mag_mul = {opt_mag_mul}\n", + " opt_ddim_eta = {opt_ddim_eta}\n", + " opt_eta_end = {opt_eta_end}\n", + " opt_temperature = {opt_temperature}\n", + "\n", + " #Grad advanced settings\n", + " grad_center = {grad_center}\n", + " #Lower value result in more coherent and detailed result, higher value makes it focus on more dominent concept\n", + " grad_scale={grad_scale} \n", + " score_modifier = {score_modifier}\n", + " threshold_percentile = {threshold_percentile}\n", + " threshold = {threshold}\n", + " var_index = {list_mul_to_array(var_index)}\n", + " var_range = {var_range}\n", + " mean_index = {list_mul_to_array(mean_index)}\n", + " mean_range = {mean_range}\n", + "\n", + " #Init image advanced settings\n", + " init_rotate={init_rotate}\n", + " mask_rotate={mask_rotate}\n", + " init_magnitude = {init_magnitude}\n", + "\n", + " #More settings\n", + " RGB_min = {RGB_min}\n", + " RGB_max = {RGB_max}\n", + " #How to pad the image with cut_overview\n", + " padargs = {padargs} \n", + " flip_aug={flip_aug}\n", + " \n", + " #Experimental aesthetic embeddings, work only with OpenAI ViT-B/32 and ViT-L/14\n", + " experimental_aesthetic_embeddings = {experimental_aesthetic_embeddings}\n", + " #How much you want this to influence your result\n", + " experimental_aesthetic_embeddings_weight = {experimental_aesthetic_embeddings_weight}\n", + " #9 are good aesthetic embeddings, 0 are bad ones\n", + " experimental_aesthetic_embeddings_score = {experimental_aesthetic_embeddings_score}\n", + "\n", + " # For fun dont change except if you really know what your are doing\n", + " grad_blur = {grad_blur}\n", + " compress_steps = {compress_steps}\n", + " compress_factor = {compress_factor}\n", + " punish_steps = {punish_steps}\n", + " punish_factor = {punish_factor}\n", + " '''\n", + " return(settings)\n", + "\n", + "#Alstro's aesthetic model\n", + "aesthetic_model_336 = torch.nn.Linear(768,1).cuda()\n", + "aesthetic_model_336.load_state_dict(torch.load(f\"{model_path}/ava_vit_l_14_336_linear.pth\"))\n", + "\n", + "aesthetic_model_224 = torch.nn.Linear(768,1).cuda()\n", + "aesthetic_model_224.load_state_dict(torch.load(f\"{model_path}/ava_vit_l_14_linear.pth\"))\n", + "\n", + "aesthetic_model_16 = torch.nn.Linear(512,1).cuda()\n", + 
"aesthetic_model_16.load_state_dict(torch.load(f\"{model_path}/ava_vit_b_16_linear.pth\"))\n", + "\n", + "aesthetic_model_32 = torch.nn.Linear(512,1).cuda()\n", + "aesthetic_model_32.load_state_dict(torch.load(f\"{model_path}/sa_0_4_vit_b_32_linear.pth\"))\n", + "\n", + "has_purged = False\n", + "def do_run():\n", + " global has_purged\n", + " if(has_purged):\n", + " global clip_model, clip_size, clip_tokenize, clip_normalize, clip_list\n", + " clip_model, clip_size, clip_tokenize, clip_normalize, clip_list = full_clip_load(clip_load_list)\n", + " has_purged = False\n", + " # with torch.cuda.amp.autocast():\n", + " global progress,target_embeds, weights, zero_embed, init, scale_factor, cur_step, uc, c\n", + " cur_step = 0\n", + " scale_factor = 1\n", + " make_cutouts = {}\n", + " for i in clip_list:\n", + " make_cutouts[i] = MakeCutouts(clip_size[i][0] if type(clip_size[i]) is tuple else clip_size[i],Overview=1)\n", + " target_embeds, weights ,zero_embed = {}, {}, {}\n", + " for i in clip_list:\n", + " target_embeds[i] = []\n", + " weights[i]=[]\n", + "\n", + " for prompt in prompts:\n", + " txt, weight = parse_prompt(prompt)\n", + " for i in clip_list:\n", + " if \"cloob\" not in i:\n", + " with torch.cuda.amp.autocast():\n", + " embeds = clip_model[i].encode_text(clip_tokenize[i](txt).to(device))\n", + " target_embeds[i].append(embeds)\n", + " weights[i].append(weight)\n", + " else:\n", + " embeds = clip_model[i].encode_text(clip_tokenize[i](txt).to(device))\n", + " target_embeds[i].append(embeds)\n", + " weights[i].append(weight)\n", + "\n", + " for prompt in image_prompts:\n", + " if prompt.startswith(\"data:\"):\n", + " img = base64_to_image(prompt).convert('RGB')\n", + " weight = 1\n", + " else:\n", + " print(f\"processing{prompt}\",end=\"\\r\")\n", + " path, weight = parse_prompt(prompt)\n", + " img = Image.open(fetch(path)).convert('RGB')\n", + " img = TF.resize(img, min(opt.W, opt.H, *img.size), transforms.InterpolationMode.LANCZOS)\n", + " for i in clip_list:\n", + " if \"cloob\" not in i:\n", + " with torch.cuda.amp.autocast():\n", + " batch = make_cutouts[i](TF.to_tensor(img).unsqueeze(0).to(device))\n", + " embed = clip_model[i].encode_image(clip_normalize[i](batch))\n", + " target_embeds[i].append(embed)\n", + " weights[i].extend([weight])\n", + " else:\n", + " batch = make_cutouts[i](TF.to_tensor(img).unsqueeze(0).to(device))\n", + " embed = clip_model[i].encode_image(clip_normalize[i](batch))\n", + " target_embeds[i].append(embed)\n", + " weights[i].extend([weight])\n", + " #if anti_jpg != 0:\n", + " # target_embeds[\"ViT-B-32--openai\"].append(torch.tensor([np.load(f\"{model_path}/openimages_512x_png_embed224.npz\")['arr_0']-np.load(f\"{model_path}/imagenet_512x_jpg_embed224.npz\")['arr_0']], device = device))\n", + " # weights[\"ViT-B-32--openai\"].append(anti_jpg)\n", + "\n", + " for i in clip_list:\n", + " target_embeds[i] = torch.cat(target_embeds[i])\n", + " weights[i] = torch.tensor([weights[i]], device=device)\n", + " shape = [4, opt.H//8, opt.W//8]\n", + " init = None\n", + " mask = None\n", + " transform = T.GaussianBlur(kernel_size=3, sigma=0.4)\n", + " if init_image is not None:\n", + " if init_image.startswith(\"data:\"):\n", + " img = base64_to_image(init_image).convert('RGB')\n", + " else:\n", + " img = Image.open(fetch(init_image)).convert('RGB')\n", + " init = TF.to_tensor(img).to(device).unsqueeze(0)\n", + " if init_rotate: init = torch.rot90(init, 1, [3,2]) \n", + " x0_original = torch.tensor(init)\n", + " init = resize(init,out_shape = 
[opt.n_samples,3,opt.H, opt.W])\n", + " init = init.mul(2).sub(1).half()\n", + " init_encoded = model.first_stage_model.encode(init).sample()* init_magnitude + init_brightness\n", + " #init_encoded = init_encoded + noise_like(init_encoded.shape,device,False).mul(init_noise)\n", + " upscaled_flag=True\n", + " else:\n", + " init = None\n", + " init_encoded = None\n", + " upscale_flag = False\n", + " if init_mask is not None:\n", + " mask = Image.open(fetch(init_mask)).convert('RGB')\n", + " mask = TF.to_tensor(mask).to(device).unsqueeze(0)\n", + " if mask_rotate: mask = torch.rot90(mask, 1, [3,2])\n", + " mask = F.interpolate(mask,[opt.H//8,opt.W//8]).mean(1)\n", + " mask = transform(mask)\n", + " print(mask)\n", + "\n", + "\n", + " #progress = widgets.Image(layout = widgets.Layout(max_width = \"400px\",max_height = \"512px\"))\n", + " #display.display(progress)\n", + "\n", + " if opt.plms:\n", + " sampler = PLMSSampler(model)\n", + " else:\n", + " sampler = DDIMSampler(model)\n", + "\n", + " os.makedirs(opt.outdir, exist_ok=True)\n", + " outpath = opt.outdir\n", + "\n", + " prompt = opt.prompt\n", + " sample_path = os.path.join(outpath, \"samples\")\n", + " os.makedirs(sample_path, exist_ok=True)\n", + " base_count = len(os.listdir(sample_path))\n", + "\n", + " all_samples=list()\n", + " last_step_upscale = False\n", + " eta1 = opt.ddim_eta\n", + " eta2 = opt.eta_end\n", + " with torch.enable_grad():\n", + " with torch.cuda.amp.autocast():\n", + " with model.ema_scope():\n", + " uc = None\n", + " if opt.scale != 1.0:\n", + " uc = model.get_learned_conditioning(opt.n_samples * opt.uc).cuda()\n", + " \n", + " for n in range(opt.n_iter):\n", + " torch.cuda.empty_cache()\n", + " gc.collect()\n", + " c = model.get_learned_conditioning(opt.n_samples * prompt).cuda()\n", + " if init_encoded is None:\n", + " x_T = torch.randn([opt.n_samples,*shape], device=device)\n", + " upscaled_flag = False\n", + " x0 = None\n", + " else:\n", + " x_T = init_encoded\n", + " x0 = torch.tensor(x_T)\n", + " upscaled_flag = True\n", + " last_step_uspcale_list = []\n", + " diffusion_stages = 0\n", + " for custom_schedule in custom_schedules:\n", + " if type(custom_schedule) != type(\"\"):\n", + " diffusion_stages += 1\n", + " torch.cuda.empty_cache()\n", + " gc.collect()\n", + " last_step_upscale = False\n", + " samples_ddim, _ = sampler.sample(S=opt.ddim_steps,\n", + " conditioning=c,\n", + " batch_size=opt.n_samples,\n", + " shape=shape,\n", + " custom_schedule = custom_schedule,\n", + " verbose=False,\n", + " unconditional_guidance_scale=opt.scale,\n", + " unconditional_conditioning=uc,\n", + " eta=eta1 if diffusion_stages == 1 or last_step_upscale else eta2,\n", + " eta_end=eta2,\n", + " img_callback=None if use_cond_fn else display_handler,\n", + " cond_fn=cond_fn if use_cond_fn else None,\n", + " temperature = opt.temperature,\n", + " x_adjust_fn=cond_clamp,\n", + " x_T = x_T,\n", + " x0=x0,\n", + " mask=mask,\n", + " score_corrector = score_corrector,\n", + " corrector_kwargs = score_corrector_setting,\n", + " x0_adjust_fn = dynamic_thresholding,\n", + " clip_embed = target_embeds[\"ViT-L-14--openai\"].mean(0, keepdim = True) if \"ViT-L-14--openai\" in clip_list else None\n", + " )\n", + " #x_T = samples_ddim.clamp(-6,6)\n", + " x_T = samples_ddim\n", + " last_step_upscale = False\n", + " else:\n", + " torch.cuda.empty_cache()\n", + " gc.collect()\n", + " method, scale_factor = custom_schedule.split(\":\")\n", + " if method == \"RGB\":\n", + " scale_factor = float(scale_factor)\n", + " temp_file_name = 
\"temp_\"+f\"{str(round(time.time()))}.png\"\n", + " temp_file = os.path.join(sample_path, temp_file_name)\n", + " im.save(temp_file, format = \"PNG\")\n", + " init = Image.open(fetch(temp_file)).convert('RGB')\n", + " init = TF.to_tensor(init).to(device).unsqueeze(0)\n", + " opt.H, opt.W = opt.H*scale_factor, opt.W*scale_factor\n", + " init = resize(init,out_shape = [opt.n_samples,3,opt.H, opt.W], antialiasing=True)\n", + " init = init.mul(2).sub(1).half()\n", + " x_T = (model.first_stage_model.encode(init).sample()*init_magnitude)\n", + " upscaled_flag = True\n", + " last_step_upscale = True\n", + " #x_T += noise_like(x_T.shape,device,False)*init_noise\n", + " #x_T = x_T.clamp(-6,6)\n", + " if method == \"gfpgan\":\n", + " scale_factor = float(scale_factor)\n", + " last_step_upscale = True\n", + " temp_file_name = \"temp_\"+f\"{str(round(time.time()))}.png\"\n", + " temp_file = os.path.join(sample_path, temp_file_name)\n", + " im.save(temp_file, format = \"PNG\")\n", + " GFP_factor = 2 if scale_factor > 1 else 1\n", + " GFP_ver = 1.3 #if GFP_factor == 1 else 1.2\n", + " %cd GFPGAN\n", + " torch.cuda.empty_cache()\n", + " gc.collect()\n", + " !python inference_gfpgan.py -i $temp_file -o results -v $GFP_ver -s $GFP_factor\n", + " %cd ..\n", + " face_corrected = Image.open(fetch(f\"GFPGAN/results/restored_imgs/{temp_file_name}\"))\n", + " with io.BytesIO() as output:\n", + " face_corrected.save(output,format=\"PNG\")\n", + " progress.value = output.getvalue()\n", + " init = Image.open(fetch(f\"GFPGAN/results/restored_imgs/{temp_file_name}\")).convert('RGB')\n", + " init = TF.to_tensor(init).to(device).unsqueeze(0)\n", + " opt.H, opt.W = opt.H*scale_factor, opt.W*scale_factor\n", + " init = resize(init,out_shape = [opt.n_samples,3,opt.H, opt.W], antialiasing=True)\n", + " init = init.mul(2).sub(1).half()\n", + " x_T = (model.first_stage_model.encode(init).sample()*init_magnitude)\n", + " upscaled_flag = True\n", + " #x_T += noise_like(x_T.shape,device,False)*init_noise\n", + " #x_T = x_T.clamp(-6,6)\n", + " if method ==\"scale\":\n", + " scale_factor = float(scale_factor)\n", + " x_T = x_T*scale_factor\n", + " if method ==\"noise\":\n", + " scale_factor = float(scale_factor)\n", + " x_T += noise_like(x_T.shape,device,False)*scale_factor\n", + " if method == \"purge\":\n", + " has_purged = True\n", + " for i in scale_factor.split(\",\"):\n", + " if i in clip_load_list:\n", + " arch, pub, m_id = i[1:-1].split(' - ')\n", + " print(\"Purge \",i)\n", + " del clip_list[clip_list.index(m_id)]\n", + " del clip_model[m_id]\n", + " del clip_size[m_id]\n", + " del clip_tokenize[m_id]\n", + " del clip_normalize[m_id]\n", + " #last_step_uspcale_list.append(last_step_upscale)\n", + " scale_factor = 1\n", + " current_time = str(round(time.time()))\n", + " if(last_step_upscale and method == 'gfpgan'):\n", + " latest_upscale = Image.open(fetch(f\"GFPGAN/results/restored_imgs/{temp_file_name}\")).convert('RGB')\n", + " latest_upscale.save(os.path.join(outpath, f'{current_time}.png'), format = \"PNG\")\n", + " else:\n", + " Image.fromarray(image_grid.astype(np.uint8)).save(os.path.join(outpath, f'{current_time}.png'), format = \"PNG\")\n", + " settings = generate_settings_file(add_prompts=True, add_dimensions=False)\n", + " text_file = open(f\"{outpath}/{current_time}.cfg\", \"w\")\n", + " text_file.write(settings)\n", + " text_file.close()\n", + " x_samples_ddim = model.decode_first_stage(samples_ddim)\n", + " x_samples_ddim = torch.clamp((x_samples_ddim+1.0)/2.0, min=0.0, max=1.0)\n", + " 
all_samples.append(x_samples_ddim)\n", + "\n", + "\n", + " if(len(all_samples) > 1):\n", + " # additionally, save as grid\n", + " grid = torch.stack(all_samples, 0)\n", + " grid = rearrange(grid, 'n b c h w -> (n b) c h w')\n", + " grid = make_grid(grid, nrow=opt.n_samples)\n", + "\n", + " # to image\n", + " grid = 255. * rearrange(grid, 'c h w -> h w c').cpu().numpy()\n", + " Image.fromarray(grid.astype(np.uint8)).save(os.path.join(outpath, f'grid_{str(round(time.time()))}.png'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ILHGCEla2Rrm" + }, + "source": [ + "# Run!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VpR9JhyCu5iq" + }, + "source": [ + "#### Perceptors (Choose your CLIP and CLIP-like models) \n", + "Be careful if you don't pay for Colab Pro selecting more CLIPs might make you go out of memory. If you do have Pro, try adding ViT-L14 to your mix" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "8K7l_E2JvLWC" + }, + "outputs": [], + "source": [ + "#@title Choose your perceptor models\n", + "\n", + "# suppress mmc warmup outputs\n", + "import mmc.loaders\n", + "clip_load_list = []\n", + "#@markdown #### Open AI CLIP models\n", + "ViT_B32 = False #@param {type:\"boolean\"}\n", + "ViT_B16 = True #@param {type:\"boolean\"}\n", + "ViT_L14 = True #@param {type:\"boolean\"}\n", + "ViT_L14_336px = False #@param {type:\"boolean\"}\n", + "#RN101 = False #@param {type:\"boolean\"}\n", + "#RN50 = False #@param {type:\"boolean\"}\n", + "RN50x4 = False #@param {type:\"boolean\"}\n", + "RN50x16 = False #@param {type:\"boolean\"}\n", + "RN50x64 = False #@param {type:\"boolean\"}\n", + "\n", + "#@markdown #### OpenCLIP models\n", + "ViT_B16_plus = False #@param {type: \"boolean\"}\n", + "ViT_B32_laion2b = True #@param {type: \"boolean\"}\n", + "ViT_L14_laion = False #@param {type:\"boolean\"}\n", + "\n", + "#@markdown #### Multilangual CLIP models \n", + "clip_farsi = False #@param {type: \"boolean\"}\n", + "clip_korean = False #@param {type: \"boolean\"}\n", + "\n", + "#@markdown #### CLOOB models\n", + "cloob_ViT_B16 = False #@param {type: \"boolean\"}\n", + "\n", + "# @markdown Load even more CLIP and CLIP-like models (from [Multi-Modal-Comparators](https://github.com/dmarx/Multi-Modal-Comparators))\n", + "model1 = \"\" # @param [\"[clip - mlfoundations - RN50--openai]\",\"[clip - mlfoundations - RN101--openai]\",\"[clip - mlfoundations - RN50--yfcc15m]\",\"[clip - mlfoundations - RN50--cc12m]\",\"[clip - mlfoundations - RN50-quickgelu--yfcc15m]\",\"[clip - mlfoundations - RN50-quickgelu--cc12m]\",\"[clip - mlfoundations - RN101--yfcc15m]\",\"[clip - mlfoundations - RN101-quickgelu--yfcc15m]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e32]\",\"[clip - sbert - ViT-B-32-multilingual-v1]\",\"[clip - facebookresearch - clip_small_25ep]\",\"[simclr - facebookresearch - simclr_small_25ep]\",\"[slip - facebookresearch - slip_small_25ep]\",\"[slip - facebookresearch - slip_small_50ep]\",\"[slip - facebookresearch - slip_small_100ep]\",\"[clip - facebookresearch - clip_base_25ep]\",\"[simclr - 
facebookresearch - simclr_base_25ep]\",\"[slip - facebookresearch - slip_base_25ep]\",\"[slip - facebookresearch - slip_base_50ep]\",\"[slip - facebookresearch - slip_base_100ep]\",\"[clip - facebookresearch - clip_large_25ep]\",\"[simclr - facebookresearch - simclr_large_25ep]\",\"[slip - facebookresearch - slip_large_25ep]\",\"[slip - facebookresearch - slip_large_50ep]\",\"[slip - facebookresearch - slip_large_100ep]\",\"[clip - facebookresearch - clip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc12m_35ep]\",\"[clip - facebookresearch - clip_base_cc12m_35ep]\"] {allow-input: true}\n", + "model2 = \"\" # @param [\"[clip - mlfoundations - RN50--openai]\",\"[clip - mlfoundations - RN101--openai]\",\"[clip - mlfoundations - RN50--yfcc15m]\",\"[clip - mlfoundations - RN50--cc12m]\",\"[clip - mlfoundations - RN50-quickgelu--yfcc15m]\",\"[clip - mlfoundations - RN50-quickgelu--cc12m]\",\"[clip - mlfoundations - RN101--yfcc15m]\",\"[clip - mlfoundations - RN101-quickgelu--yfcc15m]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e32]\",\"[clip - sbert - ViT-B-32-multilingual-v1]\",\"[clip - facebookresearch - clip_small_25ep]\",\"[simclr - facebookresearch - simclr_small_25ep]\",\"[slip - facebookresearch - slip_small_25ep]\",\"[slip - facebookresearch - slip_small_50ep]\",\"[slip - facebookresearch - slip_small_100ep]\",\"[clip - facebookresearch - clip_base_25ep]\",\"[simclr - facebookresearch - simclr_base_25ep]\",\"[slip - facebookresearch - slip_base_25ep]\",\"[slip - facebookresearch - slip_base_50ep]\",\"[slip - facebookresearch - slip_base_100ep]\",\"[clip - facebookresearch - clip_large_25ep]\",\"[simclr - facebookresearch - simclr_large_25ep]\",\"[slip - facebookresearch - slip_large_25ep]\",\"[slip - facebookresearch - slip_large_50ep]\",\"[slip - facebookresearch - slip_large_100ep]\",\"[clip - facebookresearch - clip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc12m_35ep]\",\"[clip - facebookresearch - clip_base_cc12m_35ep]\"] {allow-input: true}\n", + "model3 = \"\" # @param [\"[clip - openai - RN50]\",\"[clip - openai - RN101]\",\"[clip - mlfoundations - RN50--yfcc15m]\",\"[clip - mlfoundations - RN50--cc12m]\",\"[clip - mlfoundations - RN50-quickgelu--yfcc15m]\",\"[clip - mlfoundations - RN50-quickgelu--cc12m]\",\"[clip - mlfoundations - RN101--yfcc15m]\",\"[clip - mlfoundations - RN101-quickgelu--yfcc15m]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e32]\",\"[clip - sbert - ViT-B-32-multilingual-v1]\",\"[clip - facebookresearch - clip_small_25ep]\",\"[simclr - facebookresearch - simclr_small_25ep]\",\"[slip - facebookresearch - slip_small_25ep]\",\"[slip - 
facebookresearch - slip_small_50ep]\",\"[slip - facebookresearch - slip_small_100ep]\",\"[clip - facebookresearch - clip_base_25ep]\",\"[simclr - facebookresearch - simclr_base_25ep]\",\"[slip - facebookresearch - slip_base_25ep]\",\"[slip - facebookresearch - slip_base_50ep]\",\"[slip - facebookresearch - slip_base_100ep]\",\"[clip - facebookresearch - clip_large_25ep]\",\"[simclr - facebookresearch - simclr_large_25ep]\",\"[slip - facebookresearch - slip_large_25ep]\",\"[slip - facebookresearch - slip_large_50ep]\",\"[slip - facebookresearch - slip_large_100ep]\",\"[clip - facebookresearch - clip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc12m_35ep]\",\"[clip - facebookresearch - clip_base_cc12m_35ep]\"] {allow-input: true}\n", + "\n", + "if ViT_B32: \n", + " clip_load_list.append(\"[clip - mlfoundations - ViT-B-32--openai]\")\n", + "if ViT_B16: \n", + " clip_load_list.append(\"[clip - mlfoundations - ViT-B-16--openai]\")\n", + "if ViT_L14: \n", + " clip_load_list.append(\"[clip - mlfoundations - ViT-L-14--openai]\")\n", + "if RN50x4: \n", + " clip_load_list.append(\"[clip - mlfoundations - RN50x4--openai]\")\n", + "if RN50x64: \n", + " clip_load_list.append(\"[clip - mlfoundations - RN50x64--openai]\")\n", + "if RN50x16: \n", + " clip_load_list.append(\"[clip - mlfoundations - RN50x16--openai]\")\n", + "if ViT_L14_laion: \n", + " clip_load_list.append(\"[clip - mlfoundations - ViT-L-14--laion400m_e32]\")\n", + "if ViT_L14_336px:\n", + " clip_load_list.append(\"[clip - mlfoundations - ViT-L-14-336--openai]\")\n", + "if ViT_B16_plus:\n", + " clip_load_list.append(\"[clip - mlfoundations - ViT-B-16-plus-240--laion400m_e32]\")\n", + "if ViT_B32_laion2b:\n", + " clip_load_list.append(\"[clip - mlfoundations - ViT-B-32--laion2b_e16]\")\n", + "if clip_farsi:\n", + " clip_load_list.append(\"[clip - sajjjadayobi - clipfa]\")\n", + "if clip_korean:\n", + " clip_load_list.append(\"[clip - navervision - kelip_ViT-B/32]\")\n", + "if cloob_ViT_B16:\n", + " clip_load_list.append(\"[cloob - crowsonkb - cloob_laion_400m_vit_b_16_32_epochs]\")\n", + "\n", + "if model1:\n", + " clip_load_list.append(model1)\n", + "if model2:\n", + " clip_load_list.append(model2)\n", + "if model3:\n", + " clip_load_list.append(model3)\n", + "\n", + "\n", + "i = 0\n", + "from mmc.multimmc import MultiMMC\n", + "from mmc.modalities import TEXT, IMAGE\n", + "temp_perceptor = MultiMMC(TEXT, IMAGE)\n", + "\n", + "def get_mmc_models(clip_load_list):\n", + " mmc_models = []\n", + " for model_key in clip_load_list:\n", + " if not model_key:\n", + " continue\n", + " arch, pub, m_id = model_key[1:-1].split(' - ')\n", + " mmc_models.append({\n", + " 'architecture':arch,\n", + " 'publisher':pub,\n", + " 'id':m_id,\n", + " })\n", + " return mmc_models\n", + "mmc_models = get_mmc_models(clip_load_list)\n", + "\n", + "import mmc\n", + "from mmc.registry import REGISTRY\n", + "import mmc.loaders # force trigger model registrations\n", + "from mmc.mock.openai import MockOpenaiClip\n", + "\n", + "normalize = transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],\n", + " std=[0.26862954, 0.26130258, 0.27577711])\n", + "\n", + "\n", + "def load_clip_models(mmc_models):\n", + " clip_model, clip_size, clip_tokenize, clip_normalize= {},{},{},{}\n", + " clip_list = []\n", + " for item in mmc_models:\n", + " print(\"Loaded \", item[\"id\"])\n", + " clip_list.append(item[\"id\"])\n", + " model_loaders = REGISTRY.find(**item)\n", + " for model_loader in model_loaders:\n", + " 
clip_model_loaded = model_loader.load()\n",
+ " clip_model[item[\"id\"]] = MockOpenaiClip(clip_model_loaded)\n",
+ " clip_size[item[\"id\"]] = clip_model[item[\"id\"]].visual.input_resolution\n",
+ " clip_tokenize[item[\"id\"]] = clip_model[item[\"id\"]].preprocess_text()\n",
+ " clip_normalize[item[\"id\"]] = normalize\n",
+ " return clip_model, clip_size, clip_tokenize, clip_normalize, clip_list\n",
+ "\n",
+ "\n",
+ "def full_clip_load(clip_load_list):\n",
+ " torch.cuda.empty_cache()\n",
+ " gc.collect()\n",
+ " try:\n",
+ " del clip_model, clip_size, clip_tokenize, clip_normalize, clip_list\n",
+ " except:\n",
+ " pass\n",
+ " mmc_models = get_mmc_models(clip_load_list)\n",
+ " clip_model, clip_size, clip_tokenize, clip_normalize, clip_list = load_clip_models(mmc_models)\n",
+ " return clip_model, clip_size, clip_tokenize, clip_normalize, clip_list\n",
+ "\n",
+ "clip_model, clip_size, clip_tokenize, clip_normalize, clip_list = full_clip_load(clip_load_list)\n",
+ "clip_load_list_universal = clip_load_list\n",
+ "torch.cuda.empty_cache()\n",
+ "gc.collect()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "N_Di3xFSXGWe"
+ },
+ "source": [
+ "#### Advanced settings for the generation\n",
+ "##### Access [our guide](https://multimodal.art/majesty-diffusion) "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "pAALegoCXEbm"
+ },
+ "outputs": [],
+ "source": [
+ "opt = DotMap()\n",
+ "\n",
+ "#Change it to False to not use CLIP Guidance at all\n",
+ "use_cond_fn = True\n",
+ "\n",
+ "#Custom cut schedules and super-resolution. Check out the guide on how to use it at https://multimodal.art/majestydiffusion\n",
+ "custom_schedule_setting = [\n",
+ " [50,1000,8],\n",
+ " \"gfpgan:1.5\",\"scale:.9\",\"noise:.55\",\n",
+ " [50,200,5],\n",
+ "]\n",
+ " \n",
+ "#Cut settings\n",
+ "#clamp_index = [2.1,1.6] #linear variation of the index for clamping the gradient \n",
+ "cut_overview = [8]*500 + [4]*500\n",
+ "cut_innercut = [0]*500 + [4]*500\n",
+ "cut_ic_pow = .2\n",
+ "cut_icgray_p = [.1]*300+[0]*1000\n",
+ "cutn_batches = 1\n",
+ "cut_blur_n = [0]*300 + [0]*1000\n",
+ "cut_blur_kernel = 3\n",
+ "range_index = [0]*200+ [5e4]*400 + [0]*1000\n",
+ "var_index = [2]*300+[0]*700\n",
+ "var_range = 0.5\n",
+ "mean_index = [0]*400+[0]*600\n",
+ "mean_range = 0.75\n",
+ "active_function = \"softsign\" # function to manipulate the gradient - helps things stabilize\n",
+ "ths_method = \"clamp\" #softsign is the other option\n",
+ "tv_scales = [150]*1+[0]*1 +[0]*2\n",
+ "\n",
+ "#If you uncomment the next line you can schedule the CLIP guidance across the steps. Otherwise the clip_guidance_scale basic setting will be used\n",
+ "#clip_guidance_schedule = [10000]*300 + [500]*700\n",
+ "\n",
+ "symmetric_loss_scale = 0 #Apply symmetric loss\n",
+ "\n",
+ "#Latent Diffusion Advanced Settings\n",
+ "scale_div = 1 # Use when upscaling latents to correct saturation problems\n",
+ "opt_mag_mul = 20 #Magnify grad before clamping\n",
+ "#PLMS is currently not working, a fix is in progress\n",
+ "opt_plms = False #Experimental. It works but does not look good\n",
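+ "#Added note (comment only, not from the original notebook): the pair on the next line is\n",
+ "#described as a linear variation of eta; assuming it is expanded per sampling step the same\n",
+ "#way clamp_index is in the Diffuse cell, it would look roughly like\n",
+ "#np.linspace(opt_ddim_eta, opt_eta_end, number_of_steps).\n",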
+ " opt_ddim_eta, opt_eta_end = [1.3,1.1] # linear variation of eta\n",
+ " opt_temperature = .98\n",
+ "\n",
+ " #Grad advanced settings\n",
+ " grad_center = False\n",
+ " grad_scale = 0.25 #Lower values result in a more coherent and detailed result, higher values make it focus on the more dominant concept\n",
+ "\n",
+ " #Restrains the model from exploding despite a larger clamp\n",
+ " score_modifier = True\n",
+ " threshold_percentile = .85\n",
+ " threshold = 1\n",
+ " score_corrector_setting = [\"latent\",\"\"]\n",
+ "\n",
+ " #Init image advanced settings\n",
+ " init_rotate, mask_rotate=[False, False]\n",
+ " init_magnitude = 0.18215\n",
+ "\n",
+ " #Noise settings\n",
+ " upscale_noise_temperature = 1\n",
+ " upscale_xT_temperature = 1 \n",
+ "\n",
+ " #More settings\n",
+ " RGB_min, RGB_max = [-0.95,0.95]\n",
+ " padargs = {\"mode\":\"constant\", \"value\": -1} #How to pad the image with cut_overview\n",
+ " flip_aug=False\n",
+ " cutout_debug = False\n",
+ " opt.outdir = outputs_path\n",
+ "\n",
+ " #Experimental aesthetic embeddings, works only with OpenAI ViT-B/32 and ViT-L/14\n",
+ " experimental_aesthetic_embeddings = True\n",
+ " #How much you want this to influence your result\n",
+ " experimental_aesthetic_embeddings_weight = 0.3\n",
+ " #A score of 9 corresponds to good aesthetic embeddings, 0 to bad ones\n",
+ " experimental_aesthetic_embeddings_score = 8\n",
+ "\n",
+ " # For fun; don't change unless you really know what you are doing\n",
+ " grad_blur = False\n",
+ " compress_steps = 200\n",
+ " compress_factor = 0.1\n",
+ " punish_steps = 200\n",
+ " punish_factor = 0.5"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ZUu_pyTkuxiT"
+ },
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "wo1tM270ryit"
+ },
+ "source": [
+ "### Prompts\n",
+ "The main prompt is the CLIP prompt. The Latent Prompts usually help with style and composition."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "rRIC0eQervDN"
+ },
+ "outputs": [],
+ "source": [
+ "#Amp up your prompt game with prompt engineering, check out this guide: https://matthewmcateer.me/blog/clip-prompt-engineering/\n",
+ "#Prompt for CLIP Guidance\n",
+ "clip_prompts = [\"The portrait of a Majestic Princess, trending on artstation\"] \n",
+ "\n",
+ "#Prompt for Latent Diffusion\n",
+ "latent_prompts = [\"The portrait of a Majestic Princess, trending on artstation\"] \n",
+ "\n",
+ "#Negative prompts for Latent Diffusion\n",
+ "latent_negatives = [\"\"]\n",
+ "\n",
+ "image_prompts = []"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "iv8-gEvUsADL"
+ },
+ "source": [
+ "### Diffuse!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "cellView": "form",
+ "id": "fmafGmcyT1mZ"
+ },
+ "outputs": [],
+ "source": [
+ "import warnings\n",
+ "warnings.filterwarnings('ignore')\n",
+ "#@markdown ### Basic settings \n",
+ "#@markdown We're still figuring out default settings. Experiment and share your settings with us\n",
+ "width = 256#@param{type: 'integer'}\n",
+ "height = 256#@param{type: 'integer'}\n",
+ "#@markdown The `latent_diffusion_guidance_scale` will determine how much the `latent_prompts` affect the image. Lower values help with text interpretation, higher values help with composition. Try values between 0 and 15. If you see too much text, lower it\n",
+ "latent_diffusion_guidance_scale = 12 #@param {type:\"number\"}\n",
+ "#@markdown The `clamp_index` will determine how much the `clip_prompts` affect the image; it is a linear schedule that decreases from the first to the second value. Try values between 3 and 1\n",
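+ "#Added illustration (comment only, not part of the original settings): further down in this\n",
+ "#cell a two-value clamp_index is expanded into a per-timestep schedule with\n",
+ "#np.linspace(clamp_index[0], clamp_index[1], 1000), so the default [2.4, 2.1] becomes 1000\n",
+ "#values decreasing linearly from 2.4 to 2.1 across the diffusion timesteps.\n",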
+ "clamp_index = [2.4, 2.1] #@param{type: 'raw'}\n",
+ "clip_guidance_scale = 16000#@param{type: 'integer'}\n",
+ "how_many_batches = 1 #@param{type: 'integer'}\n",
+ "aesthetic_loss_scale = 400 #@param{type: 'integer'}\n",
+ "augment_cuts=True #@param{type:'boolean'}\n",
+ "\n",
+ "#@markdown\n",
+ "\n",
+ "#@markdown ### Init image settings\n",
+ "#@markdown `init_image` takes the path of an image to use as the init image for the model\n",
+ "init_image = None #@param{type: 'string'}\n",
+ "if(init_image == '' or init_image == 'None'):\n",
+ " init_image = None\n",
+ "#@markdown `starting_timestep`: How much noise do you want to add to your init image for it to then be diffused by the model\n",
+ "starting_timestep = 0.9 #@param{type: 'number'}\n",
+ "#@markdown `init_mask` is a mask with the same width and height as the original image, with the color black indicating where to inpaint\n",
+ "init_mask = None #@param{type: 'string'}\n",
+ "#@markdown `init_scale` controls how much the init image should influence the final result. Experiment with values around `1000`\n",
+ "init_scale = 1000 #@param{type: 'integer'}\n",
+ "init_brightness = 0.0 #@param{type: 'number'}\n",
+ "# @markdown How much extra noise to add to the init image, independently from skipping timesteps (use it also if you are upscaling)\n",
+ "#init_noise = 0.57 #@param{type: 'number'}\n",
+ "\n",
+ "#@markdown\n",
+ "\n",
+ "#@markdown ### Custom saved settings\n",
+ "#@markdown If you choose custom saved settings, the settings set by the preset override some of your choices. You can still modify the settings not in the preset. Check what each preset modifies here\n",
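+ "#Added, hedged example (not from the original notebook) of what a settings .cfg can look like.\n",
+ "#The section names match what the loader below reads; the key name inside [clip_list] and the\n",
+ "#example values are purely illustrative:\n",
+ "#  [clip_list]\n",
+ "#  perceptors = [\"[clip - mlfoundations - ViT-B-32--openai]\", \"[clip - mlfoundations - ViT-L-14--openai]\"]\n",
+ "#  [basic_settings]\n",
+ "#  clip_guidance_scale = 16000\n",
+ "#  clamp_index = [2.4, 2.1]\n",
+ "#  [advanced_settings]\n",
+ "#  cut_overview = [8]*500 + [4]*500\n",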
+ "custom_settings = 'path/to/settings.cfg' #@param{type:'string'}\n",
+ "settings_library = 'None (use settings defined above)' #@param [\"None (use settings defined above)\", \"default\", \"defaults_v1_3\", \"dango233_princesses\", \"the_other_zippy_defaults\", \"makeitrad_defaults\"]\n",
+ "if(settings_library != 'None (use settings defined above)'):\n",
+ " custom_settings = f'latent-majesty-diffusion-settings/{settings_library}.cfg'\n",
+ "\n",
+ "global_var_scope = globals()\n",
+ "if(custom_settings is not None and custom_settings != '' and custom_settings != 'path/to/settings.cfg'):\n",
+ " print('Loading ', custom_settings)\n",
+ " try:\n",
+ " from configparser import ConfigParser\n",
+ " except ImportError:\n",
+ " from ConfigParser import ConfigParser\n",
+ " import configparser\n",
+ " \n",
+ " config = ConfigParser()\n",
+ " config.read(custom_settings)\n",
+ " #custom_settings_stream = fetch(custom_settings)\n",
+ " #Load CLIP models from config\n",
+ " if(config.has_section('clip_list')):\n",
+ " clip_incoming_list = config.items('clip_list')\n",
+ " clip_incoming_models = clip_incoming_list[0]\n",
+ " incoming_perceptors = eval(clip_incoming_models[1])\n",
+ " if((len(incoming_perceptors) != len(clip_load_list)) or not all(elem in incoming_perceptors for elem in clip_load_list)):\n",
+ " clip_load_list = incoming_perceptors\n",
+ " clip_model, clip_size, clip_tokenize, clip_normalize, clip_list = full_clip_load(clip_load_list)\n",
+ "\n",
+ " #Load settings from config and replace variables\n",
+ " if(config.has_section('basic_settings')):\n",
+ " basic_settings = config.items('basic_settings')\n",
+ " for basic_setting in basic_settings:\n",
+ " global_var_scope[basic_setting[0]] = eval(basic_setting[1])\n",
+ " \n",
+ " if(config.has_section('advanced_settings')):\n",
+ " advanced_settings = config.items('advanced_settings')\n",
+ " for advanced_setting in advanced_settings:\n",
+ " global_var_scope[advanced_setting[0]] = eval(advanced_setting[1])\n",
+ "\n",
+ "if(((init_image is not None) and (init_image != 'None') and (init_image != '')) and starting_timestep != 1 and custom_schedule_setting[0][1] == 1000):\n",
+ " custom_schedule_setting[0] = [custom_schedule_setting[0][0], int(custom_schedule_setting[0][1]*starting_timestep), custom_schedule_setting[0][2]]\n",
+ "\n",
+ "prompts = clip_prompts\n",
+ "opt.prompt = latent_prompts\n",
+ "opt.uc = latent_negatives\n",
+ "custom_schedules = set_custom_schedules(custom_schedule_setting)\n",
+ "aes_scale = aesthetic_loss_scale\n",
+ "#Use the CLIP guidance schedule if it was defined in the advanced settings, otherwise a constant scale\n",
+ "try: \n",
+ " clip_guidance_schedule\n",
+ " clip_guidance_index = clip_guidance_schedule\n",
+ "except NameError:\n",
+ " clip_guidance_index = [clip_guidance_scale]*1000\n",
+ "\n",
+ "global progress\n",
+ "progress = widgets.Image(layout = widgets.Layout(max_width = \"400px\",max_height = \"512px\"))\n",
+ "display.display(progress)\n",
+ "for n in trange(how_many_batches, desc=\"Sampling\"):\n",
+ " print(f\"Sampling images {n+1}/{how_many_batches}\")\n",
+ " opt.W = (width//64)*64\n",
+ " opt.H = (height//64)*64\n",
+ " if opt.W != width or opt.H != height:\n",
+ " print(f'Changing output size to {opt.W}x{opt.H}. Dimensions must be multiples of 64.')\n",
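+ " #Added worked example (not in the original): dimensions are rounded down to the nearest\n",
+ " #multiple of 64, so width = 300, height = 300 would run at 256x256, since (300//64)*64 == 256.\n",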
+ "\n",
+ " opt.mag_mul = opt_mag_mul \n",
+ " opt.ddim_eta = opt_ddim_eta\n",
+ " opt.eta_end = opt_eta_end\n",
+ " opt.temperature = opt_temperature\n",
+ "\n",
+ " opt.scale = latent_diffusion_guidance_scale\n",
+ " opt.plms = opt_plms\n",
+ " aug = augment_cuts\n",
+ "\n",
+ " #Checks whether clamp_index is a simple two-value pair (legacy check to keep old configs compatible)\n",
+ " if(len(clamp_index) == 2): \n",
+ " clamp_index_variation = np.linspace(clamp_index[0],clamp_index[1],1000) \n",
+ "\n",
+ " else:\n",
+ " clamp_index_variation = clamp_index\n",
+ " score_corrector = DotMap()\n",
+ "\n",
+ "\n",
+ " #Score corrector (modified dynamic thresholding): limits the guided component e_t - e_t_uncond\n",
+ " #by the threshold_percentile quantile of its magnitude, via softsign or clamping\n",
+ " def modify_score(e_t, e_t_uncond):\n",
+ " if(score_modifier is False):\n",
+ " return e_t\n",
+ " else:\n",
+ " e_t_d = (e_t - e_t_uncond)\n",
+ " s = torch.quantile(\n",
+ " rearrange(e_t_d, 'b ... -> b (...)').abs().float(),\n",
+ " threshold_percentile,\n",
+ " dim = -1\n",
+ " )\n",
+ "\n",
+ " s.clamp_(min = 1.)\n",
+ " s = s.view(-1, *((1,) * (e_t_d.ndim - 1)))\n",
+ " if ths_method == \"softsign\":\n",
+ " e_t_d = F.softsign(e_t_d) / s \n",
+ " elif ths_method == \"clamp\":\n",
+ " e_t_d = e_t_d.clamp(-s,s) / s * 1.3 #1.2\n",
+ " e_t = e_t_uncond + e_t_d\n",
+ " return(e_t)\n",
+ " \n",
+ " score_corrector.modify_score = modify_score\n",
+ "\n",
+ " def dynamic_thresholding(pred_x0,t):\n",
+ " return(pred_x0)\n",
+ "\n",
+ " opt.n_iter = 1 #Old way of batching, avoid touching\n",
+ " opt.n_samples = 1 #How many images in parallel. Values above 1 break upscaling\n",
+ " torch.cuda.empty_cache()\n",
+ " gc.collect()\n",
+ " generate_video = False\n",
+ " if generate_video: \n",
+ " fps = 24\n",
+ " p = Popen(['ffmpeg', '-y', '-f', 'image2pipe', '-vcodec', 'png', '-r', str(fps), '-i', '-', '-vcodec', 'libx264', '-r', str(fps), '-pix_fmt', 'yuv420p', '-crf', '17', '-preset', 'veryslow', 'video.mp4'], stdin=PIPE)\n",
+ " do_run()\n",
+ " if generate_video: \n",
+ " p.stdin.close()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "4cvUzcO9FeMT"
+ },
+ "source": [
+ "### Save your own settings\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "cellView": "form",
+ "id": "LGLUCX_UGqka"
+ },
+ "outputs": [],
+ "source": [
+ "\n",
+ "#@markdown ### Save current settings\n",
+ "#@markdown If you would like to save your current settings, uncheck `skip_saving` and run this cell. You will get a `custom_settings.cfg` file you can reuse and share. If you like your results, share your settings with us on the settings library\n",
+ "skip_saving = True #@param{type:'boolean'}\n",
+ "if(not skip_saving):\n",
+ " data = generate_settings_file(add_prompts=False, add_dimensions=True)\n",
+ " text_file = open(\"custom_settings.cfg\", \"w\")\n",
+ " text_file.write(data)\n",
+ " text_file.close()\n",
+ " from google.colab import files\n",
+ " files.download('custom_settings.cfg')\n",
+ " print(\"Downloaded as custom_settings.cfg\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Fzd-2mVMWHV0"
+ },
+ "source": [
+ "### Biases acknowledgment\n",
+ "Despite how impressive it is to be able to turn text into images, be aware that this model may output content that reinforces or exacerbates societal biases. According to the Latent Diffusion paper: \\\"Deep learning modules tend to reproduce or exacerbate biases that are already present in the data\\\". \n",
\n", + "\n", + "The model was trained on an unfiltered version the LAION-400M dataset, which scrapped non-curated image-text-pairs from the internet (the exception being the the removal of illegal content) and is meant to be used for research purposes, such as this one. You can read more on LAION's website" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "collapsed_sections": [ + "xEVSOJ4f0B21", + "VpR9JhyCu5iq", + "N_Di3xFSXGWe", + "xEVSOJ4f0B21", + "WOAs3ZvLlktt" + ], + "machine_shape": "hm", + "name": "Latent Majesty Diffusion v1.6", + "private_outputs": true, + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}