Pivotal Tuning CLI and how to use them. #121

cloneofsimo · 2023-01-08T17:17:18Z

cloneofsimo
Jan 8, 2023
Maintainer

Most of the recent updates are about lora_pti : CLI for Pivotal Tuning Inversion, with various tricks and techs to get extreme performance LoRA training output.

All of the README examples, including example above, were built with lora_pti CLI with DEFAULT PARAMETERS Here are new parameters and what they mean:

Extended Latent + Dataset

  --placeholder_tokens="<s1>|<s2>" \ 
  --placeholder_token_at_data="<tok>|<s1><s2>"\

So, with recent extended latent, you can train multiple tokens for textual inversion, and they are declared with placeholder_tokens argument "<TOK1>|<TOK2>|...", each of them seperated with | character.
If you are using caption dataset (That is, file name of the dataset is caption), you might want to map certain token in caption to the token of your need.

For example, let's say you initialized two tokens <s1>|<s2>, and you have an image file : a photo of <tok> holding flowers .jpg , you want to substitute <tok> with <s1> wearing <s2>. Then, you use argument :

  --placeholder_token_at_data="<tok>|<s1> wearing <s2>"\

The argument will transform the caption to : a photo of <s1> wearing <s2> holding flowers for the image.

Also, if you want to use template caption, use the argument :

 --use_template="style" \ Use this if you have no caption as the filename

There is type style and object available.

Mask Conditioned Training

 --use_face_segmentation_condition

Use this if you want to focus on faces. Else, remove this line. Check this if you are interested what this does.

Training steps for Two stages

So there is two stage in PTI. One is Bayesian training textual inversion with high learning rate, and one is training LoRA.

  --max_train_steps_ti=1000 \ Number of steps for stage 1.
  --max_train_steps_tuning=1000 \ Number of steps for stage 2. 1000 is more than enough.

If the concept is difficult, you want stage 1 steps to be higher. But 1000 is very likely to be ok. In many cases, having 500 ti, 500 tuning works just as ok.
If you are going to have smaller learning rate, you would definitely want to bump up these values, as it would need longer training time.

Other arguments (default is probably OK)

Here are kind of unimportant ones / or ones you might be familiar with. You can use the default values, but if you want to know :

  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5"  \  Base model to fine tune.
  --instance_data_dir="./data/some_data" \ Directory where the dataset exists.
  --output_dir="./model/lora1" \ Directory where output loras will be saved 
  --train_text_encoder \ This will train the text encoder as well.
  --resolution=512 \ 
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --scale_lr \ This will re-scale learning rate. You probably want this on.
  --initializer_tokens="st|at" \ If you are using n tokens, you want to have n initializer tokens, seperated with | . 
  --learning_rate_unet=1e-4 \ Unet learning rate. this works ok.
  --learning_rate_text=1e-5 \ Text encoder learning rate. this works ok.
  --learning_rate_ti=5e-4 \ Textual inversion learning rate. This works ok.
  --color_jitter \ This will augment dataset's color slightly. Usually ok to turn on unless your color matters like insane.
  --lr_scheduler="linear" \ This will be a scheduler for textual inversion only. LoRA will be added in the future.
  --lr_warmup_steps=0 \ Warmup steps for textual inversion learning rate
  --save_steps=100 \ Saving frequency. Keep this small. LoRA is tiny anyways
  --perform_inversion=True \ You want this unless you just want to train LoRA.
  --clip_ti_decay \ This will keep the norm of the textual inversion intact. You definitely want this on.
  --weight_decay_ti=0.000 \ This will even further make textual inversion smaller. You don't need this positive if you have clip_ti_decay option on.
  --weight_decay_lora=0.001\ 
  --continue_inversion \ Whether to continue training embedding after stage 1.  
  --continue_inversion_lr=1e-4 \ If you do, then with what learning rate? You probably want this small.
  --device="cuda:0" \ Device to train on
    --lora_rank=4 \ Rank of the LoRA

If you have any questions, comment below, I will update this discussion whenever there is an update on the CLI part.

abi · 2023-01-08T21:45:33Z

abi
Jan 8, 2023

Thanks for your amazing work here! Excited to follow along the progress.

Can you explain what initializer_tokens and placeholder_tokens mean? I'm looking to train a particular person's face, where should I put a name/unique token associated with them and where the class token ("man", "girl") go?

10 replies

cloneofsimo Jan 9, 2023
Maintainer Author

It is just how it will be initialized, and will become a meaningful token after training. You will have to do
photo of a <s1><s2><s3> to get a result.

FrancescoSaverioZuppichini Feb 24, 2023

I don't understand, I n picture of a person's face, how should I call the images? What should I use in the CI? Can somebody give a real example and explain why?

harrywang Mar 4, 2023

@cloneofsimo thanks for the great work. After reading the comments here, I still don't understand the meaning of <s1> and <s2> - should there just be one special token if I train for one style using text inversion? why s1 and s2? I am very confused. It would be great if you can give any pointers. Thanks a lot!

guangyangsjc18 Mar 7, 2023

I also do not understand. If you have multiple tokens, shouldn't you have multiple instance_data_dir as well? Or these are inferred by the subfolder names ?

strangehelix May 5, 2023

Reading about textual inversions may help https://huggingface.co/docs/diffusers/training/text_inversion

FunWithFaces · 2023-01-09T15:14:14Z

FunWithFaces
Jan 9, 2023

Thanks for this!

Can you let us know the minimum VRAM that is required to run this?

Also -- I'm wondering if it might be possible to provide an example with public domain images that we could use to try to reproduce a test result and make sure everything is set up properly. I appreciate that this probably is not a priority for you at the moment!

9 replies

2kpr Jan 12, 2023

Working on my 3080 Ti at about 11.9/12.2GB

FunWithFaces Jan 27, 2023

Does --use_xformers work with the lora_pti CLI, or is there something more I need to do to get this going? I'm still getting OOM with 12GB, seems like I must be close!

FunWithFaces Jan 29, 2023

Edit -- don't do this, see cloneofsimo comment below

In case anyone else is wondering about this, I added xformers support by modifying cli_lora_pti.py by adding the line

from lora_diffusion.xformers_utils import set_use_memory_efficient_attention_xformers

after the "from lora_diffusion import" block (line 35)

and then added:
set_use_memory_efficient_attention_xformers(unet, True)
set_use_memory_efficient_attention_xformers(vae, True)
after the gradient checkpointing command on line ~600

worked a charm. This was just based on what i saw in the older training scripts. Probably a better way to go about it I have no idea what I'm really doing

cloneofsimo Jan 29, 2023
Maintainer Author

I added it in the lora_pti,I think it should work by now with --enable_xformers_memory_efficient_attention argument

FunWithFaces Jan 31, 2023

OK so, it looks like there is exactly one xformers version that functions with this right now, I'm just conveying this info from the helpful folks in the auto1111 dreambooth extension discord.

if on linux, there is a wheel available via
https://github.com/Zuxier/xformers/releases/download/torch1.13-0.14dev0-python3.10/xformers-0.0.14.dev0-cp310-cp310-linux_x86_64.whl

if you want to attempt to build yourself it is
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git checkout e23b369c09468

also note that another (not sure github handle, insignia | Real Deal on discord) pointed out that you may have issues with the face segmentation stuff without

changing:
[Image.open(f) for f in self.instance_images_path]

to :
[Image.open(f).convert('RGB') for f in self.instance_images_path]
(in around line 188 i think, dataset.py)

I am still new to github and havent figured out how to submit PR yet, I'll get there

geocine · 2023-01-09T16:34:09Z

geocine
Jan 9, 2023

@cloneofsimo . This is my output running your example at https://github.com/cloneofsimo/lora/blob/master/training_scripts/use_face_conditioning_example.sh

What is the purpose of the _inv_ files?
I want to merge these to an existing ckpt so I need to convert these safetensors to pt then use lora_add? If so, how do I convert these safetensors to pt. I only see pt_to_safetensors but not the other way around. Also I understand lora_add only works with pt . Need your guide how to proceed.

11 replies

SU1199 Jan 10, 2023

@geocine
Can you share your pip freeze.. I've been trying since 7 hours to replicate this to no avail :(

geocine Jan 10, 2023

Sure I'll get back to you in 2 hours

Elevory Jan 10, 2023

@Elevory what did you use to convert the safetensors to pt?

I used this: https://github.com/diStyApps/Safe-and-Stable-Ckpt2Safetensors-Conversion-Tool-GUI

geocine Jan 10, 2023

Thanks @Elevory , @SU1199 , I'm using a conda environment here you go https://gist.github.com/geocine/5f4523725dbd8b7343de28f37fc45720

DTG777 Apr 23, 2023

I'm having some trouble running this file. Could you please tell me how you run this file to train the lora model？

GabrielTomasel · 2023-04-19T22:34:49Z

GabrielTomasel
Apr 19, 2023

using civitai various models and others that i trained myself i get "Rank should be the same per model" error using the cloneofsimo/lora from replicate. What can i do to make the LoRAs compatible with that repo? Using Automatic1111 those loras work ok.
i see that --lora_rank=4 \ Rank of the LoRA is an option but dont know what that means of how can i set that in Kohya colab notebook to train my models.

0 replies

weidaru · 2023-04-30T04:31:23Z

weidaru
Apr 30, 2023

All of the examples use <s1>|<s2> as placeholder tokens. Any reason behind that? If I train a style, is a single token <s1> good enough?

0 replies

Lucky-Light-Sun · 2023-07-12T08:22:41Z

Lucky-Light-Sun
Jul 12, 2023

hello, I have a question about the word "CLI". I know LoRA, and I know a little about Pivotal Tuning. But what does the word "CLI" mean? Is this short for CLIP? Or it refers to a new paper?

2 replies

shaokun Jul 21, 2023

CLI is command line interface...

Lucky-Light-Sun Jul 24, 2023

Got it, thank you

TeaCult · 2023-08-06T10:26:31Z

TeaCult
Aug 6, 2023

I really dont understand how to prepare data. There are some xxx.json some xxx.yaml other datasets have imagename.png and imagename.txt, I cannot relate etc to these. I could not find a clear guide to explain this. Peft is different , your lora and automaitic 1111 lora dataset seems different. Civit AI has an article to scape and train lora it is different. Hugging face diffusers lora is also different. Can you please explain this for a confused person :).

I am ready to start from textual inversion and most basic. But since guides and codes have api like middlewares I cannot understand what lies behind and its logic.

Thank you.

1 reply

atacanpolat May 27, 2024

I'd like to know this as well :)

TeaCult · 2023-08-06T10:28:27Z

TeaCult
Aug 6, 2023

Also I put safetensors to automatic 1111 seems that it does not work. Either I could not get dataset tags right or safetensors does not work with it. In that case, How can I convert safetensors to pt ? btw believe me "I have googled a lot"

0 replies

Pivotal Tuning CLI and how to use them. #121

cloneofsimo Jan 8, 2023 Maintainer

Extended Latent + Dataset

Mask Conditioned Training

Training steps for Two stages

Other arguments (default is probably OK)

Replies: 8 comments · 33 replies

cloneofsimo Jan 9, 2023 Maintainer Author

cloneofsimo Jan 29, 2023 Maintainer Author

cloneofsimo
Jan 8, 2023
Maintainer

Replies: 8 comments 33 replies

cloneofsimo Jan 9, 2023
Maintainer Author

cloneofsimo Jan 29, 2023
Maintainer Author