Added Model rescale and prepared a release upgrade
jaretburkett committed Aug 1, 2023
1 parent 63cacf4 commit 8b8d538
Showing 15 changed files with 387 additions and 63 deletions.
48 changes: 47 additions & 1 deletion README.md
@@ -40,6 +40,8 @@ pip3 install -r requirements.txt
I have so many hodge podge scripts I am going to be moving over to this that I use in my ML work. But this is what is
here so far.

---

### LoRA (lierla), LoCON (LyCORIS) extractor

It is based on the extractor in the [LyCORIS](https://github.com/KohakuBlueleaf/LyCORIS) tool, but adding some QOL features
@@ -64,6 +66,31 @@ Most people used fixed, which is traditional fixed dimension extraction.

`process` is an array of different processes to run. You can add a few and mix and match. One LoRA, one LyCON, etc.

---

### LoRA Rescale

Change `<lora:my_lora:4.6>` to `<lora:my_lora:1.0>`, or whatever weight you want, with the same effect.
A tool for rescaling a LoRA's weights. It should work with LoCON as well, but I have not tested it.
It all runs off a config file, which you can find an example of in `config/examples/mod_lora_scale.yaml`.
Just copy that file into the `config` folder and rename it to `whatever_you_want.yml`.
Then you can edit the file to your liking and call it like so:

```bash
python3 run.py config/whatever_you_want.yml
```

You can also put a full path to a config file, if you want to keep it somewhere else.

```bash
python3 run.py "/home/user/whatever_you_want.yml"
```

More notes on how it works are available in the example config file itself. This is useful when making
all LoRAs, as the ideal weight is rarely 1.0, but now you can fix that. Sliders can have weird scales, from -2 to 2
or even -15 to 15. This will allow you to dial it in so they all have your desired scale.
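
As a rough illustration of why this works (a sketch, not the toolkit's code; `rescale_factors` is a made-up name): a LoRA's contribution is proportional to `weight * up @ down`, so moving from weight 4.6 to 1.0 means multiplying the product of the up and down matrices by 4.6. Splitting that factor evenly gives each matrix the square root of the ratio, which is only real for a positive ratio, so this sketch rejects sign flips.

```python
import math
from typing import Tuple


def rescale_factors(current_weight: float, target_weight: float) -> Tuple[float, float]:
    """Return (total_scale, per_matrix_scale) for an up/down rescale.

    Hypothetical helper for illustration only.
    """
    total = current_weight / target_weight
    if total <= 0:
        # the even split needs a real square root, so sign flips are rejected here
        raise ValueError("ratio must be positive for an up/down split")
    return total, math.sqrt(total)


# toy 1x1 "matrices" standing in for lora_up and lora_down
up, down = 0.5, 0.2
total, per_matrix = rescale_factors(4.6, 1.0)

before = 4.6 * (up * down)                               # effect at <lora:my_lora:4.6>
after = 1.0 * ((up * per_matrix) * (down * per_matrix))  # effect at <lora:my_lora:1.0>
print(math.isclose(before, after))
```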

---

### LoRA Slider Trainer

@@ -108,13 +135,32 @@ Just went in and out. It is much worse on smaller faces than shown here.

## TODO
- [X] Add proper regs on sliders
- [ ] Add SDXL support (base model only for now)
- [X] Add SDXL support (base model only for now)
- [ ] Add plain erasing
- [ ] Make Textual inversion network trainer (network that spits out TI embeddings)

---

## Change Log

#### 2023-08-01
Major changes and update. New LoRA rescale tool, look above for details. Added better metadata so
Automatic1111 knows what the base model is. Added some experiments and a ton of updates. This thing is still unstable
at the moment, so hopefully there are no breaking changes.

Unfortunately, I am too lazy to write a proper changelog with all the changes.

I added SDXL training to sliders... but it does not work properly.
The slider training relies on a model's ability to understand that an unconditional (negative prompt)
means you do not want that concept in the output. SDXL does not understand this for whatever reason,
which makes separating out concepts within the model hard. I am sure the community will find a way to fix this
over time, but for now, it is not going to work properly. And if any of you are thinking "Could we maybe fix it
by adding 1 or 2 more text encoders to the model as well as a few more entirely separate diffusion networks?"
No. God no. It just needs a little training without every experimental new paper added to it. The KISS principle.
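
For readers unfamiliar with the mechanism referred to here, classifier-free guidance is commonly written as `uncond + scale * (cond - uncond)`. The sketch below uses plain Python lists in place of tensors, and `guided_noise` is a made-up name, not a function from this repo. With a negative prompt, its prediction takes the place of the unconditional one, so the result is pushed away from it.

```python
from typing import List


def guided_noise(uncond: List[float], cond: List[float], guidance_scale: float) -> List[float]:
    """Classifier-free guidance on toy vectors (illustrative helper)."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


negative_pred = [0.25, 0.5]  # prediction for the negative (unconditional) prompt
positive_pred = [0.75, 0.5]  # prediction for the positive prompt

# scale 1.0 reproduces the positive prediction; larger scales move further from the negative one
print(guided_noise(negative_pred, positive_pred, 1.0))
print(guided_noise(negative_pred, positive_pred, 3.0))
```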


#### 2023-07-30
Added "anchors" to the slider trainer. This allows you to set a prompt that will be used as a
regularizer. You can set the network multiplier to force spread consistency at high weights
48 changes: 48 additions & 0 deletions config/examples/mod_lora_scale.yaml
@@ -0,0 +1,48 @@
---
job: mod
config:
  name: name_of_your_model_v1
  process:
    - type: rescale_lora
      # path to your current lora model
      input_path: "/path/to/lora/lora.safetensors"
      # output path for your new lora model, can be the same as input_path to replace
      output_path: "/path/to/lora/output_lora_v1.safetensors"
      # replaces meta with the meta below (plus minimum meta fields)
      # if false, we will leave the meta alone except for updating hashes (sd-script hashes)
      replace_meta: true
      # how to adjust, we can scale the up_down weights or the alpha
      # up_down is the default and probably the best, they will both net the same outputs
      # would only affect rare NaN cases and maybe merging with old merge tools
      scale_target: 'up_down'
      # precision to save, fp16 is the default and standard
      save_dtype: fp16
      # current_weight is the ideal weight you use as a multiplier when using the lora
      # IE in automatic1111 <lora:my_lora:6.0> the 6.0 is the current_weight
      # you can do negatives here too if you want to flip the lora
      current_weight: 6.0
      # target_weight is the ideal weight you use as a multiplier when using the lora
      # instead of the one above. IE in automatic1111 instead of using <lora:my_lora:6.0>
      # we want to use <lora:my_lora:1.0> so 1.0 is the target_weight
      target_weight: 1.0

      # base model for the lora
      # this is just used to add meta so automatic1111 knows which model it is for
      # assume v1.5 if these are not set
      is_xl: false
      is_v2: false
meta:
  # this is only used if you set replace_meta to true above
  name: "[name]" # [name] gets replaced with the name above
  description: A short description of your lora
  trigger_words:
    - put
    - trigger
    - words
    - here
  version: '0.1'
  creator:
    name: Your Name
    email: [email protected]
    website: https://yourwebsite.com
  any: All meta data above is arbitrary, it can be whatever you want.
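
The `[name]` placeholder in `meta` could be filled in along these lines. This is only a sketch: `fill_name_tag` is a hypothetical helper, and the toolkit's own substitution may differ.

```python
from typing import Any


def fill_name_tag(value: Any, name: str) -> Any:
    """Recursively replace the literal "[name]" in every string of a nested config."""
    if isinstance(value, str):
        return value.replace("[name]", name)
    if isinstance(value, dict):
        return {k: fill_name_tag(v, name) for k, v in value.items()}
    if isinstance(value, list):
        return [fill_name_tag(v, name) for v in value]
    return value  # numbers, booleans and None are left untouched


meta = {
    "name": "[name]",
    "version": "0.1",
    "trigger_words": ["put", "trigger", "words", "here"],
}
print(fill_name_tag(meta, "name_of_your_model_v1")["name"])
```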
2 changes: 1 addition & 1 deletion info.py
@@ -3,6 +3,6 @@
 v = OrderedDict()
 v["name"] = "ai-toolkit"
 v["repo"] = "https://github.com/ostris/ai-toolkit"
-v["version"] = "0.0.1"
+v["version"] = "0.0.2"

 software_meta = v
28 changes: 28 additions & 0 deletions jobs/ModJob.py
@@ -0,0 +1,28 @@
import os
from collections import OrderedDict
from jobs import BaseJob
from toolkit.metadata import get_meta_for_safetensors
from toolkit.train_tools import get_torch_dtype

process_dict = {
    'rescale_lora': 'ModRescaleLoraProcess',
}


class ModJob(BaseJob):

    def __init__(self, config: OrderedDict):
        super().__init__(config)
        self.device = self.get_conf('device', 'cpu')

        # loads the processes from the config
        self.load_processes(process_dict)

    def run(self):
        super().run()

        print("")
        print(f"Running {len(self.process)} process{'' if len(self.process) == 1 else 'es'}")

        for process in self.process:
            process.run()
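
`process_dict` maps the `type` string from the config to a class name. How `load_processes` resolves that name is not shown in this commit; one plausible approach, sketched here with made-up stand-in classes, is a lookup table from name to class.

```python
from collections import OrderedDict


class FakeRescaleProcess:
    """Stand-in for ModRescaleLoraProcess; it only records its config."""

    def __init__(self, process_id: int, config: dict):
        self.process_id = process_id
        self.config = config

    def run(self) -> str:
        return f"process {self.process_id}: {self.config['type']}"


# type string -> class name, in the style of ModJob.py
process_dict = {"rescale_lora": "FakeRescaleProcess"}
# class name -> class object (an assumption; the real code may import by name instead)
known_classes = {"FakeRescaleProcess": FakeRescaleProcess}


def build_processes(process_configs: list) -> list:
    """Instantiate one process per config entry, rejecting unknown types."""
    built = []
    for i, cfg in enumerate(process_configs):
        if cfg["type"] not in process_dict:
            raise ValueError(f"unknown process type: {cfg['type']}")
        cls = known_classes[process_dict[cfg["type"]]]
        built.append(cls(i, cfg))
    return built


processes = build_processes([OrderedDict(type="rescale_lora")])
print([p.run() for p in processes])
```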
1 change: 1 addition & 0 deletions jobs/__init__.py
@@ -2,3 +2,4 @@
 from .ExtractJob import ExtractJob
 from .TrainJob import TrainJob
 from .MergeJob import MergeJob
+from .ModJob import ModJob
31 changes: 20 additions & 11 deletions jobs/process/BaseSDTrainProcess.py
@@ -19,7 +19,7 @@
     DDIMScheduler, DDPMScheduler

 from jobs.process import BaseTrainProcess
-from toolkit.metadata import get_meta_for_safetensors, load_metadata_from_safetensors
+from toolkit.metadata import get_meta_for_safetensors, load_metadata_from_safetensors, add_base_model_info_to_meta
 from toolkit.train_tools import get_torch_dtype, apply_noise_offset
 import gc

@@ -192,6 +192,7 @@ def sample(self, step=None, is_first=False):
                         num_inference_steps=sample_config.sample_steps,
                         guidance_scale=sample_config.guidance_scale,
                         negative_prompt=neg,
+                        guidance_rescale=0.7,
                     ).images[0]
                 else:
                     img = pipeline(
@@ -236,21 +237,26 @@ def sample(self, step=None, is_first=False):
         # self.sd.tokenizer.to(original_device_dict['tokenizer'])

     def update_training_metadata(self):
-        dict = OrderedDict({
+        o_dict = OrderedDict({
             "training_info": self.get_training_info()
         })
         if self.model_config.is_v2:
-            dict['ss_v2'] = True
-            dict['ss_base_model_version'] = 'sd_2.1'
+            o_dict['ss_v2'] = True
+            o_dict['ss_base_model_version'] = 'sd_2.1'

         elif self.model_config.is_xl:
-            dict['ss_base_model_version'] = 'sdxl_1.0'
+            o_dict['ss_base_model_version'] = 'sdxl_1.0'
         else:
-            dict['ss_base_model_version'] = 'sd_1.5'
+            o_dict['ss_base_model_version'] = 'sd_1.5'

-        dict['ss_output_name'] = self.job.name
+        o_dict = add_base_model_info_to_meta(
+            o_dict,
+            is_v2=self.model_config.is_v2,
+            is_xl=self.model_config.is_xl,
+        )
+        o_dict['ss_output_name'] = self.job.name

-        self.add_meta(dict)
+        self.add_meta(o_dict)

     def get_training_info(self):
         info = OrderedDict({
@@ -381,15 +387,14 @@ def predict_noise(
             text_embeddings: PromptEmbeds,
             timestep: int,
             guidance_scale=7.5,
-            guidance_rescale=0.7,
+            guidance_rescale=0, # 0.7
             add_time_ids=None,
             **kwargs,
     ):

         if self.sd.is_xl:
             if add_time_ids is None:
                 add_time_ids = self.get_time_ids_from_latents(latents)
-            # todo LECOs code looks like it is omitting noise_pred

             latent_model_input = torch.cat([latents] * 2)

@@ -500,13 +505,17 @@ def run(self):
         dtype = get_torch_dtype(self.train_config.dtype)

         # TODO handle other schedulers
-        sch = KDPM2DiscreteScheduler
+        # sch = KDPM2DiscreteScheduler
+        sch = DDPMScheduler
         # do our own scheduler
+        prediction_type = "v_prediction" if self.model_config.is_v_pred else "epsilon"
         scheduler = sch(
             num_train_timesteps=1000,
             beta_start=0.00085,
             beta_end=0.0120,
             beta_schedule="scaled_linear",
+            clip_sample=False,
+            prediction_type=prediction_type,
         )
         if self.model_config.is_xl:
             if self.custom_pipeline is not None:
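
For reference, the `scaled_linear` schedule in diffusers spaces the square roots of the betas linearly and then squares them. A dependency-free sketch of that computation follows; the function name is made up here, and the float32 rounding the library applies is ignored.

```python
from typing import List


def scaled_linear_betas(beta_start: float, beta_end: float, num_steps: int) -> List[float]:
    """Betas whose square roots are evenly spaced between sqrt(beta_start) and sqrt(beta_end).

    Assumes num_steps >= 2.
    """
    lo, hi = beta_start ** 0.5, beta_end ** 0.5
    step = (hi - lo) / (num_steps - 1)
    return [(lo + i * step) ** 2 for i in range(num_steps)]


# same values as the scheduler arguments in the hunk above
betas = scaled_linear_betas(0.00085, 0.0120, 1000)
print(len(betas), betas[0], betas[-1])
```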
100 changes: 100 additions & 0 deletions jobs/process/ModRescaleLoraProcess.py
@@ -0,0 +1,100 @@
import gc
import os
from collections import OrderedDict
from typing import ForwardRef

import torch
from safetensors.torch import save_file, load_file

from jobs.process.BaseProcess import BaseProcess
from toolkit.metadata import get_meta_for_safetensors, load_metadata_from_safetensors, add_model_hash_to_meta, \
    add_base_model_info_to_meta
from toolkit.train_tools import get_torch_dtype


class ModRescaleLoraProcess(BaseProcess):
    process_id: int
    config: OrderedDict
    progress_bar: ForwardRef('tqdm') = None

    def __init__(
            self,
            process_id: int,
            job,
            config: OrderedDict
    ):
        super().__init__(process_id, job, config)
        self.input_path = self.get_conf('input_path', required=True)
        self.output_path = self.get_conf('output_path', required=True)
        self.replace_meta = self.get_conf('replace_meta', default=False)
        self.save_dtype = self.get_conf('save_dtype', default='fp16', as_type=get_torch_dtype)
        self.current_weight = self.get_conf('current_weight', required=True, as_type=float)
        self.target_weight = self.get_conf('target_weight', required=True, as_type=float)
        self.scale_target = self.get_conf('scale_target', default='up_down')  # alpha or up_down
        self.is_xl = self.get_conf('is_xl', default=False, as_type=bool)
        self.is_v2 = self.get_conf('is_v2', default=False, as_type=bool)

        self.progress_bar = None

    def run(self):
        super().run()
        source_state_dict = load_file(self.input_path)
        source_meta = load_metadata_from_safetensors(self.input_path)

        if self.replace_meta:
            self.meta.update(
                add_base_model_info_to_meta(
                    self.meta,
                    is_xl=self.is_xl,
                    is_v2=self.is_v2,
                )
            )
            save_meta = get_meta_for_safetensors(self.meta, self.job.name)
        else:
            save_meta = get_meta_for_safetensors(source_meta, self.job.name, add_software_info=False)

        # save
        os.makedirs(os.path.dirname(self.output_path), exist_ok=True)

        new_state_dict = OrderedDict()

        for key in list(source_state_dict.keys()):
            v = source_state_dict[key]
            v = v.detach().clone().to("cpu").to(get_torch_dtype('fp32'))

            # all loras have an alpha, up weight and down weight
            # - "lora_te_text_model_encoder_layers_0_mlp_fc1.alpha",
            # - "lora_te_text_model_encoder_layers_0_mlp_fc1.lora_down.weight",
            # - "lora_te_text_model_encoder_layers_0_mlp_fc1.lora_up.weight",
            # we can rescale by adjusting the alpha or the up weights, or the up and down weights
            # I assume doing both up and down would be best all around, but I'm not sure
            # some locons also have mid weights, we will leave those alone for now, will work without them

            # when adjusting alpha, it is used to calculate the multiplier in a lora module
            # - scale = alpha / lora_dim
            # - output = layer_out + lora_up_out * multiplier * scale
            total_module_scale = torch.tensor(self.current_weight / self.target_weight) \
                .to("cpu", dtype=get_torch_dtype('fp32'))
            num_modules_layers = 2  # up and down
            up_down_scale = torch.pow(total_module_scale, 1.0 / num_modules_layers) \
                .to("cpu", dtype=get_torch_dtype('fp32'))
            # only update alpha
            if self.scale_target == 'alpha' and key.endswith('.alpha'):
                v = v * total_module_scale
            # suffixes are grouped so down weights are left alone when scale_target is 'alpha'
            if self.scale_target == 'up_down' and key.endswith(('.lora_up.weight', '.lora_down.weight')):
                # would it be better to adjust the up weights for fp16 precision? Doing both should reduce chance of NaN
                v = v * up_down_scale
            new_state_dict[key] = v.to(get_torch_dtype(self.save_dtype))

        save_meta = add_model_hash_to_meta(new_state_dict, save_meta)
        save_file(new_state_dict, self.output_path, save_meta)

        # cleanup in case there are other jobs
        del new_state_dict
        del source_state_dict
        del source_meta

        torch.cuda.empty_cache()
        gc.collect()

        print(f"Saved to {self.output_path}")
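
Which tensors get scaled depends on `scale_target` and the key suffix. Because `and` binds tighter than `or` in Python, the two suffix checks have to be grouped; otherwise `.lora_down.weight` keys match regardless of `scale_target`. The hypothetical helper below (not part of this commit) isolates that decision so it can be checked without loading a model. It assumes a positive `total_scale`.

```python
def scale_for_key(key: str, scale_target: str, total_scale: float) -> float:
    """Return the multiplier for one state-dict key (1.0 means unchanged)."""
    if scale_target == "alpha" and key.endswith(".alpha"):
        return total_scale
    if scale_target == "up_down" and key.endswith((".lora_up.weight", ".lora_down.weight")):
        # up and down are multiplied together, so each takes the square root
        return total_scale ** 0.5
    return 1.0


prefix = "lora_te_text_model_encoder_layers_0_mlp_fc1"
for target in ("alpha", "up_down"):
    for suffix in (".alpha", ".lora_down.weight", ".lora_up.weight"):
        print(target, suffix, scale_for_key(prefix + suffix, target, 4.0))
```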