diff --git a/README.md b/README.md
index 868dd58b..4925f238 100644
--- a/README.md
+++ b/README.md
@@ -1,60 +1,106 @@
 # AI Toolkit by Ostris
-WIP for now, but will be a collection of tools for AI tools as I need them.
+## IMPORTANT NOTE - READ THIS
+This is an active WIP repo that is not ready for others to use, and definitely not ready for non-developers to use.
+I am making major breaking changes and pushing straight to master until I have it in a planned state. I have big changes
+planned for the config files and the general structure, and I may change how training works entirely. You are welcome to use it,
+but keep that in mind. If more people start to use it, I will follow better branching standards, but for now
+this is my personal active experiment.
+
+Report bugs as you find them, but not knowing how to train ML models, set up an environment, or use python is not a bug.
+I will make all of this more user-friendly eventually.
+
+I will make a better readme later.
 
 ## Installation
 
-I will try to update this to be more beginner-friendly, but for now I am assuming
-a general understanding of python, pip, pytorch, and using virtual environments:
+Requirements:
+- python >3.10
+- Nvidia GPU with enough VRAM to do what you need
+- python venv
+- git
+
 
-Linux:
+Linux:
 
 ```bash
+git clone https://github.com/ostris/ai-toolkit.git
+cd ai-toolkit
 git submodule update --init --recursive
-pythion3 -m venv venv
+python3 -m venv venv
 source venv/bin/activate
-pip install -r requirements.txt
-cd requirements/sd-scripts
-pip install --no-deps -e .
-cd ../..
+# or source venv/Scripts/activate on windows
+pip3 install -r requirements.txt
 ```
 
-Windows:
-
-```bash
-git submodule update --init --recursive
-pythion3 -m venv venv
-venv\Scripts\activate
-pip install -r requirements.txt
-cd requirements/sd-scripts
-pip install --no-deps -e .
-cd ../..
-```
+---
 
 ## Current Tools
 
-### LyCORIS extractor
+I have a lot of hodgepodge scripts from my ML work that I will be moving over to this repo, but this is what is
+here so far.
+
+### LoRA (lierla), LoCON (LyCORIS) extractor
 
-It is similar to the [LyCORIS](https://github.com/KohakuBlueleaf/LyCORIS) tool, but adding some QOL features.
-It all runs off a config file, which you can find an example of in `config/examples/locon_config.example.json`.
-Just copy that file, into the `config` folder, and rename it to `whatever_you_want.json`.
+It is based on the extractor in the [LyCORIS](https://github.com/KohakuBlueleaf/LyCORIS) tool, but adds some QOL features
+and LoRA (lierla) support. It can do multiple types of extraction in one run.
+It all runs off a config file, which you can find an example of in `config/examples/extract.example.yml`.
+Just copy that file into the `config` folder and rename it to `whatever_you_want.yml`.
 Then you can edit the file to your liking. and call it like so:
 
 ```bash
-python3 run.py "whatever_you_want"
+python3 run.py config/whatever_you_want.yml
 ```
 
 You can also put a full path to a config file, if you want to keep it somewhere else.
 
 ```bash
-python3 run.py "/home/user/whatever_you_want.json"
+python3 run.py "/home/user/whatever_you_want.yml"
+```
+
+More notes on how it works are available in the example config file itself. LoRA and LoCON both support
+the 'fixed', 'threshold', 'ratio', and 'quantile' extraction modes. I'll document what each of these means later.
+Most people use 'fixed', which is traditional fixed-dimension extraction.
+
+`process` is an array of different processes to run. You can add a few and mix and match: one LoRA, one LoCON, etc.
+
+
+### LoRA Slider Trainer
+
+This is how I train most of the recent sliders I have on Civitai; you can check them out in my [Civitai profile](https://civitai.com/user/Ostris/models).
+It is based on the work of [p1atdev/LECO](https://github.com/p1atdev/LECO) and [rohitgandikota/erasing](https://github.com/rohitgandikota/erasing),
+but it has been heavily modified to create sliders rather than erase concepts. I have a lot more plans for this, but it is
+very functional as is. It is also very easy to use. Just copy the example config file in `config/examples/train_slider.example.yml`
+to the `config` folder and rename it to `whatever_you_want.yml`. Then you can edit the file to your liking and call it like so:
+
+```bash
+python3 run.py config/whatever_you_want.yml
 ```
 
-File name is auto generated and dumped into the `output` folder. You can put whatever meta you want in the
-`meta` section of the config file, and it will be added to the metadata of the output file. I just have
-some recommended fields in the example file. The script will add some other useful metadata as well.
+There is a lot more information in that example file. You can even run the example as is, without any modifications, to see
+how it works. It will create a slider that turns all animals into dogs (neg) or cats (pos). Just run it like so:
+
+```bash
+python3 run.py config/examples/train_slider.example.yml
+```
+
+And you will be able to see how it works without configuring anything. No datasets are required for this method.
+I will post a better tutorial soon.
+
+---
+
+## WIP Tools
+
+
+### VAE (Variational Auto Encoder) Trainer
 
-process is an array or different processes to run on the conversion to test. You will normally just need one though.
+This works, but it is not ready for others to use and therefore does not have an example config.
+I am still working on it. I will update this when it is ready.
+I am adding a lot of the training criteria I have used in my image enlargement work: a critic (discriminator),
+content loss, style loss, and a few more. If you don't know, the VAEs
+for stable diffusion (yes, even the MSE one, and SDXL's) are horrible with smaller faces, and that holds SD back. I will fix this.
+I'll post more about this with better examples later, but here is a quick test run through various VAEs.
+The images just went in and out (encode then decode). It is much worse on smaller faces than shown here.
 
-Will update this later.
+
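The README above leaves the `process` array fairly abstract, so here is a rough sketch of what a config with two extraction processes might look like. It is only an illustration: the key names (`type`, `filename`, `mode`, `linear`, `conv`) mirror the example file touched later in this diff, but the surrounding structure and the specific values are assumptions, so treat `config/examples/extract.example.yml` in the repo as the real reference.

```yaml
# Illustrative sketch only, not a file from this commit.
# Shows the "mix and match" idea: several extraction processes in one config.
config:
  name: my_extraction          # hypothetical job name
  process:
    # a LoCON (LyCORIS) extraction, linear + conv layers
    - type: locon
      filename: "[name]_locon_64.safetensors"
      mode: fixed
      linear: 64
      conv: 32
    # a plain LoRA (lierla) extraction, linear layers only (no conv)
    - type: lora
      filename: "[name]_lora_4.safetensors"
      mode: fixed
      linear: 4                # lora dim or rank
```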
diff --git a/assets/VAE_test1.jpg b/assets/VAE_test1.jpg
new file mode 100644
index 00000000..bdd489b8
Binary files /dev/null and b/assets/VAE_test1.jpg differ
diff --git a/config/examples/extract.example.yml b/config/examples/extract.example.yml
index 5eee869e..52505bb9 100644
--- a/config/examples/extract.example.yml
+++ b/config/examples/extract.example.yml
@@ -47,7 +47,8 @@ config:
     - type: lora # traditional lora extraction (lierla) with linear layers only
       filename: "[name]_4.safetensors"
       mode: fixed # fixed, ratio, quantile supported for lora as well
-      linear: 4
+      linear: 4 # lora dim or rank
+      # no conv for lora
 
       # process 5
     - type: lora
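The hunk above only annotates the fixed-dimension LoRA process. For the other modes the README mentions ('ratio' and 'quantile'), the sketch below shows one plausible reading: the same `linear`/`conv` keys hold a fraction instead of a rank. This is an assumption on my part rather than a copy of the shipped example, so verify the exact keys and value semantics against `config/examples/extract.example.yml`.

```yaml
# Hedged sketch of the non-fixed extraction modes (assumed semantics,
# not copied from the shipped example file).
process:
  # ratio mode: keep roughly this fraction of each layer's size
  - type: locon
    filename: "[name]_ratio_02.safetensors"
    mode: ratio
    linear: 0.2
    conv: 0.2
  # quantile mode: size layers to capture this quantile of the weights
  - type: locon
    filename: "[name]_quantile_05.safetensors"
    mode: quantile
    linear: 0.5
    conv: 0.5
```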
diff --git a/config/examples/train_slider.example.yml b/config/examples/train_slider.example.yml
index 8956eff1..3d37eefc 100644
--- a/config/examples/train_slider.example.yml
+++ b/config/examples/train_slider.example.yml
@@ -33,7 +33,7 @@ config:
       # how many steps to train. More is not always better. I rarely go over 1000
       steps: 500
       # I have had good results with 4e-4 to 1e-4 at 500 steps
-      lr: 2e-4
+      lr: 1e-4
       # train the unet. I recommend leaving this true
       train_unet: true
       # train the text encoder. I don't recommend this unless you have a special use case
@@ -70,7 +70,7 @@ config:
       # saving config
       save:
         dtype: float16 # precision to save. I recommend float16
-        save_every: 100 # save every this many steps
+        save_every: 50 # save every this many steps
 
       # sampling config
       sample:
@@ -90,7 +90,7 @@ config:
       #  --n [string] # negative prompt, will inherit sample.neg if not set
       # Only 75 tokens allowed currently
-      prompts:
+      prompts: # our example is an animal slider, neg: dog, pos: cat
        - "a golden retriever --m -5"
        - "a golden retriever --m -3"
        - "a golden retriever --m 3"
@@ -99,6 +99,10 @@ config:
        - "calico cat --m -3"
        - "calico cat --m 3"
        - "calico cat --m 5"
+       - "an elephant --m -5"
+       - "an elephant --m -3"
+       - "an elephant --m 3"
+       - "an elephant --m 5"
       # negative prompt used on all prompts above as default if they don't have one
       neg: "cartoon, fake, drawing, illustration, cgi, animated, anime, monochrome"
       # seed for sampling. 42 is the answer for everything
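To make the sampling changes above easier to read: the added elephant prompts sample an animal that is neither the negative target (dog) nor the positive target (cat), and the `--m` flag appears to be the slider strength applied at sample time, with negative values pulling toward dog and positive values toward cat, per the animal-slider comment in the hunk. The `--n` flag, per the comment above, overrides the default `neg` prompt for a single line. The snippet below is a hypothetical extension of the prompt list, not part of this diff.

```yaml
# Hypothetical additions to sample.prompts, following the same pattern
# as the diff above (not part of this commit).
prompts:
  - "a zebra in a field --m -5"   # strong pull toward the negative target (dog)
  - "a zebra in a field --m 5"    # strong pull toward the positive target (cat)
  - "a zebra in a field --m 3 --n 'blurry, low quality'"  # per-prompt negative override
# default negative prompt, inherited when --n is not set
neg: "cartoon, fake, drawing, illustration, cgi, animated, anime, monochrome"
```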