Todos for this project #138
-
Thanks for the list. @cloneofsimo you are not lacking in creativity or ambition! Some quick thoughts.
Interesting idea. I wonder if it's somewhat related to the idea from this paper (https://arxiv.org/pdf/2206.06122.pdf), which trains only the singular values from the SVD of the linear weights. My intuition is that at full rank this is the same as LoRA (although they don't cite it), but at reduced rank it may be faster and smaller (assuming you recompute U and V on loading).
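If I'm reading the paper right, the trainable part is roughly this (a sketch, not their code; at full rank only min(out, in) singular values are trained):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SVDSingularValueLayer(nn.Module):
    """Freeze U and Vh from the SVD of a pretrained weight; train only the singular values."""

    def __init__(self, weight: torch.Tensor, rank=None):
        super().__init__()
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        if rank is not None:  # reduced-rank variant: keep only the top-r components
            U, S, Vh = U[:, :rank], S[:rank], Vh[:rank, :]
        self.register_buffer("U", U)
        self.register_buffer("Vh", Vh)
        self.s = nn.Parameter(S.clone())  # the only trainable parameters

    def forward(self, x):
        # Effective weight: W = U @ diag(s) @ Vh
        W = (self.U * self.s) @ self.Vh
        return F.linear(x, W)
```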
I'll finish my PR soon; there seem to be some benefits of this combined with an extra nonlinearity.
This would be cool. Seems like we could do this right away by just allowing the LoRALayers to check whether the down/up tensors have a third dimension and sum over it in the forward pass?
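Something like this, as a rough sketch (the `lora_down`/`lora_up` names and the stacking layout are assumptions, not the repo's exact API):

```python
import torch
import torch.nn as nn

class MultiLoRALinear(nn.Module):
    """Hypothetical LoRA linear that also accepts stacked (n, r, in) / (n, out, r)
    down/up tensors and sums the contribution of each LoRA in the stack."""

    def __init__(self, base: nn.Linear, lora_down: torch.Tensor, lora_up: torch.Tensor, scale: float = 1.0):
        super().__init__()
        self.base = base
        # Promote a single (r, in) / (out, r) pair to a stack of size one
        if lora_down.dim() == 2:
            lora_down, lora_up = lora_down.unsqueeze(0), lora_up.unsqueeze(0)
        self.lora_down = nn.Parameter(lora_down)
        self.lora_up = nn.Parameter(lora_up)
        self.scale = scale

    def forward(self, x):
        out = self.base(x)
        # Sum over the third (stack) dimension: add each LoRA's x @ down_i.T @ up_i.T
        for down, up in zip(self.lora_down, self.lora_up):
            out = out + self.scale * (x @ down.t() @ up.t())
        return out
```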
IMO, some of these items should go into another repo (or several, given the length of the list!). I think the augmentation direction is potentially huge (combined with maybe some semantic segmentation and/or CLIPSeg). Importance sampling, etc...
Agreed, I think any of the continual learning methods could be applied here. I for one would like to keep adding things to a model, rather than retraining the same base model each time. Time to replace that sledgehammer in Dreambooth (prior class regularization).
This seems like such a priority area. Without good metrics, everyone is training in the dark. Even non-ideal metrics seem better than asking someone on Reddit how many steps per image to train for. I find it wild that there isn't more work on this. Tangentially related is the issue of loss functions. I think everyone uses L2 loss for simplicity, but I'm finding that L1 loss is sometimes better for capturing face details. Seems like understanding ideal loss functions (in latent space) would be interesting. This intersects with continual learning in the case of some regularizations (e.g., Elastic Weight Consolidation).
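For reference, swapping the objective is a one-line change in a typical latent-diffusion training loop (a sketch with placeholder tensors, not the repo's actual training code):

```python
import torch
import torch.nn.functional as F

# Stand-ins for the UNet's noise prediction and the sampled noise target (latent-space shapes)
noise_pred = torch.randn(4, 4, 64, 64)
noise = torch.randn(4, 4, 64, 64)

loss_l2 = F.mse_loss(noise_pred, noise)                      # the usual choice
loss_l1 = F.l1_loss(noise_pred, noise)                       # sometimes sharper on fine (face) detail
loss_huber = F.smooth_l1_loss(noise_pred, noise, beta=0.1)   # a middle ground between the two
```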
I found that there is a specific problem with enabling gradient accumulation, but only for the text_encoder. Enabling it only for the unet seems to work fine. I'll leave a comment on the relevant issue when I can find my notes. Two other thoughts for speed:
I ran a first pass on training only the up_blocks of the Unet, and it does extremely well, even at the same rank (compared to down+mid+up blocks). I'll make some image grids this week for discussion, just need to try it with the conv2d layers as well.
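One way to run that kind of ablation (a sketch; the `lora` substring and `up_blocks` prefix in parameter names are assumptions about naming, not this repo's exact convention):

```python
import torch
import torch.nn as nn

def select_up_block_lora_params(unet: nn.Module):
    """Freeze everything except LoRA parameters living under up_blocks.*"""
    params = []
    for name, param in unet.named_parameters():
        keep = "lora" in name and name.startswith("up_blocks")
        param.requires_grad_(keep)
        if keep:
            params.append(param)
    return params

# optimizer = torch.optim.AdamW(select_up_block_lora_params(unet), lr=1e-4)
```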
I'm working on a repo for exactly this (hoping to have an alpha pre-release out this week, but any feedback on the WIP branch is welcome!). It lets you add LoRA wherever you want (or not) throughout the model (you can make any combination of the models here and much more). I'm building a community pipeline to submit to diffusers so we can get more people on board with LoRA experimentation (and of course I'll make it compatible with how the weights are saved in @cloneofsimo's repo).
-
Also have a look at this recent stable-diffusion pull request, which uses a depth or CLIPSeg mask to weight the loss for faster training: "This will add the ability to train Embeddings and Hypernetworks using a weighted loss."
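The weighted-loss idea amounts to roughly this (a sketch with made-up shapes; in practice the mask would come from a depth model or CLIPSeg and be downsampled to latent resolution):

```python
import torch
import torch.nn.functional as F

noise_pred = torch.randn(1, 4, 64, 64)   # UNet output in latent space
noise = torch.randn(1, 4, 64, 64)        # noise target
mask = torch.rand(1, 1, 512, 512)        # e.g. a CLIPSeg/depth mask in [0, 1]

# Downsample the mask to latent resolution and weight the per-pixel loss with it
mask_latent = F.interpolate(mask, size=noise_pred.shape[-2:], mode="bilinear", align_corners=False)
per_pixel = F.mse_loss(noise_pred, noise, reduction="none")
loss = (per_pixel * mask_latent).mean()
```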
-
^ It's a +1 for me
-
One confusing thing for people is that there are multiple LoRA file formats. The one from kohya is the easiest to use: via the extension you can load many LoRA models without having to merge them into a model first. If this project could adopt the same LoRA file format, it would make the weights much easier to use... right now it is cumbersome to have to merge a LoRA into a model to be able to make use of it... unless I missed something.
-
To be honest, I am fascinated by how much people have enjoyed this project and are using it to have fun. I've been contacted by many startups and companies, and surprisingly many of them have deployed this project in their pipelines.
Unfortunately, I won't be able to work on this project forever, because I am starting a master's degree in March. Until then, I have some plans for the future... Some of these could potentially be paper-worthy, and I would definitely be interested in collaborating if someone reading this has a plan.
Research side of things
Modelling
For two LoRAs $\Delta W_1 = A_1 B_1$ and $\Delta W_2 = A_2 B_2$ (each of rank $r$), the merging operation would make

$$\Delta W = \begin{pmatrix} A_1 & A_2 \end{pmatrix} \Sigma \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}$$

where $\Sigma = I_{2r}$. This is now rank $2r$, and has the capability to become either one. Notice that, with

$$\Sigma = \begin{pmatrix} \alpha I_r & 0 \\ 0 & \beta I_r \end{pmatrix},$$

we have:

$$\Delta W = \alpha A_1 B_1 + \beta A_2 B_2.$$
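A sketch of that concatenation merge for a single linear layer (the tensor layout is an assumption; the point is just the block structure above):

```python
import torch

def concat_merge(up1, down1, up2, down2, alpha=1.0, beta=1.0):
    """Merge two LoRAs without losing information by stacking them into a rank-2r pair.
    up_i: (out_features, r), down_i: (r, in_features).
    alpha = beta = 1 corresponds to Sigma = I_{2r}; other values give a weighted blend."""
    up = torch.cat([alpha * up1, beta * up2], dim=1)   # (out, 2r)
    down = torch.cat([down1, down2], dim=0)            # (2r, in)
    return up, down

# Sanity check: the merged pair's delta equals alpha * A1 @ B1 + beta * A2 @ B2
out_f, in_f, r = 8, 6, 2
A1, B1 = torch.randn(out_f, r), torch.randn(r, in_f)
A2, B2 = torch.randn(out_f, r), torch.randn(r, in_f)
up, down = concat_merge(A1, B1, A2, B2, alpha=0.7, beta=0.3)
assert torch.allclose(up @ down, 0.7 * A1 @ B1 + 0.3 * A2 @ B2, atol=1e-5)
```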
Dataset
(independent of LoRA. Should I make another repo for this?)
Basic stuff: a preprocessor that handles SR, BLIP, and CLIPSeg for auto-captioning. Basic dataset pipelines #139. (In general, lots of continual-learning-based methods seem to be unexplored for fine-tuning with high editability.)
Distillation
SVD distillation that distills resnets + other bias terms as well. Bias terms should have small weights. SVD update with conv support + LoRA add, weight update following recent updates #140
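Roughly, the extraction step would look like this (a sketch for 2-D linear weights only; conv kernels would need flattening first, and bias deltas can be carried over directly since they are small):

```python
import torch

@torch.no_grad()
def svd_distill(w_base: torch.Tensor, w_tuned: torch.Tensor, rank: int):
    """Approximate the weight delta with a rank-r factorization via truncated SVD.
    Returns (up, down) such that up @ down ~= w_tuned - w_base."""
    delta = (w_tuned - w_base).float()
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    up = U[:, :rank] * S[:rank]   # fold the singular values into the up matrix
    down = Vh[:rank, :]
    return up, down
```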
Metrics
(independent of LoRA.)
Image alignment (CLIP image score) is an extremely bad proxy for what we actually want. There should be a better metric, but there is no research on this topic.
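For reference, the metric in question is basically mean cosine similarity between CLIP image embeddings of the generations and the training images, e.g. (a sketch using the Hugging Face transformers CLIP; the model choice is arbitrary):

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_image_alignment(generated_images, reference_images):
    """Mean cosine similarity between CLIP embeddings of generated and reference PIL images."""
    gen = processor(images=generated_images, return_tensors="pt")
    ref = processor(images=reference_images, return_tensors="pt")
    e_gen = model.get_image_features(**gen)
    e_ref = model.get_image_features(**ref)
    e_gen = e_gen / e_gen.norm(dim=-1, keepdim=True)
    e_ref = e_ref / e_ref.norm(dim=-1, keepdim=True)
    return (e_gen @ e_ref.T).mean().item()
```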
Other tasks
Engineering side of things
Memory optimization
Speed optimization
Approachability (Make it simpler for non-developers)
Up-to-date merging operations. SVD update with conv support + LoRA add, weight update following recent updates #140
Inference optimization
(This looks like a huge amount of work, so definitely not a priority.)