ENH Argument to enable bias for LoRA B #2237
Conversation
This PR adds the argument `lora_bias` which, if set to `True` (default: `False`), adds a bias term to the LoRA B module.

Typically, this should be disabled. The main use case is when the LoRA weights were extracted from fully fine-tuned parameters, so the bias of those parameters can be taken into account.

Merging is supported for this argument when using vanilla LoRA layers or bitsandbytes LoRA layers. Other types of LoRA layers don't support merging.

This option is also disabled for non-standard LoRA weight initialization like LoftQ, as well as for embedding layers (since they use `nn.Parameter`, i.e. there is no bias term).
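For illustration, a minimal usage sketch of the new flag (the base model and target module names are placeholders, not part of this PR):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; any model supported by PEFT works the same way.
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

config = LoraConfig(
    r=8,
    target_modules=["q_proj", "v_proj"],  # placeholder target modules
    lora_bias=True,  # new in this PR: adds a bias term to the LoRA B module
)
peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()
```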
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for getting this ready so quickly!
My major comments are:
- Should we consider naming the argument `lora_B_bias` as opposed to `lora_bias`, to be more informative at the expense of two extra characters?
- There seems to be a logical error when updating the bias value (as indicated in the comments).
- I didn't notice if we're raising any errors when unsupported configurations (such as the ones described in the OP) are detected. Maybe we should check that (if not already) and test?
@@ -35,13 +35,27 @@ def __init__(
    lora_dropout: float = 0.0,
    init_lora_weights: bool = True,
    use_rslora: bool = False,
    use_dora: bool = False,
    lora_bias: bool = False,
Should this be `lora_B_bias`? I find that to be a bit more informative.
Indeed, I considered this. My main reasoning for going with the more generic `lora_bias` was that it leaves the door open for extending this argument in the future. Say someone finds that LoRA works much better when also adding a bias to LoRA A; then we can adapt this argument to allow that too. Otherwise, we'd have to add a new argument (and we don't want to rename arguments for obvious reasons). LMK what you think of that reasoning.
> Otherwise, we'd have to add a new argument (and we don't want to rename arguments for obvious reasons).

I think that would still be preferable to having a single argument controlling the bias setup for LoRA, since this is still in its infancy.

Later, if it becomes a common standard to add biases for both LoRA matrices, we can deprecate `lora_B_bias` and `lora_A_bias` (if we introduce such an argument) in favor of a single argument called `lora_bias`.

This is where I stand, but I am not too opinionated about it.
Do we care about reproducibility after upgrading PEFT? If so, it seems detrimental to possibly merge control of the A and B biases into one flag in the future, and they should be separated into two flags from the start.

Otherwise, in terms of opportunity cost for experimentation on the user's side, I think having two separate parameters (`lora_bias_A`, `lora_bias_B`) is better. That said, having only one parameter appears simpler: let the implementation decide what the current best practice is for adding biases. So for someone who just wants to follow current LoRA best practice, a single flag is helpful. This becomes harder with two flags, since there is no obvious 'no bias at all' vs. 'best practice' setting. If we put simplicity first (and don't care about reproducibility after upgrading), then one parameter is the way to go, I think. What's the stance here?

Ideally there would be another, more low-level layer of abstraction with two bias parameters, and one above it that decides what the current best choice is, i.e. `BaseLoRA(..., lora_bias_A, lora_bias_B)` -> `LoRA(..., lora_bias)`.
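A hypothetical sketch of that layering, just to make the idea concrete (neither class nor the `lora_bias_A`/`lora_bias_B` parameters exist in PEFT):

```python
# Hypothetical sketch of the layering idea above; not actual PEFT code.
class BaseLoRA:
    """Low-level: one explicit bias switch per LoRA matrix."""

    def __init__(self, lora_bias_A: bool = False, lora_bias_B: bool = False):
        self.lora_bias_A = lora_bias_A
        self.lora_bias_B = lora_bias_B


class LoRA(BaseLoRA):
    """High-level: a single flag that maps to whatever is considered best practice."""

    def __init__(self, lora_bias: bool = False):
        # As of this PR, "best practice" would translate to a bias on LoRA B only.
        super().__init__(lora_bias_A=False, lora_bias_B=lora_bias)
```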
To clarify, my idea is that if we later want to add the possibility of a bias for LoRA A, the option would be something like `lora_bias="a"`, or for both, `lora_bias="both"`. We should not change the meaning of `lora_bias=True`, in order to ensure reproducibility, as you mentioned.

If we find that the parameter gets overloaded, we can add the option for a sub-config, e.g. `LoraConfig(..., lora_bias=LoraBiasConfig(bias_a=True, bias_b=True, ...))`.
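To make the sub-config idea concrete, a hypothetical sketch (neither `LoraBiasConfig` nor the string values exist in PEFT; this PR only adds the boolean flag):

```python
from dataclasses import dataclass

# Hypothetical future shape, not part of this PR.
@dataclass
class LoraBiasConfig:
    bias_a: bool = False
    bias_b: bool = True

# Today (this PR):     LoraConfig(..., lora_bias=True)    -> bias on LoRA B
# Possible extension:  LoraConfig(..., lora_bias="a")     -> bias on LoRA A
#                      LoraConfig(..., lora_bias="both")  -> bias on A and B
# Possible sub-config: LoraConfig(..., lora_bias=LoraBiasConfig(bias_a=True, bias_b=True))
```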
Seems like `lora_bias` should be fine for now.
Thanks for the feedback, I merged the PR as is.
atol = 0.01
rtol = 10
assert not torch.allclose(out_base, out_before_merge, atol=atol, rtol=rtol)
assert torch.allclose(out_before_merge, out_after_merge, atol=atol, rtol=rtol)
Could this be the reason why we need such a high tolerance?
No, we already had the high tolerance before:

Lines 848 to 849 in d13d7a4:

atol = 0.01
rtol = 10

I tried fiddling with these values to find something where the test would pass with the correct merging implementation and fail with the LoRA bias merging commented out, but I couldn't find values that fit. Maybe there is a very narrow range that would work, but if the tolerance is too narrow, the test could become unreliable.
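For context, here is a minimal sketch of the merge check under discussion (the toy model, target module name, and tolerances are placeholders; this is not the actual test from the PEFT test suite):

```python
import torch
from torch import nn
from peft import LoraConfig, get_peft_model

# Toy base model; the real test uses models from the PEFT test suite.
base_model = nn.Sequential(nn.Linear(16, 16))
x = torch.randn(4, 16)
out_base = base_model(x)

config = LoraConfig(
    target_modules=["0"],     # placeholder target module
    init_lora_weights=False,  # random init so the LoRA output differs from the base
    lora_bias=True,
)
model = get_peft_model(base_model, config)
out_before_merge = model(x)

model.merge_adapter()
out_after_merge = model(x)

atol, rtol = 0.01, 10
# LoRA with random init should change the output ...
assert not torch.allclose(out_base, out_before_merge, atol=atol, rtol=rtol)
# ... but merging should not change it (within the tolerance discussed above).
assert torch.allclose(out_before_merge, out_after_merge, atol=atol, rtol=rtol)
```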
Thanks for the review.
> I didn't notice if we're raising any errors when unsupported configurations (such as the ones described in the OP) are detected. Maybe we should check that (if not already) and test?

I added checks to the quantized layers that support merging, as it's only relevant there, so there is not a check for each one (a rough sketch follows the links below):
- https://github.com/huggingface/peft/pull/2237/files#diff-86aac165111c8b8eae9b68ff07210aa69d2136a6e68c2002ac70de8050132063R47-R48
- https://github.com/huggingface/peft/pull/2237/files#diff-97c2ced1de511b00d44fa7538aa83b1c9baba12a1f30e21389b4fad8a0a32cf0R35-R36
- https://github.com/huggingface/peft/pull/2237/files#diff-5e62802c7cd4e536708b9ad1b5bfe866d307630079d95d234c2e5cd022aabd9eR56-R57
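Roughly, the guard pattern looks like this (a hedged sketch with a hypothetical class name; the exact placement and message are in the linked diffs, not here):

```python
# Hypothetical sketch of the guard pattern; not the exact lines linked above.
class SomeQuantizedLoraLayer:
    def __init__(self, lora_bias: bool = False):
        if lora_bias:
            raise ValueError(
                f"lora_bias=True is not supported for {self.__class__.__name__}, set it to False."
            )
        self.lora_bias = lora_bias
```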
As for incompatible configs, I tested them here:
Does this answer your question?
Thanks! This looks good to me, and thanks once again for getting this up so quickly!