FIX: Multiple active adapters with auxiliary layers #2758
Conversation
Description

This PR fixes a few issues with the handling of active adapters for auxiliary modules.

1. Calling set_adapter on the base_model

When calling peft_model.set_adapter, it is not possible to activate more than one adapter, as not all PEFT methods support that. However, many PEFT methods like LoRA do, in which case users should call peft_model.base_model.set_adapter(['default', 'other']).

The issue was that the activation of auxiliary modules was only performed in PeftModel.set_adapter. This means that if users called peft_model.base_model.set_adapter (i.e. LoraModel.set_adapter etc.), the auxiliary adapters were not activated. This PR fixes that by ensuring that the auxiliary modules are activated even when adapters are activated this way. When users activate more than one adapter, additional checks are performed to ensure that they are not activating multiple auxiliary adapters on the same module.

Note that some existing PEFT code could start raising errors because of this change. However, that code is currently buggy, so IMO it is fine to raise an error.

2. Adding multiple adapters with non-overlapping auxiliary modules

Furthermore, I found an activation issue that could occur when adding multiple adapters with non-overlapping auxiliary modules. Normally, when the second/third/... adapter is added, it is not automatically activated. However, when these additional adapters target new auxiliary modules, those would be incorrectly activated (because they look like they're the first adapter). This has also been fixed.

Open question

Right now, we don't allow users to activate multiple auxiliary adapters on the same module. However, this limitation could be considered too strict:

1. For trainable tokens, as long as the indices don't overlap, there is no conflict.
2. For modules_to_save, we could theoretically determine the delta_weight as new_weight - original_weight, then add up all delta_weights.

This is not implemented in this PR to prevent it from becoming even more complex.

Note to maintainers

This PR requires the addition of self.set_auxiliary_adapters to the set_adapters method of all non-prompt learning PEFT method model classes. If this PR is merged, all open PRs that add such PEFT methods must be updated to include this call.
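For illustration, a minimal sketch of the scenario in point 1 (the model name and configs are just examples, not taken from the PR): two LoRA adapters are activated at once through the inner LoraModel, and only the first adapter carries an auxiliary modules_to_save module, so no two auxiliary adapters end up active on the same module.

```python
# Minimal sketch of point 1; model and config choices are illustrative, not from the PR.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# "default" carries an auxiliary module (modules_to_save), "other" does not,
# so activating both does not put two auxiliary adapters on the same module.
config_default = LoraConfig(target_modules=["q_proj", "v_proj"], modules_to_save=["lm_head"])
config_other = LoraConfig(target_modules=["q_proj", "v_proj"])

peft_model = get_peft_model(base, config_default)  # adds and activates "default"
peft_model.add_adapter("other", config_other)      # added but not activated

# peft_model.set_adapter only activates a single adapter; multi-adapter activation
# goes through the inner model, which previously skipped the auxiliary modules.
peft_model.base_model.set_adapter(["default", "other"])
```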
Oof, thanks.
Right now, we don't allow users to activate multiple auxiliary adapters on the same module. However, this limitation could be considered too strict:
The merging behavior you describe (summation of delta weights) can be done for overlapping trainable tokens and modules to save alike. I think there are at least three cases here:
- user just wants to load adapters and if they're overlapping, receive an error because the assumed orthogonality of the adapters is not given
- user wants to override specific parts of a pre-trained adapter and wants to load another adapter to overwrite specific parts (without merging)
- user is aware of the overlap between adapters and wants to merge the deltas in some way
Ideally we would support all cases (defaulting to the first case) with additional arguments to add_adapter, but I'm not yet sure what this means for other places that use add_adapter and how to propagate such an option through possible abstractions.
I would see these as out of scope for this PR and I don't see anything blocking a possible implementation.
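To make the third case concrete, here is a hypothetical sketch (not implemented in this PR; the function name is made up) of what summing the deltas of overlapping modules_to_save weights could look like:

```python
# Hypothetical sketch of case 3 (merging deltas); not part of this PR.
import torch

def merge_overlapping_weights(original_weight: torch.Tensor, adapter_weights: list[torch.Tensor]) -> torch.Tensor:
    # Each adapter's delta is its new_weight minus the original weight; the merged
    # result applies the sum of all deltas on top of the original weight.
    merged = original_weight.clone()
    for new_weight in adapter_weights:
        merged += new_weight - original_weight
    return merged
```

Whether plain summation is the right aggregation is exactly the open design question discussed here.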
src/peft/utils/other.py
Outdated
if isinstance(module, TrainableTokensWrapper):
    # TODO In theory, multiple active trainable tokens is fine when the indices don't overlap
    adapter_names_in_module = [n for n in adapter_name if n in module.token_adapter.trainable_tokens_delta]
elif isinstance(module, ModulesToSaveWrapper):
    adapter_names_in_module = [n for n in adapter_name if n in module.modules_to_save]
else:
    raise NotImplementedError(f"Please implement the handling for {type(module)}")

if len(adapter_names_in_module) > 1:
    raise ValueError(f"Only one adapter can be set at a time for {module}, got {len(adapter_names_in_module)}")

if not adapter_names_in_module:
    adapter_name_to_set = None
else:
    adapter_name_to_set = adapter_names_in_module[0]

return adapter_name_to_set
I wonder if we should delegate the check (and potential checking of capabilities like merging given a method - out of scope but foreseeable) to the wrapper classes themselves, e.g. AuxiliaryTrainingWrapper.check_new_adapter(). WDYT?
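A rough, self-contained sketch of that delegation idea (the class and method bodies here are illustrative stand-ins, not the actual PEFT code):

```python
# Illustrative stand-in for the suggested delegation; not the actual PEFT implementation.
from __future__ import annotations

import torch.nn as nn

class ToyAuxiliaryWrapper(nn.Module):
    """Stand-in for an AuxiliaryTrainingWrapper subclass that owns per-adapter submodules."""

    def __init__(self, adapters: dict[str, nn.Module]):
        super().__init__()
        self.adapters = nn.ModuleDict(adapters)

    def check_new_adapter(self, adapter_names: list[str]) -> str | None:
        # Real subclasses would look into trainable_tokens_delta / modules_to_save here.
        present = [name for name in adapter_names if name in self.adapters]
        if len(present) > 1:
            raise ValueError(f"Only one adapter can be set at a time for {self}, got {len(present)}")
        return present[0] if present else None
```

Each wrapper type would then enforce its own rules (and, later, advertise capabilities like merging) instead of a model-level helper special-casing every wrapper class.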
Good idea, I refactored it.
The merging behavior you describe (summation of delta weights) can be done for overlapping trainable tokens and modules to save alike. I think there are at least three cases here:
- user just wants to load adapters and if they're overlapping, receive an error because the assumed orthogonality of the adapters is not given
- user wants to override specific parts of a pre-trained adapter and wants to load another adapter to overwrite specific parts (without merging)
- user is aware of the overlap between adapters and wants to merge the deltas in some way
Ideally we would support all cases (defaulting to the first case) with additional arguments to add_adapter, but I'm not yet sure what this means for other places that use add_adapter and how to propagate such an option through possible abstractions.
Good summary, although I'd say that case 2 (override) would be surprising behavior and it would be fine not to support it. I think the expectation for a user would be that this behaves the same as for LoRA et al., i.e. the effects are aggregated (whatever that may mean exactly).
I would see these as out of scope for this PR and I don't see anything blocking a possible implementation.
So as is, it's not allowed and I think we both agree that suggestion 3. is the most useful one. Let's implement that in a future PR.
LGTM :)