Conversation

BenjaminBossan (Member)

Description

This PR fixes a few issues with the handling of active adapters for auxiliary modules.

1. Calling set_adapter on the model.base_model

When calling peft_model.set_adapter, it is not possible to activate more than one adapter, as not all PEFT methods support that. However, many PEFT methods like LoRA do, in which case users should call peft_model.base_model.set_adapter(['default', 'other']).
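
As a minimal sketch of the two call paths (the toy model below is a placeholder for illustration; the adapter names follow the description above):

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(8, 8)

    def forward(self, x):
        return self.lin(x)

peft_model = get_peft_model(Toy(), LoraConfig(target_modules=["lin"]), adapter_name="default")
peft_model.add_adapter("other", LoraConfig(target_modules=["lin"]))

# PeftModel.set_adapter: only a single adapter can be active
peft_model.set_adapter("default")

# LoraModel.set_adapter (reached via base_model): a list of adapters is allowed
peft_model.base_model.set_adapter(["default", "other"])
```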

The issue was that the activation of auxiliary modules was only done in PeftModel.set_adapter. This means that if users called peft_model.base_model.set_adapter (i.e. LoraModel.set_adapter etc.), the auxiliary adapters were not activated.

This PR fixes this issue by ensuring that even if the user activates adapters like this, the auxiliary modules are activated. When users activate more than one adapter, additional checks are performed to ensure that they are not activating multiple auxiliary modules on the same module.

Note that some existing PEFT code could start raising errors now because of the change. However, this PEFT code is buggy right now so IMO it is fine to raise an error.
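
For illustration, a sketch of the kind of setup that now raises (again with a placeholder toy model; the exact exception type and message are implementation details):

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(8, 8)
        self.head = nn.Linear(8, 8)

    def forward(self, x):
        return self.head(self.lin(x))

# both adapters wrap the same module ("head") via modules_to_save
cfg_default = LoraConfig(target_modules=["lin"], modules_to_save=["head"])
cfg_other = LoraConfig(target_modules=["lin"], modules_to_save=["head"])
peft_model = get_peft_model(Toy(), cfg_default, adapter_name="default")
peft_model.add_adapter("other", cfg_other)

# activating both adapters would require two modules_to_save copies to be active
# on the same wrapped module, which is now rejected instead of silently misbehaving
peft_model.base_model.set_adapter(["default", "other"])  # raises an error after this PR
```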

2. Adding multiple adapters with non-overlapping auxiliary modules

Furthermore, I found an activation issue that could occur when adding multiple adapters with non-overlapping auxiliary modules. Normally, when the second/third/... adapter is added, it is not automatically activated. However, when these additional adapters target new auxiliary modules, those modules would be incorrectly activated (because, on those modules, they look like the first adapter). This has also been fixed.
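
A sketch of this scenario, assuming a toy model where the two adapters target different modules_to_save modules (all names below are placeholders):

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(8, 8)
        self.head_a = nn.Linear(8, 8)
        self.head_b = nn.Linear(8, 8)

    def forward(self, x):
        return self.head_b(self.head_a(self.lin(x)))

cfg_default = LoraConfig(target_modules=["lin"], modules_to_save=["head_a"])
cfg_other = LoraConfig(target_modules=["lin"], modules_to_save=["head_b"])

peft_model = get_peft_model(Toy(), cfg_default, adapter_name="default")
peft_model.add_adapter("other", cfg_other)

# "other" is not the active adapter, so its modules_to_save copy on head_b must not
# be active; previously it was, because head_b had not seen any adapter before
```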

Open question

Right now, we don't allow users to activate multiple auxiliary adapters on the same module. However, this limitation could be considered too strict:

  1. For trainable tokens, as long as the indices don't overlap, there is no conflict.
  2. For modules_to_save, we could theoretically determine the "delta_weight" as new_weight - original_weight, then add up all delta_weights (see the sketch below).

This is not implemented in the PR for now to prevent it becoming even more complex.
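
For modules_to_save, the aggregation idea from point 2 would roughly amount to the following (purely hypothetical helper, not part of this PR or of PEFT):

```python
import torch

def aggregate_modules_to_save(original_weight: torch.Tensor, new_weights: list[torch.Tensor]) -> torch.Tensor:
    # hypothetical: treat each adapter's copy as original + delta and sum the deltas
    delta_weights = [w - original_weight for w in new_weights]
    return original_weight + sum(delta_weights)
```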

Note to maintainers

This PR requires the addition of self.set_auxiliary_adapters to the set_adapters method of all non-prompt learning PEFT method model classes. If this PR is merged, all open PRs that add such PEFT methods must be updated to include this call.
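
Roughly, for a hypothetical new method's model class, the required change would look like this (class name and surrounding logic are placeholders and the adapter-setting method is shown as set_adapter, matching the LoraModel.set_adapter references above; only the final call is the one required by this PR):

```python
from peft.tuners.tuners_utils import BaseTuner

class MyNewTunerModel(BaseTuner):  # hypothetical PEFT method model class
    def set_adapter(self, adapter_name):
        super().set_adapter(adapter_name)  # existing logic activating the method's own layers
        self.set_auxiliary_adapters(adapter_name)  # the new call required by this PR
```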


@githubnemo (Collaborator) left a comment


Oof, thanks.

> Right now, we don't allow users to activate multiple auxiliary adapters on the same module. However, this limitation could be considered too strict:

The merging behavior you describe (summation of delta weights) can be done for overlapping trainable tokens and modules to save alike. I think there are at least three cases here:

  • user just wants to load adapters and if they're overlapping, receive an error because the assumed orthogonality of the adapters is not given
  • user wants to override specific parts of a pre-trained adapter and wants to load another adapter to overwrite specific parts (without merging)
  • user is aware of the overlap between adapters and wants to merge the deltas in some way

Ideally we would support all cases (defaulting to the first case) with additional arguments to add_adapter but I'm not yet sure what this means for other places that use add_adapter and how to propagate such an option through possible abstractions.

I would see these as out of scope for this PR and I don't see anything blocking a possible implementation.
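
For concreteness, a purely hypothetical sketch of how the three cases might surface as an option to add_adapter one day (none of this exists in PEFT):

```python
from enum import Enum

class OverlapPolicy(str, Enum):  # hypothetical; not a PEFT API
    ERROR = "error"        # case 1: reject overlapping auxiliary modules (the safe default)
    OVERRIDE = "override"  # case 2: the later adapter overwrites the overlapping parts
    MERGE = "merge"        # case 3: merge the deltas of the overlapping adapters
```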

Comment on lines 903 to 919
```python
if isinstance(module, TrainableTokensWrapper):
    # TODO In theory, multiple active trainable tokens is fine when the indices don't overlap
    adapter_names_in_module = [n for n in adapter_name if n in module.token_adapter.trainable_tokens_delta]
elif isinstance(module, ModulesToSaveWrapper):
    adapter_names_in_module = [n for n in adapter_name if n in module.modules_to_save]
else:
    raise NotImplementedError(f"Please implement the handling for {type(module)}")

if len(adapter_names_in_module) > 1:
    raise ValueError(f"Only one adapter can be set at a time for {module}, got {len(adapter_names_in_module)}")

if not adapter_names_in_module:
    adapter_name_to_set = None
else:
    adapter_name_to_set = adapter_names_in_module[0]

return adapter_name_to_set
```
githubnemo (Collaborator)

I wonder if we should delegate the check (and potential checking of capabilities like merging given a method - out of scope but foreseeable) to the wrapper classes themselves, e.g. AuxiliaryTrainingWrapper.check_new_adapter(). WDYT?

BenjaminBossan (Member, Author)

Good idea, I refactored it.

@BenjaminBossan (Member, Author) left a comment


> The merging behavior you describe (summation of delta weights) can be done for overlapping trainable tokens and modules to save alike. I think there are at least three cases here:
>
>   • user just wants to load adapters and if they're overlapping, receive an error because the assumed orthogonality of the adapters is not given
>   • user wants to override specific parts of a pre-trained adapter and wants to load another adapter to overwrite specific parts (without merging)
>   • user is aware of the overlap between adapters and wants to merge the deltas in some way
>
> Ideally we would support all cases (defaulting to the first case) with additional arguments to add_adapter but I'm not yet sure what this means for other places that use add_adapter and how to propagate such an option through possible abstractions.

Good summary, although I'd say that case 2 (override) would be surprising behavior, and it would be fine not to support it. I think the expectation for a user would be that this behaves the same as for LoRA et al., i.e. that the effects are aggregated (whatever that may mean exactly).

> I would see these as out of scope for this PR and I don't see anything blocking a possible implementation.

So as is, it's not allowed, and I think we both agree that suggestion 3 is the most useful one. Let's implement that in a future PR.


@githubnemo (Collaborator) left a comment


LGTM :)

@BenjaminBossan merged commit a3197b1 into huggingface:main Aug 29, 2025
14 checks passed
@BenjaminBossan deleted the fix-issue-multiple-active-auxiliary-modules branch August 29, 2025 15:54