[BUG]: RuntimeError: Failed to replace input_layernorm of type DeepseekV3RMSNorm with FusedRMSNorm with the exception: 'NoneType' object is not callable. #6260
Labels: bug
Is there an existing issue for this bug?
The bug has not been fixed in the latest main branch
Do you feel comfortable sharing a concise (minimal) script that reproduces the error? :)
Yes, I will share a minimal reproducible script.
🐛 Describe the bug
Following the README tutorial, I hit this bug when LoRA fine-tuning DeepSeek R1. It looks like a problem in the colossalai library.
Traceback (most recent call last):
[rank5]: File "/home/conda/envs/colo/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 197, in _replace_sub_module
[rank5]: replace_layer = target_module.from_native_module(
[rank5]: File "/home/conda/envs/colo/lib/python3.10/site-packages/colossalai/shardformer/layer/normalization.py", line 333, in from_native_module
[rank5]: rmsnorm = FusedRMSNormWithHook(
[rank5]: TypeError: 'NoneType' object is not callable
[rank5]: During handling of the above exception, another exception occurred:
[rank5]: Traceback (most recent call last):
[rank5]: File "/home/ds-r1/ColossalAI/applications/ColossalChat/examples/training_scripts/lora_finetune.py", line 464, in <module>
[rank5]: train(args)
[rank5]: File "/home/ds-r1/ColossalAI/applications/ColossalChat/examples/training_scripts/lora_finetune.py", line 261, in train
[rank5]: model, optimizer, _, dataloader, lr_scheduler = booster.boost(
[rank5]: File "/home/conda/envs/colo/lib/python3.10/site-packages/colossalai/booster/booster.py", line 154, in boost
[rank5]: model, optimizer, criterion, dataloader, lr_scheduler = self.plugin.configure(
[rank5]: File "/home/conda/envs/colo/lib/python3.10/site-packages/colossalai/booster/plugin/moe_hybrid_parallel_plugin.py", line 457, in configure
[rank5]: model = HybridParallelModule(
[rank5]: File "/home/conda/envs/colo/lib/python3.10/site-packages/colossalai/booster/plugin/hybrid_parallel_plugin.py", line 87, in __init__
[rank5]: module, self.shared_params = shardformer.optimize(module, policy=custom_policy)
[rank5]: File "/home/conda/envs/colo/lib/python3.10/site-packages/colossalai/shardformer/shard/shardformer.py", line 55, in optimize
[rank5]: shared_params = sharder.shard()
[rank5]: File "/home/conda/envs/colo/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 43, in shard
[rank5]: self._replace_module(include=held_layers)
[rank5]: File "/home/conda/envs/colo/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 67, in _replace_module
[rank5]: self._recursive_replace_layer(
[rank5]: File "/home/conda/envs/colo/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 115, in _recursive_replace_layer
[rank5]: self._recursive_replace_layer(
[rank5]: File "/home/conda/envs/colo/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 115, in _recursive_replace_layer
[rank5]: self._recursive_replace_layer(
[rank5]: File "/home/conda/envs/colo/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 115, in _recursive_replace_layer
[rank5]: self._recursive_replace_layer(
[rank5]: [Previous line repeated 2 more times]
[rank5]: File "/home/conda/envs/colo/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 112, in _recursive_replace_layer
[rank5]: self._replace_sub_module(module, sub_module_replacement, include)
[rank5]: File "/home/conda/envs/colo/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 201, in _replace_sub_module
[rank5]: raise RuntimeError(
[rank5]: RuntimeError: Failed to replace input_layernorm of type DeepseekV3RMSNorm with FusedRMSNorm with the exception: 'NoneType' object is not callable. Please check your model configuration or sharding policy, you can set up an issue for us to help you as well.
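The innermost frame fails while calling `FusedRMSNormWithHook(`, so that name was `None` when it was called. One common way this happens is an optional dependency that failed to import and left the name unset. The sketch below is a quick environment check under an assumption this report does not confirm: that the fused RMSNorm kernel comes from NVIDIA Apex as `apex.normalization.FusedRMSNorm`. The helper `optional_symbol` is hypothetical and is not part of colossalai.

```python
import importlib


def optional_symbol(module_name: str, attr: str):
    """Return `module_name.attr`, or None if the module or attribute is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError as well: the optional package is absent or broken.
        return None
    return getattr(module, attr, None)


# Assumption: the fused kernel is provided by NVIDIA Apex under this name.
fused_rmsnorm = optional_symbol("apex.normalization", "FusedRMSNorm")

if fused_rmsnorm is None:
    print("apex.normalization.FusedRMSNorm is not importable in this environment")
else:
    print("apex.normalization.FusedRMSNorm is available:", fused_rmsnorm)
```

Running this in the same conda environment as the training script (`colo` in the paths above) shows whether the optional kernel is present. If it is missing, that would be consistent with the `TypeError`, though it does not prove the cause.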
Environment
No response