[llm] support tensorwise fp8/int8 training #10612
base: develop
Conversation
Thanks for your contribution!
Codecov Report

❌ Attention: Your patch check has failed because the patch coverage (18.47%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

@@            Coverage Diff             @@
##           develop   #10612      +/-   ##
===========================================
- Coverage    46.94%   46.93%   -0.02%
===========================================
  Files          799      800       +1
  Lines       132348   132416      +68
===========================================
+ Hits         62137    62147      +10
- Misses       70211    70269      +58
===========================================
@@ -478,8 +525,8 @@ def load_state_dict(
         scale_dict.update(res_scale_dict)

     if device == "cpu":
-        for k in list(state_dict.keys()):
-            with device_guard():
+        with device_guard():
+            for k in list(state_dict.keys()):
Hoisting the guard out of the loop avoids the overhead of repeatedly calling set_device; a sketch of the pattern follows.
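A minimal sketch of the hoisting pattern, assuming `device_guard` is a context manager that switches the active device on entry and restores it on exit (the stand-in implementation and the loop body here are illustrative, not the repository's code):

```python
from contextlib import contextmanager

import paddle


@contextmanager
def device_guard(device="cpu"):
    # Illustrative stand-in: switch the active device once on entry,
    # restore the previous device on exit.
    origin = paddle.device.get_device()
    paddle.set_device(device)
    try:
        yield
    finally:
        paddle.set_device(origin)


def to_cpu_tensors(state_dict):
    # Before the change, `with device_guard():` sat inside the loop, so
    # set_device ran once per key; hoisted outside, it runs once total.
    with device_guard("cpu"):
        for k in list(state_dict.keys()):
            state_dict[k] = paddle.to_tensor(state_dict[k])
    return state_dict
```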
"weight_only_int4", | ||
"weight_only_int8", | ||
] | ||
elif isinstance(config.quantization_config.weight_quantize_algo, dict): |
weight_only_int8 does not support different TP shards sharing the same scale, so flexibly converting wint8 weights to a different TP strategy is not supported for now (see the sketch below).
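To see why resharding wint8 weights would require shards to share one scale, here is a small numpy sketch (illustrative only, assuming tensorwise abs-max int8 quantization): shards quantized independently end up with different scales, so their int8 payloads cannot simply be regrouped under a new TP degree.

```python
import numpy as np


def quant_int8(x):
    # Tensorwise abs-max int8 quantization: one scale per tensor.
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale


w = np.random.default_rng(0).standard_normal((4, 8)).astype("float32")

# Quantize each TP shard independently: every shard gets its own scale.
q_shards = [quant_int8(s) for s in np.split(w, 2, axis=0)]
print("per-shard scales:", [s for _, s in q_shards])  # differ in general

# Regrouping these int8 payloads under a different TP degree would force
# shards quantized with different scales to share one, losing information.
```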
post_quantize means the weight is first TP-split and then quantized (for wint4/wint8).
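A sketch of the two orderings, reusing the same toy abs-max quantizer as above (illustrative; per the comment, post_quantize corresponds to the second variant):

```python
import numpy as np


def quant_int8(x):
    # Tensorwise abs-max int8 quantization: one scale per tensor.
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale


w = np.random.default_rng(1).standard_normal((4, 8)).astype("float32")

# Quantize first, then TP-split: one shared scale, int8 payload is split.
q, scale = quant_int8(w)
pre_split = [(shard, scale) for shard in np.split(q, 2, axis=0)]

# post_quantize: TP-split the high-precision weight first, then quantize
# each shard locally, so each rank owns its own scale (wint4/wint8 case).
post_split = [quant_int8(shard) for shard in np.split(w, 2, axis=0)]
```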
@@ -2537,6 +2615,7 @@ def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
         # load pt weights early so that we know which dtype to init the model under
         if not is_sharded and state_dict is None:
             # 4. loading non-sharded ckpt from the state dict
+            # Quantization: Loading non-sharded ckpt does not support saving with merge_tensor_parallel
Quantized loading and saving of non-safetensors weights is not considered for now.
PR types
New features
PR changes
APIs
Description
Newly supported features (sketches after this list illustrate items 1 and 3):
1. Add all_reduce_max for weight scales and activation scales, to support splitting under different TP strategies.
2. Support FP8/INT8 training with TP+PP, using Unified Checkpoint for weight storage.
3. Switch the Hadamard matrix multiplication to a block-diagonal Hadamard matrix.
4. Unify the FP8/INT8 training code paths.
5. Add a Triton-based AdamW optimizer for FP8 weights (with bf16 moments and offload support).
6. Support FP8/INT8 LoRA for the backbone model.
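For item 1, a minimal sketch of unifying a quantization scale across a tensor-parallel group with a max all-reduce; the helper name `sync_scale_across_tp` and the group handling are assumptions, not the PR's actual API:

```python
import paddle
import paddle.distributed as dist


def sync_scale_across_tp(scale, tp_group=None):
    # Hypothetical helper: take the max of the local scales over all TP
    # ranks, so every shard quantizes with the same scale no matter how
    # the weight is split (which makes TP resharding well-defined).
    dist.all_reduce(scale, op=dist.ReduceOp.MAX, group=tp_group)
    return scale


if __name__ == "__main__":
    dist.init_parallel_env()
    # Each rank derives a local abs-max scale from its weight shard...
    local_shard = paddle.randn([128, 64])
    scale = local_shard.abs().max() / 127.0
    # ...then all ranks agree on the global max.
    scale = sync_scale_across_tp(scale)
```

For item 3, a block-diagonal Hadamard rotation can be sketched as follows (sizes are illustrative; the PR's kernels and block size are not shown here). Applying one small Hadamard block per chunk of the hidden dimension is much cheaper than a full-size Hadamard transform while still spreading outliers within each block:

```python
import numpy as np
from scipy.linalg import block_diag, hadamard

d, b = 1024, 32  # hidden size and block size (b must be a power of two)
h = hadamard(b) / np.sqrt(b)        # orthonormal b x b Hadamard block
H = block_diag(*([h] * (d // b)))   # block-diagonal d x d rotation

x = np.random.default_rng(2).standard_normal((4, d)).astype("float32")
x_rot = x @ H  # equals rotating each length-b chunk of x by h independently
```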
Features planned for follow-up PRs:
1. Support saving optimizer parameters via Unified Checkpoint.
2. FP8 weights are currently represented as paddle.int8 and stored as np.int8; switch to a float8 representation later.
3. Support data-parallel FP8/INT8 training.
4. Accelerate the FP8/INT8 quant-matmul-dequant path and adapt the acceleration to MoE structures.