[RL] logprob compute use the same method #10596

Merged

JunnYu
Member

@JunnYu JunnYu commented May 15, 2025

Before submitting

  • Lint code. If there are lint issues, please format the code first.
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --files XXXX.py
  • Add test cases to the tests folder. If there are codecov issues, please add test cases first.

PR types

Bug fixes

PR changes

Others

Description

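Problem: in the old code, three values differ at the first step: old_log_prob (obtained from the actor's computation), ref_log_prob (obtained from the reference model's computation), and log_prob (computed during actor training). This mismatch is a bug, and this PR fixes it.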
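Fixes:

  1. old_log_prob and ref_log_prob were computed without the amp_auto_cast wrapper, while log_prob was computed inside it, so the wrapper is added to both.
  2. When fused head loss is enabled, log_prob is computed chunk by chunk with the fused head. To align strictly, the compute_log_prob function is changed to use the same implementation.
  3. With the KL loss enabled, training produced NaN after 1000+ steps. The cause was traced to an overflow in exp(delta): delta = ref_log_prob - log_prob includes padding tokens, and at padding positions delta can be large, for example 30 or 40. The fix masks ref_log_prob and log_prob first, then subtracts them and takes the exponential, which no longer overflows.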
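
The masking order in item 3 can be illustrated with a small NumPy sketch. This is not the PaddleNLP code: the function names are hypothetical, the KL estimator exp(delta) - delta - 1 is an assumed form (the description only mentions exp(delta)), and float16 is chosen purely so that a delta of 40 overflows in a tiny example.

```python
import numpy as np


def kl_mask_after_exp(ref_log_prob, log_prob, mask):
    """Old order (hypothetical name): exponentiate first, mask afterwards."""
    delta = ref_log_prob - log_prob
    # Silence the overflow / invalid-value warnings so the NaN is visible in the result.
    with np.errstate(over="ignore", invalid="ignore"):
        kl = np.exp(delta) - delta - 1  # assumed estimator, not taken from the PR
        return kl * mask  # inf * 0 at a padding position yields NaN


def kl_mask_before_exp(ref_log_prob, log_prob, mask):
    """Fixed order (hypothetical name): mask both inputs, then subtract and exponentiate."""
    delta = ref_log_prob * mask - log_prob * mask  # padding positions become 0
    kl = np.exp(delta) - delta - 1
    return kl * mask


# float16 is an assumption made for illustration; the PR does not state the dtype.
dtype = np.float16
ref_log_prob = np.array([-1.0, -2.0, 0.0], dtype=dtype)
log_prob = np.array([-1.5, -2.0, -40.0], dtype=dtype)  # last position is padding
mask = np.array([1.0, 1.0, 0.0], dtype=dtype)

unsafe = kl_mask_after_exp(ref_log_prob, log_prob, mask)
safe = kl_mask_before_exp(ref_log_prob, log_prob, mask)
print("mask after exp: ", unsafe)
print("mask before exp:", safe)
```

In this sketch the padding position contributes NaN when the mask is applied after the exponential, and exactly zero when the inputs are masked first, while the valid positions are unaffected.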

This PR also adds logging of the entropy loss, so users can monitor its value.
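
For readers unfamiliar with the quantity being logged, the following NumPy sketch shows one common way to compute a mean token entropy over non-padding positions. It is an assumption about what such a metric might look like, not the implementation in this PR; the function name `masked_mean_entropy` and its arguments are hypothetical.

```python
import numpy as np


def masked_mean_entropy(logits, mask):
    """Mean entropy of the per-token distributions, ignoring padding.

    logits: [num_tokens, vocab_size]; mask: [num_tokens], 1 for real tokens.
    """
    logits = logits.astype(np.float32)
    # Numerically stable log-softmax: subtract the row maximum before exp.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    token_entropy = -(np.exp(log_probs) * log_probs).sum(axis=-1)
    # Average over unmasked tokens only; guard against an all-padding batch.
    return float((token_entropy * mask).sum() / max(float(mask.sum()), 1.0))


# Uniform distributions over 4 tokens have entropy log(4); the padded row is ignored.
uniform_logits = np.zeros((3, 4))
mask = np.array([1.0, 1.0, 0.0])
uniform_entropy = masked_mean_entropy(uniform_logits, mask)

# A sharply peaked distribution has entropy close to zero.
peaked_logits = np.array([[20.0, 0.0, 0.0, 0.0]])
peaked_entropy = masked_mean_entropy(peaked_logits, np.array([1.0]))
print(uniform_entropy, peaked_entropy)
```

A value that collapses toward zero during training suggests the policy is becoming nearly deterministic, which is the kind of trend this sort of monitoring makes visible.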

paddle-bot bot commented May 15, 2025

Thanks for your contribution!

codecov bot commented May 15, 2025

Codecov Report

Attention: Patch coverage is 0% with 105 lines in your changes missing coverage. Please review.

Project coverage is 46.95%. Comparing base (6de50e6) to head (25825f9).
Report is 2 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| paddlenlp/rl/trainer/actor_trainer.py | 0.00% | 86 Missing ⚠️ |
| paddlenlp/rl/models/ppo_model_utils.py | 0.00% | 11 Missing ⚠️ |
| paddlenlp/rl/trainer/ppo_trainer.py | 0.00% | 8 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (46.95%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #10596      +/-   ##
===========================================
- Coverage    46.98%   46.95%   -0.04%     
===========================================
  Files          799      799              
  Lines       132255   132348      +93     
===========================================
  Hits         62139    62139              
- Misses       70116    70209      +93     

@gongel gongel merged commit ddcb722 into PaddlePaddle:develop May 15, 2025
9 of 12 checks passed