Add RoBERTa FP8 support with refactoring #72


Merged: 23 commits into foundation-model-stack:main, Jul 3, 2025

Conversation

andrea-fasoli (Collaborator):

This PR introduces launch scripts and utility functions for FP8 encoder inference: RoBERTa is supported for the QuestionAnswering and MaskedLM architectures and tasks.

The PR incorporates several aspects of the ongoing refactoring process for INT8 LLMs from #34 (including addressing the issues raised during that review), but limits updates to encoder models (i.e., no changes to LLMs or to inference.py) and focuses on FP8 quantization instead. It assumes FP8 support has been incorporated in FMS and FMS-MO (an ongoing effort).

The main launch script for encoders is scripts/encoders_inference.py.
Utilities for the various setup steps and for inference are listed below (an illustrative sketch of how they fit together follows the list):

aiu_setup               sets up AIU environment variables
args_parsing            defines script arguments across all model configurations
encoders_utils          classes to run the QA and MaskedLM tasks
model_setup             defines model dtype, device, and distributed strategy
quantization_setup      imports FMS-MO addons and defines linear_config if needed
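
For illustration, a minimal self-contained sketch of how these pieces could fit together; every flag, default, and config key below is an assumption for illustration, not the actual API introduced by this PR:

```python
# Illustrative sketch only: mirrors the module responsibilities listed
# above without importing them. Flags, defaults, and the linear_config
# contents are assumptions, not the PR's actual API.
import argparse


def parse_args() -> argparse.Namespace:
    # args_parsing: script arguments shared across model configurations
    p = argparse.ArgumentParser(description="FP8 encoder inference (sketch)")
    p.add_argument("--architecture", default="roberta_question_answering")
    p.add_argument("--quantization", default=None, choices=["fp8"])
    p.add_argument("--device_type", default="cpu", choices=["cpu", "aiu"])
    return p.parse_args()


def get_linear_config(args: argparse.Namespace):
    # quantization_setup: define linear_config only when quantizing;
    # with FP8, the FMS-MO addons would register the quantized linear type.
    if args.quantization == "fp8":
        return {"linear_type": "fp8"}  # illustrative shape, not real keys
    return None


def main() -> None:
    args = parse_args()
    linear_config = get_linear_config(args)
    # model_setup and encoders_utils would then pick dtype, device, and
    # distributed strategy, and run the QA or MaskedLM task.
    print(args.architecture, args.device_type, linear_config)


if __name__ == "__main__":
    main()
```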

andrea-fasoli (Collaborator, Author):

cc: @ani300 @JRosenkranz

ani300 requested review from ani300 and JRosenkranz on July 1, 2025.
Comment on lines 60 to 61
if local_size < 0:
    local_size = world_size
Contributor:

we'll eventually need local_rank (when multi-aiu moves to multi-node multi-aiu)

andrea-fasoli (Collaborator, Author):

Good point to keep in mind. Should we add it back at that time, or now?

Contributor:

Generally, torchrun already provides it, so we might as well keep it even if it's the same as RANK for now. Some models/algorithms might even expect to use it instead of the global rank, so it's good to have both.

andrea-fasoli (Collaborator, Author):

OK, local_rank and local_size are not in use right now, but we keep them in the setup for future needs.
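
For reference, a minimal sketch of the env-var handling discussed here; torchrun does set RANK, WORLD_SIZE, LOCAL_RANK, and LOCAL_WORLD_SIZE, but the function name and fallbacks below are illustrative, not the PR's actual aiu_setup code:

```python
import os


def dist_env() -> tuple[int, int, int, int]:
    # torchrun exports all four variables; the fallbacks keep plain
    # single-process runs (no torchrun) working.
    rank = int(os.getenv("RANK", "0"))
    world_size = int(os.getenv("WORLD_SIZE", "1"))
    # On a single node these match the global values; on multi-node
    # multi-AIU they identify the process within its own node.
    local_rank = int(os.getenv("LOCAL_RANK", str(rank)))
    local_size = int(os.getenv("LOCAL_WORLD_SIZE", str(world_size)))
    return rank, world_size, local_rank, local_size
```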

max_prompt_length = (
    args.max_prompt_length
    if args.max_prompt_length is not None
    else 384
)
Contributor:

should we get this from the model itself?

andrea-fasoli (Collaborator, Author):

I don't like the hardcoding either, but I think the 384 limit is a task-related (QA) default.

Contributor:

Yeah, I was looking for that in the config.json for RoBERTa, but the limit there is 514. Let's add a comment that this limit is QA-related.
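
For illustration, the resolved snippet could read as follows (a sketch of the agreed comment, not necessarily the merged code):

```python
# 384 is the conventional max sequence length for SQuAD-style QA
# fine-tuning; it is a task default, not the model's positional limit
# (RoBERTa's config.json reports 514).
max_prompt_length = (
    args.max_prompt_length
    if args.max_prompt_length is not None
    else 384
)
```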

ani300 (Contributor) left a review:

I have left a few comments, but I'm optimistic about merging this tomorrow

ani300 (Contributor) commented on Jul 1, 2025:

Can we also remove the current roberta example script in FMS? I think this supersedes it.

andrea-fasoli (Collaborator, Author):

do we like encoders_inference.py as a name for the entry point script?

andrea-fasoli (Collaborator, Author) commented on Jul 1, 2025:

The pre- and post-processing functions of the QA task are based on the torch QA example. I am sure they can be streamlined (as a non-urgent item). If there's any concern with this, let me know.

andrea-fasoli (Collaborator, Author):

do we need any updates to tests/models/test_encoder.py?

ani300 (Contributor) commented on Jul 2, 2025:

> do we like encoders_inference.py as a name for the entry point script?

I like run_encoders.py better, and it matches test_encoders

> the pre- and post-processing functions of the QA task are based on the torch QA example. I am sure they can be streamlined (as a non-urgent item). If there's any concern with this, let me know.

Preprocessing needs to run on every process if multi-AIU, but post-processing should only run on a single rank (unless we want to ensure all ranks get the same output?). Maybe keep it as is, since that makes it easier to debug. A sketch of the rank split is below, after this comment.

> do we need any updates to tests/models/test_encoder.py?

I think the tests already do what they need, which is to ensure the model outputs stay consistent over time using the model signatures framework (which comes from FMS). I don't know if aiu-fms-testing-utils is the place to have validation/e2e tests for specific tasks; @dpatel-ops, what's your opinion?
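
As mentioned above, a minimal sketch of the rank split (preprocess everywhere, post-process on rank 0 only); preprocess and postprocess are placeholders for the QA helpers, and the sketch assumes the process group, if any, is already initialized:

```python
import torch.distributed as dist


def preprocess(examples):
    # Placeholder for QA feature extraction; must be identical on all ranks.
    return examples


def postprocess(outputs):
    # Placeholder for answer extraction from model outputs.
    return outputs


def run_qa(examples, model):
    features = preprocess(examples)   # every rank builds the same inputs
    outputs = model(features)         # collectives happen inside the model
    rank = dist.get_rank() if dist.is_initialized() else 0
    if rank == 0:
        return postprocess(outputs)   # single-rank post-processing
    return None
```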

ani300 (Contributor) left a review:

lgtm!

ani300 merged commit 8d703be into foundation-model-stack:main on Jul 3, 2025. 1 check passed.