Add RoBERTa FP8 support with refactoring #72
Conversation
cc: @ani300 @JRosenkranz
```python
if local_size < 0:
    local_size = world_size
```
we'll eventually need local_rank (when multi-aiu moves to multi-node multi-aiu)
good point to keep in mind. Should we add it back at that time or now?
Generally `torchrun` already provides it, so we might as well keep it even if it's the same as RANK for now. Some models/algorithms might even expect to use that instead of rank, so it's good to have both.
OK, `local_rank` and `local_size` are not in use right now, but we keep them in the setup for future needs.
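For reference, a minimal sketch of what such a setup step can look like when reading the environment variables that `torchrun` exports (RANK, LOCAL_RANK, WORLD_SIZE, and LOCAL_WORLD_SIZE are standard torchrun names; the helper itself is hypothetical, not the exact code in this PR):

```python
import os

def get_distributed_env():
    """Read the rank/size variables that torchrun exports, with
    single-process fallbacks when they are absent (hypothetical helper)."""
    rank = int(os.getenv("RANK", "0"))
    world_size = int(os.getenv("WORLD_SIZE", "1"))
    local_rank = int(os.getenv("LOCAL_RANK", str(rank)))
    local_size = int(os.getenv("LOCAL_WORLD_SIZE", "-1"))
    if local_size < 0:
        # single-node job: the local group spans the whole world
        local_size = world_size
    return rank, world_size, local_rank, local_size
```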
```python
max_prompt_length = (
    args.max_prompt_length
    if args.max_prompt_length is not None
    else 384
)
```
should we get this from the model itself?
I don't like the hardcoding either, but I think the 384 limit is a task-related (QA) default.
Yeah, I was looking for that in the config.json for RoBERTa, but the limit there is 514. Let's add a comment that this limit is QA-related.
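A possible shape for that comment in the snippet above (384 is the usual SQuAD-style max sequence length; the 514 in RoBERTa's config.json is `max_position_embeddings`, a model limit rather than a task default):

```python
# 384 is a QA task-related default (typical SQuAD-style max sequence length),
# not a model limit; RoBERTa's config.json reports max_position_embeddings=514.
max_prompt_length = (
    args.max_prompt_length
    if args.max_prompt_length is not None
    else 384
)
```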
I have left a few comments, but I'm optimistic about merging this tomorrow. Can we also remove the current RoBERTa example script in FMS? I think this supersedes it.
do we like
The pre- and post-processing functions of the QA task are based on the torch QA example. I am sure they can be streamlined (as a non-urgent item). If there's any concern with this, let me know.
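For context, the torch/Hugging Face QA example preprocesses question/context pairs roughly as below (a sketch with the `transformers` tokenizer, not the exact helper in this PR; the model name and parameter values are illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def preprocess_qa(questions, contexts, max_seq_len=384, stride=128):
    # Tokenize question/context pairs; long contexts are split into
    # overlapping chunks, and offset mappings are kept so post-processing
    # can map predicted token spans back to character spans in the context.
    return tokenizer(
        questions,
        contexts,
        truncation="only_second",
        max_length=max_seq_len,
        stride=stride,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )
```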
do we need any updates to
I like
Preprocessing needs to run on every process if multi-AIU, but the post-processing should only run on a single rank (unless we want to ensure all ranks get the same output?). Maybe keep it as is, as it makes it easier to debug.
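In other words, something along these lines (a sketch; `preprocess`/`postprocess` are placeholders for the helpers in this PR, not their actual signatures):

```python
import torch
import torch.distributed as dist

rank = dist.get_rank() if dist.is_initialized() else 0

# Preprocessing runs on every rank so the (collective) forward pass
# sees identically shaped inputs on all AIUs.
features = preprocess(examples, tokenizer)          # placeholder helper
with torch.no_grad():
    outputs = model(**features)

# Post-processing (span decoding, writing predictions) runs only on rank 0,
# avoiding duplicated work and clashing writes; other ranks skip it.
if rank == 0:
    answers = postprocess(examples, features, outputs)  # placeholder helper
```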
I think the tests do what they need already, which is to ensure the model outputs stay consistent over time using the model signatures framework (which comes from FMS). I don't know if aiu-fms-testing-utils is the place to have validation/e2e tests for specific tasks; @dpatel-ops, what's your opinion?
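For readers unfamiliar with it, a signature-style consistency check boils down to comparing fresh outputs on a fixed input against a stored reference. A generic sketch (not the actual FMS model-signatures API):

```python
import torch

def check_signature(model, reference_path, atol=1e-3):
    # Deterministic input, so the output only changes if the model weights
    # or the numerics of the stack underneath (e.g. FP8 kernels) change.
    torch.manual_seed(0)
    input_ids = torch.randint(0, 1000, (1, 64))
    with torch.no_grad():
        out = model(input_ids)
    logits = out.logits if hasattr(out, "logits") else out
    reference = torch.load(reference_path)
    assert torch.allclose(logits, reference, atol=atol), "model signature drifted"
```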
lgtm!
This PR introduces launch scripts and utility functions for FP8 encoder inference: RoBERTa with the QuestionAnswering or MaskedLM architecture (and the corresponding tasks) is supported.
The PR incorporates several aspects of the ongoing refactoring for INT8 LLMs from #34 (including addressing the issues raised during that review), but limits updates to encoder models (i.e., no changes to LLMs or to `inference.py`) and focuses on FP8 quantization instead. It assumes support for FP8 has been incorporated in FMS and FMS-MO (an ongoing effort).

The main launch script for encoders is `scripts/encoders_inference.py`.

Utilities for the various setup steps and inference are: