Moe support #3811
base: main
22edfe6 to fa53986
fa53986 to 6ea89ae
Could you add detailed comments explaining what it does, now that you've gone through the entire converter? That would be helpful.
model = model.to(torch.float32)

return model
return model.cuda()
model.cuda() is already called during initialization.
# TODO: @Evan is waiting for TRT's feature to cache the weight-stripped engine
# if not self.compilation_settings.strip_engine_weights:
# # set EXCLUDE_WEIGHTS flag to strip weights
# runtime = trt.Runtime(TRT_LOGGER)
Why are these being deleted?
We don't need to deserialize the engine anymore; the engine is already live.
assert isinstance(
    serialized_engine, bytes
), "Serialized engine must be a bytes object"
self.engine = serialized_engine
Deserialize the engine here.
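A minimal sketch of what deserializing at this point could look like, assuming the runtime object exposes deserialize_cuda_engine(bytes), as tensorrt.Runtime does. EngineHolder and FakeRuntime are hypothetical names; the fake runtime exists only so the example runs without TensorRT or a GPU. The bytes and the live engine are kept in separate attributes.

```python
from typing import Any, Protocol


class EngineRuntime(Protocol):
    """Anything exposing deserialize_cuda_engine(bytes), e.g. tensorrt.Runtime."""

    def deserialize_cuda_engine(self, serialized_engine: bytes) -> Any: ...


class EngineHolder:
    """Hypothetical holder: keeps the serialized bytes and the live engine apart."""

    def __init__(self, serialized_engine: bytes, runtime: EngineRuntime) -> None:
        assert isinstance(
            serialized_engine, bytes
        ), "Serialized engine must be a bytes object"
        self.serialized_engine = serialized_engine
        # Deserialize right away, so self.engine only ever holds a live engine.
        self.engine = runtime.deserialize_cuda_engine(serialized_engine)
        # Guard against a failed deserialization that returns None.
        if self.engine is None:
            raise RuntimeError("Failed to deserialize the engine")


class FakeRuntime:
    """Hypothetical stand-in for tensorrt.Runtime so the sketch runs anywhere."""

    def deserialize_cuda_engine(self, serialized_engine: bytes) -> Any:
        return {"live_engine_from": serialized_engine}


holder = EngineHolder(b"\x00engine-bytes", FakeRuntime())
```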
return TRTInterpreterResult(
    engine_str,
    cuda_engine,
How will this work with the deferred engine setup?
In general, I think this is not clearly implemented. If we decide that PyRuntime will use initialized engines, then the other cases should be reduced to an initialized engine as early as possible.
assert isinstance(
    serialized_engine, bytes
), "Serialized engine must be a bytes object"
self.engine = serialized_engine
Also, do not overload the meaning of self.engine. It should be either the serialized engine or the live engine, not both.
def setup_engine(self) -> None:

    if isinstance(self.engine, trt.ICudaEngine):
We need to think about how this will interact with the rest of the system, such as the lazy_engine_init option.
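One possible interaction, sketched under the assumption that lazy_engine_init=True means engine setup is deferred until the first forward call rather than done in the constructor. LazyEngineModule and the injected deserialize callable are hypothetical; a real module would call into TensorRT instead.

```python
from typing import Any, Callable, List, Optional


class LazyEngineModule:
    """Hypothetical module: eager setup by default, deferred when lazy_engine_init=True."""

    def __init__(
        self,
        serialized_engine: bytes,
        deserialize: Callable[[bytes], Any],
        lazy_engine_init: bool = False,
    ) -> None:
        self.serialized_engine = serialized_engine
        self.engine: Optional[Any] = None
        self._deserialize = deserialize
        if not lazy_engine_init:
            self.setup_engine()

    def setup_engine(self) -> None:
        # Idempotent: only deserialize if no live engine exists yet.
        if self.engine is None:
            self.engine = self._deserialize(self.serialized_engine)

    def forward(self, x: Any) -> Any:
        # Deferred path: the first call pays the setup cost.
        self.setup_engine()
        return (self.engine, x)


calls: List[bytes] = []


def fake_deserialize(data: bytes) -> str:
    # Records every call so the test can see when setup actually happened.
    calls.append(data)
    return "live:" + data.decode()


eager = LazyEngineModule(b"a", fake_deserialize)
lazy = LazyEngineModule(b"b", fake_deserialize, lazy_engine_init=True)
```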
if isinstance(self.engine, trt.ICudaEngine):
    pass
elif isinstance(self.engine, bytes):
We should not check by type. Instead, have two members and a state flag.
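A sketch of the suggested layout, assuming a simple two-state lifecycle: the serialized bytes and the live engine are separate members, and an explicit state flag (EngineState, a hypothetical name) decides what setup_engine does instead of isinstance checks. EngineSlots and release_engine are illustrative additions; release_engine shows that the flag can also move back to the serialized state.

```python
from enum import Enum, auto
from typing import Any, Callable, Optional


class EngineState(Enum):
    SERIALIZED = auto()  # only serialized_engine is populated
    LIVE = auto()  # engine also holds a deserialized, ready-to-run engine


class EngineSlots:
    """Hypothetical: two separate members plus an explicit state flag."""

    def __init__(self, serialized_engine: bytes) -> None:
        self.serialized_engine: bytes = serialized_engine
        self.engine: Optional[Any] = None
        self.state = EngineState.SERIALIZED

    def setup_engine(self, deserialize: Callable[[bytes], Any]) -> None:
        # Decide by state, never by isinstance(self.engine, ...).
        if self.state is EngineState.LIVE:
            return
        self.engine = deserialize(self.serialized_engine)
        self.state = EngineState.LIVE

    def release_engine(self) -> None:
        # Drop the live engine but keep the bytes, so setup can run again later.
        self.engine = None
        self.state = EngineState.SERIALIZED


slots = EngineSlots(b"plan")
slots.setup_engine(lambda data: ("live", data))

released = EngineSlots(b"plan")
released.setup_engine(lambda data: ("live", data))
released.release_engine()
```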
def _on_state_dict(self, state_dict: Dict[str, Any], prefix: str, _: Any) -> None:
    state_dict[prefix + "engine"] = self.serialized_engine
    state_dict[prefix + "engine"] = self.engine
Same thing here: this should just be the serialized engine.
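A hedged sketch of the save/load pair, written as a plain class so it runs without PyTorch. It keeps the _on_state_dict signature from the diff and mirrors the argument list of torch.nn.Module._load_from_state_dict. The class name and the injected deserialize callable are assumptions. Only the serialized bytes are written, and the live engine is rebuilt from them on load.

```python
from typing import Any, Callable, Dict, List, Optional


class EngineStateDictSketch:
    """Hypothetical: only serialized bytes ever go into a state_dict."""

    def __init__(
        self, serialized_engine: bytes, deserialize: Callable[[bytes], Any]
    ) -> None:
        self._deserialize = deserialize
        self.serialized_engine = serialized_engine
        self.engine: Optional[Any] = deserialize(serialized_engine)

    def _on_state_dict(self, state_dict: Dict[str, Any], prefix: str, _: Any) -> None:
        # Save the bytes, never the live engine object.
        state_dict[prefix + "engine"] = self.serialized_engine

    def _load_from_state_dict(
        self,
        state_dict: Dict[str, Any],
        prefix: str,
        local_metadata: Dict[str, Any],
        strict: bool,
        missing_keys: List[str],
        unexpected_keys: List[str],
        error_msgs: List[str],
    ) -> None:
        # Restore the bytes, then rebuild the live engine from them.
        self.serialized_engine = state_dict[prefix + "engine"]
        self.engine = self._deserialize(self.serialized_engine)


def fake_deserialize(data: bytes) -> Any:
    # Hypothetical stand-in for a real TensorRT deserialization.
    return ("live", data)


src = EngineStateDictSketch(b"plan-v1", fake_deserialize)
saved: Dict[str, Any] = {}
src._on_state_dict(saved, "trt.", None)

dst = EngineStateDictSketch(b"", fake_deserialize)
dst._load_from_state_dict(saved, "trt.", {}, True, [], [], [])
```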
    error_msgs: Any,
) -> None:
    self.serialized_engine = state_dict[prefix + "engine"]
    self.engine = state_dict[prefix + "engine"]
Same as above.
def __init__(
    self,
    cuda_engine: Optional[trt.ICudaEngine | bytes] = None,
This should not be a union. Either you pass a cuda_engine or a serialized engine.
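A minimal sketch of a non-union constructor: cuda_engine (from the diff) and serialized_engine (the reviewer's alternative) are separate keyword arguments, and exactly one must be given. TRTModuleSketch and the accepts helper are hypothetical, and Any stands in for trt.ICudaEngine so the example runs without TensorRT.

```python
from typing import Any, Optional


class TRTModuleSketch:
    """Hypothetical: two explicit, mutually exclusive constructor arguments."""

    def __init__(
        self,
        cuda_engine: Optional[Any] = None,
        serialized_engine: Optional[bytes] = None,
    ) -> None:
        # Exactly one source must be given; reject both and neither.
        if (cuda_engine is None) == (serialized_engine is None):
            raise ValueError("Pass exactly one of cuda_engine or serialized_engine")
        # Each argument lands in its own member; neither member is overloaded.
        self.engine = cuda_engine
        self.serialized_engine = serialized_engine


def accepts(**kwargs: Any) -> bool:
    """Return True if the constructor accepts these keyword arguments."""
    try:
        TRTModuleSketch(**kwargs)
        return True
    except ValueError:
        return False
```

Keeping the arguments separate means the caller's intent is explicit at the call site, and the module never has to inspect a value's type to find out which kind of engine it received.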
9e7ca5d to c286767
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: