@chhzh123 (Collaborator)
This PR fixes #241 by adding a lazy_init function that defers weight initialization, so no random weights are materialized when the model is first created.
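The deferred-initialization idea can be sketched with `jax.eval_shape`. The function and layer sizes below are illustrative, not the PR's actual code: the point is that tracing the init function abstractly records parameter shapes and dtypes without allocating any device memory.

```python
import jax
import jax.numpy as jnp

# Hypothetical eager init: allocates random weights on device immediately.
def init_params(key):
    k1, _ = jax.random.split(key)
    return {
        "w": jax.random.normal(k1, (4096, 4096)),
        "b": jnp.zeros((4096,)),
    }

def lazy_init(key):
    # jax.eval_shape traces init_params abstractly: no device buffers are
    # allocated and no RNG work runs; only shape/dtype metadata
    # (jax.ShapeDtypeStruct leaves) is returned.
    return jax.eval_shape(init_params, key)

abstract = lazy_init(jax.random.PRNGKey(0))
print(abstract["w"].shape)  # (4096, 4096)
```

Real weights loaded from a checkpoint can later be placed into this abstract structure, so the large random-init allocation never happens.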


@chhzh123 requested a review from @Prayer3th on October 15, 2025 01:59
@chhzh123 (Collaborator, Author)

cc @Kipsora

@chhzh123 force-pushed the deferred_weight_loading branch from 94a683f to 805cc56 on October 16, 2025 04:28
@chhzh123 (Collaborator, Author)

Memory usage and latency comparison with and without lazy init:

[Screenshot: memory usage and latency comparison, 2025-10-16]

@Prayer3th (Collaborator) commented Oct 20, 2025

We have now switched to using nnx.eval_shape for model initialization. This approach avoids materializing placeholder tensors during init entirely, and the changes do not interfere with the forward-pass logic. This PR will therefore be closed. For details on the nnx.eval_shape changes, see #248.

@Prayer3th Prayer3th closed this Oct 20, 2025


Linked issue: [Feature] Deferred Weight Initialization