Enabling Optimizer checkpointing for KeyValueEmbeddingFusedOptimizer #3248

Raahul46 · 2025-08-01T00:22:21Z

Summary:
Context:

We introduced KeyValueEmbeddingFusedOptimizer for SSD optimizer offloading:
https://www.internalfb.com/code/fbsource/[a43d796a4169]/fbcode/torchrec/distributed/batched_embedding_kernel.py?lines=341-376

However, the optimizer weights for SSD use cases are currently not offloaded and still reside in HBM.
Refer to the optimizer state dict CP:
https://www.internalfb.com/code/fbsource/[6303aefbae20]/fbcode/minimal_viable_ai/core/model_family_api/optimizer.py?lines=1019-1028

Because of this, we want to initialize an optimizer class for SSD that returns the latest optimizer weight values during checkpointing (the get_optimizer_state call):

https://www.internalfb.com/code/fbsource/[6303aefbae20]/fbcode/deeplearning/fbgemm/fbgemm_gpu/fbgemm_gpu/tbe/ssd/training.py?lines=2540-2545
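To illustrate why the state has to be read at checkpoint time, here is a minimal sketch in plain Python. It is not the torchrec or FBGEMM implementation: `LazyOptimizerState`, `fetch_fns`, and `backing_store` are hypothetical names, and the dictionary only stands in for SSD-resident state. The point is that the state dict is built from fetch callables invoked at checkpoint time, so it reflects the latest values rather than a snapshot taken at construction.

```python
from typing import Callable, Dict, List


class LazyOptimizerState:
    """Hypothetical stand-in for an optimizer whose state lives off-device."""

    def __init__(self, fetch_fns: Dict[str, Callable[[], List[float]]]) -> None:
        # Map from table name to a zero-argument callable that reads the
        # current optimizer state from the backing store.
        self._fetch_fns = fetch_fns

    def state_dict(self) -> Dict[str, List[float]]:
        # Invoke each fetcher at checkpoint time, so the result reflects the
        # values at the moment of the call, not at construction.
        return {name: fetch() for name, fetch in self._fetch_fns.items()}


# Toy backing store standing in for SSD-resident optimizer state (assumption).
backing_store = {"table_0": [0.0, 0.0]}
opt = LazyOptimizerState({"table_0": lambda: list(backing_store["table_0"])})

# Simulate a training step that updates the state after the optimizer
# object has already been constructed.
backing_store["table_0"] = [0.5, 0.25]

# The checkpoint sees the updated values.
latest = opt.state_dict()
```

A design that copied the state into the optimizer object at construction would checkpoint stale values here. Deferring the read to `state_dict()` avoids that.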

Hence, in this diff:

We have made the following changes:

1. Loop through every embedding table:
a. Change the table placement to CPU
b. Create a ShardedTensor for the embedding weight
c. Create a ShardedTensor for the optimizer weight
--> There are three cases for optimizers:
--> Single optimizer value per shard
--> Row-wise optimizer value per shard
--> Point-wise optimizer value per shard

Then initialize the optimizer class with the appropriate parameters.
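The three optimizer cases differ mainly in the shape of the state held per shard. The sketch below shows one way to read them, under assumptions not confirmed by this PR: "single" is taken to mean one value for the whole shard, "row-wise" one value per embedding row, and "point-wise" one value per embedding element. `OptStateKind` and `optimizer_state_shape` are hypothetical helper names, not torchrec APIs.

```python
from enum import Enum
from typing import Tuple


class OptStateKind(Enum):
    # Hypothetical labels for the three cases listed above.
    SINGLE = "single"
    ROW_WISE = "row_wise"
    POINT_WISE = "point_wise"


def optimizer_state_shape(
    kind: OptStateKind, shard_rows: int, shard_dim: int
) -> Tuple[int, ...]:
    """Return the assumed optimizer-state shape for one embedding shard."""
    if kind is OptStateKind.SINGLE:
        # One value shared by the whole shard.
        return (1,)
    if kind is OptStateKind.ROW_WISE:
        # One value per embedding row in the shard.
        return (shard_rows,)
    if kind is OptStateKind.POINT_WISE:
        # One value per embedding element, matching the weight shard's shape.
        return (shard_rows, shard_dim)
    raise ValueError(f"unknown optimizer state kind: {kind}")


# Example: a shard holding 1000 rows of a 64-dimensional table.
shapes = {kind: optimizer_state_shape(kind, 1000, 64) for kind in OptStateKind}
```

Under this reading, only the point-wise case needs an optimizer ShardedTensor with the same local shape as the embedding weight shard. The other two need smaller shard metadata.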

Differential Revision: D78131693

facebook-github-bot · 2025-08-01T00:22:47Z

This pull request was exported from Phabricator. Differential Revision: D78131693

facebook-github-bot · 2025-08-04T20:54:55Z

This pull request was exported from Phabricator. Differential Revision: D78131693

facebook-github-bot · 2025-08-05T21:44:27Z

This pull request was exported from Phabricator. Differential Revision: D78131693

meta-cla bot added the CLA Signed label Aug 1, 2025

facebook-github-bot added the fb-exported label Aug 1, 2025

Raahul46 force-pushed the export-D78131693 branch from 67cbbc2 to 5736cb1 August 4, 2025 20:51

Raahul46 force-pushed the export-D78131693 branch from 5736cb1 to cc4260b August 4, 2025 20:54

Raahul46 force-pushed the export-D78131693 branch from cc4260b to 797fa10 August 5, 2025 21:40

Raahul46 force-pushed the export-D78131693 branch from 797fa10 to e7480cb August 5, 2025 21:44
