
Conversation

weifengpy (Contributor) commented:
llama3: fully_shard on each TransformerBlock and on each block's feed_forward, AC on each TransformerBlock. The backward prefetching order is correct and the memory behavior is as expected.
NGPU=4 CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh && mv outputs outputs-llama-ac

deepseek: fully_shard on each TransformerBlock and on each MoE layer's moe.experts, AC on each TransformerBlock. The backward prefetching order is wrong and the memory usage is wrong.
NGPU=4 CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/deepseek_v3_16b.toml" ./run_train.sh

Need to figure out what's special about deepseek that breaks the prefetching order.
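One way to narrow this down is to capture an allocator trace from both runs and diff them in the memory visualizer, and, if the prefetch order itself is the problem, pin it manually. A sketch using PyTorch's memory-history API (private, subject to change) and FSDP2's `set_modules_to_backward_prefetch` hook:

```python
import torch

# Record allocator events, run a few steps, then dump a snapshot that can
# be loaded in https://pytorch.org/memory_viz to compare llama3 vs deepseek.
torch.cuda.memory._record_memory_history(max_entries=100_000)
# ... run a few training steps here ...
torch.cuda.memory._dump_snapshot("deepseek_memory.pickle")
torch.cuda.memory._record_memory_history(enabled=None)  # stop recording

# If the automatic (reverse-of-forward) prefetch order is what is wrong,
# FSDP2 allows pinning it per module, e.g. make block i prefetch block i-1
# during backward (blocks: list of fully_shard-ed TransformerBlocks):
# for i in range(1, len(blocks)):
#     blocks[i].set_modules_to_backward_prefetch([blocks[i - 1]])
```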

wwwjn and others added 3 commits September 12, 2025 14:12
meta-cla bot added the CLA Signed label Sep 18, 2025
weifengpy marked this pull request as draft September 18, 2025 23:07