
Conversation

@jialei777 (Collaborator) commented Aug 12, 2025

Ran locally on 4 chips with fsdp=4: `export PJRT_DEVICE=TPU; export TORCHPRIME_TPU_TYPE=v6e-4 && python torchprime/torch_xla_models/train.py model=flex-qwen-1b`

MFU: 0.21

On a v5p-128 cluster, using `tp run --name jialei-0812-qwen-fsdp32tensor2 torchprime/torch_xla_models/train.py model=flex-qwen-1b task.global_batch_size=64 ici_mesh.fsdp=32 ici_mesh.tensor=2` (with the corresponding `ici_mesh.*` overrides for each configuration; a sweep sketch follows the list):

  • fsdp=64: hangs (not yet diagnosed)
  • fsdp=32, tensor=2: finished, MFU 0.22
  • fsdp=16, tensor=4: finished, MFU 0.19
  • fsdp=8, tensor=8: finished, MFU 0.11
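
For reference, a minimal sketch of how these runs could be scripted as a single sweep, assuming the `tp run` invocation and `ici_mesh.*` overrides shown above; the run names in the sketch are hypothetical, not the ones actually used:

```bash
#!/usr/bin/env bash
# Sweep mesh configurations on the v5p-128 cluster (64 chips total).
# Assumes the `tp run` CLI and the ici_mesh.fsdp / ici_mesh.tensor overrides
# from the runs reported above; the fsdp=64 configuration hung in practice.
set -euo pipefail

for cfg in "64 1" "32 2" "16 4" "8 8"; do
  read -r fsdp tensor <<< "$cfg"
  tp run --name "qwen-fsdp${fsdp}-tp${tensor}" \
    torchprime/torch_xla_models/train.py \
    model=flex-qwen-1b \
    task.global_batch_size=64 \
    ici_mesh.fsdp="$fsdp" \
    ici_mesh.tensor="$tensor"
done
```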

