@angeloskath
Member

Mostly putting this up to check whether people think it's useful. We could also put it in mlx_lm.manage.

This PR introduces a utility called mlx_lm.share that broadcasts a model (or any folder/file) from one node to the others using mlx.core.distributed. We often download a model on one node and then have to move it to the others via external storage or something similar.

Thunderbolt 5 with JACCL actually saturates the disk's write bandwidth, so with mlx_lm.share I am getting a 5-6 GB/s broadcast. There is no need for external storage to move things around; it would be strictly slower...
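
To make the idea concrete, here is a minimal sketch of a chunked file broadcast. It is not the code in this PR: `share_file`, `chunk_size`, and `src_rank` are names assumed for illustration, and the `broadcast` callable is a hypothetical stand-in for a collective built on mlx.core.distributed. The source rank reads the file in fixed-size chunks, every rank receives each chunk, and the non-source ranks write it to disk.

```python
from typing import Callable, Optional


def share_file(
    src_path: Optional[str],
    dst_path: str,
    rank: int,
    broadcast: Callable[[Optional[bytes]], bytes],
    chunk_size: int = 1 << 20,
    src_rank: int = 0,
) -> int:
    """Stream one file from src_rank to every other rank, chunk by chunk.

    `broadcast` is a hypothetical collective (an assumption, not an MLX API):
    the source rank passes a bytes payload, the other ranks pass None, and
    every rank gets the source's payload back. An empty payload marks the
    end of the file. Returns the number of bytes handled on this rank.
    """
    total = 0
    if rank == src_rank:
        with open(src_path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                # Send every chunk, including the final empty one, so the
                # receivers know when to stop.
                broadcast(chunk)
                if not chunk:
                    break
                total += len(chunk)
    else:
        with open(dst_path, "wb") as f:
            while True:
                chunk = broadcast(None)
                if not chunk:
                    break
                f.write(chunk)
                total += len(chunk)
    return total
```

Streaming in chunks keeps memory use bounded regardless of file size, which matters when the thing being shared is a full model checkpoint.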

Examples

# Assumes we have run
# mlx.distributed_config --hosts h1,h2,h3,h4,... --over thunderbolt \
#                        --backend jaccl (or jaccl-ring) --output hosts.json

mlx_lm.share --path mlx-community/Kimi-K2.5 --hostfile hosts.json

mlx_lm.share --path my-big-file.bin --hostfile hosts.json --dst /tmp/dst.bin
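
For reference, the command-line surface used in the examples above could be declared roughly as follows. This is a sketch, not the parser from the PR: only the `--path` definition and its help text come from the diff, while the types, defaults, and help strings for `--hostfile` and `--dst` are assumptions, as is the `build_parser` helper name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mlx_lm.share")
    # Taken from the diff under review.
    parser.add_argument(
        "--path", type=str, required=True, help="Path to the MLX model."
    )
    # Assumed definitions: the flags appear in the examples, but their
    # types, defaults, and help text are guesses.
    parser.add_argument(
        "--hostfile", type=str, default=None, help="Hostfile describing the nodes."
    )
    parser.add_argument(
        "--dst", type=str, default=None, help="Destination path on the receiving nodes."
    )
    return parser


# Parse the second example invocation without touching sys.argv.
args = build_parser().parse_args(
    ["--path", "my-big-file.bin", "--hostfile", "hosts.json", "--dst", "/tmp/dst.bin"]
)
print(args.path, args.hostfile, args.dst)
```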

Comment on lines +142 to +144
parser.add_argument(
    "--path", type=str, required=True, help="Path to the MLX model."
)
Member

nit: wdyt about renaming that to --model to be consistent with the other commands?

Member

Although if the intention is that this can be used for anything (not just a model), then --path is probably better.

Member

@awni left a comment

Very nice! This seems quite useful!

@ivanfioravanti
Contributor

Nice! No more manual copy and paste across nodes!
