Better HF integration for MambaLMHeadModel
#471
base: main
Conversation
@tridao @NielsRogge I've updated this PR + the description following your instructions. It is now ready to be reviewed. Only `MambaLMHeadModel` …
NielsRogge left a comment
Thanks, looks like a nice clean-up. Have you tested pushing and pulling a Mamba model to/from the hub (to ensure it works fine)?
Yes I did:
This PR adds https://github.com/state-spaces/mamba as an official library on the Hub. Let's wait for the Mamba integration to be merged (coming soon) + having a few first models uploaded to the Hub. cc @osanseviero @NielsRogge @tridao. Related PR on the Mamba side: state-spaces/mamba#471
Hi @tridao and team,
This is a follow-up PR after #469 and in particular #469 (comment). This PR:
- Adds `huggingface_hub` as an explicit dependency. It is already a transitive dependency, since `mamba_ssm` depends on `transformers` which depends on `huggingface_hub`, but it's better to be explicit. I pinned a recent version that contains all recent `PyTorchModelHubMixin` updates and fixes.
- Removes `PyTorchModelHubMixin` from the `Mamba2` layer (introduced in "Add HF integration, better discoverability" #469).
- Adds `PyTorchModelHubMixin` to `MambaLMHeadModel` for a better HF integration. I removed the existing `from_pretrained`/`save_pretrained` that were previously implemented. They still exist thanks to the mixin, and in a more robust way. The mixin also adds a `push_to_hub` method to directly save a model and push it to the Hub. All three helpers support safetensors (if installed on the user's machine) and parameters like `cache_dir`/`token`/`revision`/etc. that can prove useful to users.

When doing `MambaLMHeadModel(...).push_to_hub("username/my-cool-mamba")`, a model card will be automatically created with some metadata in it (see docs). This improves the UX on the Hub a lot: better discoverability, better documentation, etc. In particular, I have added:

- `library_name: mamba-ssm`. I've opened a PR on our side to make `mamba-ssm` recognized as a library by the HF Hub (see "Add MambaSSM as a library" huggingface/huggingface.js#802). Users landing on a `mamba_ssm` model will automatically get a code snippet on how to instantiate the model + a link to the mamba repo + download counts enabled.
- `https://github.com/state-spaces/mamba` as `repo_url` => all model cards will state "this model has been pushed using https://github.com/state-spaces/mamba" => better for documentation.
- `arXiv:2312.00752` and `arXiv:2405.21060` as tags, so that your papers will be automatically linked to all Mamba models uploaded to the Hub => better for referencing.
- `pipeline_tag: text-generation` => models will appear when users filter models by task on the Hub (https://huggingface.co/models?pipeline_tag=text-generation&sort=trending).

In parallel to this PR, it would be good to update existing models to add metadata as well. I opened 3 PRs to showcase what should be updated:
I have a script to open such a PR on other models. Let me know what you think and if you validate, I'll proceed with the others.
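For illustration, the model-card metadata header produced by these changes might look roughly like this. This is a sketch assembled from the fields listed in the description above; the exact output depends on the `huggingface_hub` version.

```yaml
# Hypothetical model card front matter for a pushed Mamba model
---
library_name: mamba-ssm
pipeline_tag: text-generation
repo_url: https://github.com/state-spaces/mamba
tags:
  - arxiv:2312.00752
  - arxiv:2405.21060
---
```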