Better HF integration for MambaLMHeadModel
#471
base: main
Conversation
@tridao @NielsRogge I've updated this PR + the description following your instructions. It is now ready to be reviewed. Only `MambaLMHeadModel` …
NielsRogge left a comment
Thanks, looks like a nice clean-up. Have you tested pushing and pulling a Mamba model to/from the hub (to ensure it works fine)?
Yes I did:
This PR adds https://github.com/state-spaces/mamba as an official library on the Hub. Let's wait for the Mamba integration to be merged (coming soon) + having a few first models uploaded to the Hub. cc @osanseviero @NielsRogge @tridao. Related PR on the Mamba side: state-spaces/mamba#471
Hi @tridao and team,
This is a follow-up PR after #469 and in particular #469 (comment). This PR:
- Adds `huggingface_hub` as an explicit dependency. It is already a transitive dependency, since `mamba_ssm` depends on `transformers` which depends on `huggingface_hub`, but it's better to be explicit. I pinned a recent version that contains all recent `PyTorchModelHubMixin` updates and fixes.
- Removes `PyTorchModelHubMixin` from the `Mamba2` layer (introduced in "Add HF integration, better discoverability" #469).
- Adds `PyTorchModelHubMixin` to `MambaLMHeadModel` for a better HF integration. I removed the existing `from_pretrained`/`save_pretrained` that were previously implemented. They still exist thanks to the mixin, and in a more robust way. The mixin also adds a `push_to_hub` method to directly save a model and push it to the Hub. All three helpers support safetensors (if installed on the user's machine) and parameters like `cache_dir`/`token`/`revision`/etc. that can prove useful to users.

When doing `MambaLMHeadModel(...).push_to_hub("username/my-cool-mamba")`, a model card will be automatically created with some metadata in it (see docs). This improves the UX on the Hub a lot: better discoverability, better documentation, etc. In particular, I have added:

- `library_name: mamba-ssm`. I've opened a PR on our side to make `mamba-ssm` recognized as a library by the HF Hub (see "Add MambaSSM as a library" huggingface/huggingface.js#802). Users landing on a `mamba_ssm` model will automatically get a code snippet on how to instantiate the model + a link to the mamba repo + download counts enabled.
- `https://github.com/state-spaces/mamba` as `repo_url` => all model cards will state "this model has been pushed using https://github.com/state-spaces/mamba" => better for documentation.
- `arXiv:2312.00752` and `arXiv:2405.21060` as tags, so that your papers will be automatically linked to all Mamba models uploaded to the Hub => better for referencing.
- `pipeline_tag: text-generation` => models will appear when users filter models by task on the Hub (https://huggingface.co/models?pipeline_tag=text-generation&sort=trending).

In parallel to this PR, it would be good to update existing models to add metadata as well. I opened 3 PRs to showcase what should be updated:
I have a script to open such a PR on other models. Let me know what you think and if you validate, I'll proceed with the others.
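For illustration, the model-card metadata header produced by these changes might look roughly like this. This is a sketch assembled from the fields listed in the description above; the exact output depends on the `huggingface_hub` version.

```yaml
# Hypothetical model card front matter for a pushed Mamba model
---
library_name: mamba-ssm
pipeline_tag: text-generation
repo_url: https://github.com/state-spaces/mamba
tags:
  - arxiv:2312.00752
  - arxiv:2405.21060
---
```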