
Releases: huggingface/huggingface_hub

v0.0.18: Repo metadata, git tags, Keras mixin

04 Oct 21:10


Repository metadata (@julien-c)

Version v0.0.18 of huggingface_hub includes tools to manage repository metadata. The following example reads metadata from a repository:

from huggingface_hub import Repository

repo = Repository("xxx", clone_from="yyy")
data = repo.repocard_metadata_load()

The following example updates that metadata before writing it back to the repository locally:

data["license"] = "apache-2.0"
repo.repocard_metadata_save(data)

Git tags (@AngledLuffa)

Tag management is now available! Add, check, and delete tags locally or remotely, directly from the Repository utility.

Revisited Keras support (@nateraw)

The Keras mixin has been revisited:

  • It now saves models as SavedModel objects rather than .h5 files.
  • It now exposes its methods as standalone functions, so they can be used as a functional API instead of having to inherit from the mixin.

Improvements and bug fixes

v0.0.17: Non-blocking git push, notebook login

04 Oct 21:00


Non-blocking git-push

The pushing methods now accept a blocking boolean parameter; setting blocking=False makes the push happen
asynchronously.

To check whether a push has finished, or to inspect its status code and spot a failure, use the command_queue
property on the Repository object.

For example:

from huggingface_hub import Repository

repo = Repository("<local_folder>", clone_from="<user>/<model_name>")

with repo.commit("Commit message", blocking=False):
    # Save data
    ...

last_command = repo.command_queue[-1]

# Status of the push command
last_command.status  
# Will return the status code
#     -> -1 will indicate the push is still ongoing
#     -> 0 will indicate the push has completed successfully
#     -> non-zero code indicates the error code if there was an error

# if there was an error, the stderr may be inspected
last_command.stderr

# Whether the command finished or if it is still ongoing
last_command.is_done

# Whether the command errored out
last_command.failed

When using blocking=False, the commands are tracked and your script will exit only once all pushes are done, even
if other errors happen in your script (a failed push counts as done).
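The status codes listed above map naturally to a human-readable state. A minimal sketch (describe_push_status is an illustrative helper, not part of the library):

```python
def describe_push_status(status):
    """Interpret the status codes documented for Repository push commands."""
    if status == -1:
        return "ongoing"        # push still in progress
    if status == 0:
        return "done"           # push completed successfully
    return f"failed (exit code {status})"  # non-zero: git error code
```

A caller polling repo.command_queue[-1].status could feed the value straight into such a helper.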

Notebook login (@sgugger)

The huggingface_hub library now has a notebook_login method, which can be used to log in from notebooks with no access to a shell. In a notebook, log in with the following:

from huggingface_hub import notebook_login

notebook_login()

Improvements and bugfixes

v0.0.16: Progress bars, git credentials

27 Aug 13:03


Version v0.0.16 of huggingface_hub introduces several quality-of-life improvements.

Progress bars in Repository

Progress bars are now visible with many git operations, such as pulling, cloning and pushing:

>>> from huggingface_hub import Repository
>>> repo = Repository("local_folder", clone_from="huggingface/CodeBERTa-small-v1")
Cloning https://huggingface.co/huggingface/CodeBERTa-small-v1 into local empty directory.
Download file pytorch_model.bin:  45%|████████████████████████████▋                                   | 144M/321M [00:13<00:12, 14.7MB/s]
Download file flax_model.msgpack:  42%|██████████████████████████▌                                    | 134M/319M [00:13<00:13, 14.4MB/s]

Branching support

There is now branching support in Repository. The example below clones the xxx repository and checks out the new-branch revision. If new-branch is an existing branch on the remote, that branch is checked out; if it is another revision, such as a commit or a tag, that revision is checked out.

If the revision does not exist, it will create a branch from the latest commit on the main branch.

>>> from huggingface_hub import Repository
>>> repo = Repository("local", clone_from="xxx", revision="new-branch")

Once the repository is instantiated, it is possible to manually checkout revisions using the git_checkout method. If the revision already exists:

>>> repo.git_checkout("main")

To create a branch from the current head when the revision does not exist, pass create_branch_ok=True:

>>> repo.git_checkout("brand-new-branch", create_branch_ok=True)
Revision `brand-new-branch` does not exist. Created and checked out branch `brand-new-branch`

Finally, the commit context manager has a new branch parameter that specifies which branch the utility should push to:

>>> with repo.commit("New commit on branch brand-new-branch", branch="brand-new-branch"):
...     # Save any file or model here, it will be committed to that branch.
...     torch.save(model.state_dict(), "model.pt")

Git credentials

The login system has been redesigned to leverage git-credential instead of a token-based authentication system, using the git-credential store helper. When logging in with huggingface_hub, you may see the following:

        _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
        _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
        _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
        _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
        _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

        
Username: 
Password: 
Login successful
Your token has been saved to /root/.huggingface/token
Authenticated through git-credential store, but this isn't the helper defined on your machine.
You will have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal to set it as the default:

git config --global credential.helper store

Running git config --global credential.helper store sets this as the default way to handle credentials for git authentication. All repositories instantiated with the Repository utility have this helper set by default, so no action is required on your part when leveraging it.

Improved logging

The logging system is now similar to the existing logging system in transformers and datasets, based on a logging module that controls the entire library's logging level:

>>> from huggingface_hub import logging
>>> logging.set_verbosity_error()
>>> logging.set_verbosity_info()

Bug fixes and improvements

v0.0.15

28 Jul 18:44


v0.0.15: Documentation, bug fixes and misc improvements

Improvements and bugfixes

v0.0.14: LFS Auto tracking, `dataset_info` and `list_datasets`, documentation

18 Jul 08:06


Datasets

Dataset repositories get better support, starting with full support for the Repository class:

from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<dataset_id>", repo_type="dataset")

Datasets can now be retrieved from the Python runtime using the list_datasets method from the HfApi class:

from huggingface_hub import HfApi

api = HfApi()
datasets = api.list_datasets()

len(datasets)
# 1048 publicly available dataset repositories at the time of writing

Information can be retrieved on specific datasets using the dataset_info method from the HfApi class:

from huggingface_hub import HfApi

api = HfApi()
api.dataset_info("squad")
# DatasetInfo: {
# 	id: squad
#	lastModified: 2021-07-07T13:18:53.595Z
#	tags: ['pretty_name:SQuAD', 'annotations_creators:crowdsourced', 'language_creators:crowdsourced', 'language_creators:found', 
# [...]

Inference API wrapper client

Version v0.0.14 introduces a wrapper client for the Inference API. No need to craft requests by hand anymore. See below for an example.

from huggingface_hub import InferenceApi

api = InferenceApi("bert-base-uncased")
api(inputs="The [MASK] is great")
# [
#    {'sequence': 'the music is great', 'score': 0.03599703311920166, 'token': 2189, 'token_str': 'music'}, 
#    {'sequence': 'the price is great', 'score': 0.02146693877875805, 'token': 3976, 'token_str': 'price'}, 
#    {'sequence': 'the money is great', 'score': 0.01866752654314041, 'token': 2769, 'token_str': 'money'}, 
#    {'sequence': 'the fun is great', 'score': 0.01654735580086708, 'token': 4569, 'token_str': 'fun'}, 
#    {'sequence': 'the effect is great', 'score': 0.015102624893188477, 'token': 3466, 'token_str': 'effect'}
# ]

Auto-track with LFS

Version v0.0.14 introduces an auto-tracking mechanism with git-lfs for large files. Files that are larger than 10MB can be automatically tracked by using the auto_track_large_files method:

from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<model_id>")

# save large files in `local_directory`
repo.git_add()
repo.auto_track_large_files()
repo.git_commit("Add large files")
repo.git_push()
# No push rejected error anymore!

It is automatically used when leveraging the commit context manager:

from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<model_id>")
with repo.commit("Add large files"):
    # add large files
    ...

# No push rejected error anymore!

Documentation

Breaking changes

Reminder: the huggingface_hub library follows semantic versioning and is under active development. Until the first major version (v1.0.0) is out, you should expect breaking changes, and we strongly recommend pinning the library to a specific version.

Two breaking changes are introduced with version v0.0.14.

The whoami return value changes from a tuple to a dictionary

The whoami method changes its return value from a tuple of (<user>, [<organisations>]) to a dictionary containing much more information.

In versions v0.0.13 and below, the whoami method of the HfApi class behaved as follows:

from huggingface_hub import HfFolder, HfApi
api = HfApi()
api.whoami(HfFolder.get_token())
# ('<user>', ['<org_0>', '<org_1>'])

In version v0.0.14, this is updated to the following:

from huggingface_hub import HfFolder, HfApi
api = HfApi()
api.whoami(HfFolder.get_token())
# {
#     'type': str, 
#     'name': str, 
#     'fullname': str, 
#     'email': str,
#     'emailVerified': bool, 
#     'apiToken': str,
#	'plan': str, 
#     'avatarUrl': str,
#     'orgs': List[str]
# }
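Code written against the old tuple shape can adapt with a small shim (a sketch assuming the keys shown above; whoami_as_tuple is an illustrative helper, not part of the library):

```python
def whoami_as_tuple(info):
    """Convert the v0.0.14 whoami dict back to the old (<user>, [<organisations>]) shape."""
    return info["name"], info["orgs"]
```

This lets downstream code migrate incrementally while still getting the richer dictionary where needed.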

The Repository's use_auth_token initialization parameter now defaults to True

The use_auth_token initialization parameter of the Repository class now defaults to True. The behavior is unchanged for users who are not logged in, in which case Repository remains agnostic to huggingface_hub.

Improvements and bugfixes

v0.0.13: Context Manager

28 Jun 12:57


Version 0.0.13 introduces a context manager to save files directly to the Hub. See below for some examples.

Example with a single file

import json

from huggingface_hub import Repository

repo = Repository("text-files", clone_from="<user>/text-files", use_auth_token=True)

with repo.commit("My first file."):
    with open("file.txt", "w+") as f:
        f.write(json.dumps({"key": "value"}))

Example with a torch.save statement:

import torch
from huggingface_hub import Repository

model = torch.nn.Transformer()

repo = Repository("torch-files", clone_from="<user>/torch-files", use_auth_token=True)

with repo.commit("Adding my cool model!"):
    torch.save(model.state_dict(), "model.pt")

Example with a Flax/JAX serialization statement:

from flax import serialization
from jax import random
from flax import linen as nn
from huggingface_hub import Repository

model = nn.Dense(features=5)

key1, key2 = random.split(random.PRNGKey(0))
x = random.normal(key1, (10,))
params = model.init(key2, x)

bytes_output = serialization.to_bytes(params)

repo = Repository("flax-model", clone_from="<user>/flax-model", use_auth_token=True)

with repo.commit("Adding my cool Flax model!"):
    with open("flax_model.msgpack", "wb") as f:
        f.write(bytes_output)

Patch release: Repository clones

23 Jun 16:24


Patches an issue that occurred when cloning a repository twice.

v0.0.11: Improved documentation, `hf_hub_download` and `Repository` power-up

23 Jun 11:01


Improved documentation

The huggingface_hub documentation is now available on hf.co/docs! Additionally, a new step-by-step guide to adding libraries is available.

New method: hf_hub_download

A new method is introduced: hf_hub_download. It is the equivalent of calling cached_download(hf_hub_url(...)) in a single method.

Repository power-up

The Repository class is updated to behave more like git. It is now impossible to clone a repository into a folder that already contains files.

The PyTorch Mixin contributed by @vasudevgupta7 is slightly updated to have the push_to_hub method manage a repository as one would from the command line.

Improvement & Fixes

v0.0.10: Merging `huggingface_hub` with `api-inference-community` and hub interfaces

08 Jun 13:43


v0.0.10 marks the merging of three components of the Hugging Face stack: the huggingface_hub repository is now the central platform for contributing new libraries to be supported on the Hub.

It regroups three previously separate components:

  • The huggingface_hub Python library, as the Python library to download, upload, and retrieve information from the hub.
  • The api-inference-community, as the platform where libraries wishing for hub support may be added.
  • The interfaces, as the definition for pipeline types as well as default widget inputs and definitions/UI elements for third-party libraries.

Future efforts will focus on further easing the contribution of third-party libraries to the Hugging Face Hub.

Improvement & Fixes

v0.0.9: HTTP File uploads, multiple filter model selection

20 May 15:19


Support for large file uploads

Implementation of an endpoint to programmatically upload (large) files to any repo on the hub, without the need for git, using HTTP POST requests.

The HfApi.list_models method now allows multiple filters

Models may now be filtered using several filters:

Example usage:

>>> from huggingface_hub import HfApi
>>> api = HfApi()

>>> # List all models
>>> api.list_models()

>>> # List only the text classification models
>>> api.list_models(filter="text-classification")

>>> # List only the Russian models compatible with PyTorch
>>> api.list_models(filter=("ru", "pytorch"))

>>> # List only the models trained on the "common_voice" dataset
>>> api.list_models(filter="dataset:common_voice")

>>> # List only the models from the AllenNLP library
>>> api.list_models(filter="allennlp")

ModelInfo now has a readable representation

The ModelInfo class has been improved so that its representation displays readable information about the object.

Improvements and bugfixes