Releases: huggingface/huggingface_hub
v0.0.18: Repo metadata, git tags, Keras mixin
Repository metadata (@julien-c)
Version v0.0.18 of huggingface_hub includes tools to manage repository metadata. The following example reads metadata from a repository:

```python
from huggingface_hub import Repository

repo = Repository("xxx", clone_from="yyy")
data = repo.repocard_metadata_load()
```

The following example completes that metadata before writing it to the repository locally:

```python
data["license"] = "apache-2.0"
repo.repocard_metadata_save(data)
```
Git tags (@AngledLuffa)
Tag management is now available! Add, check, and delete tags locally or remotely directly from the `Repository` utility.
- Tags #323 (@AngledLuffa)
Revisited Keras support (@nateraw)
The Keras mixin has been revisited:
- It now saves models as `SavedModel` objects rather than `.h5` files.
- It now offers methods that can be leveraged simply as a functional API, instead of having to use the Mixin as an actual mixin.
Improvements and bug fixes
v0.0.17: Non-blocking git push, notebook login
Non-blocking git push
The pushing methods now accept a `blocking` boolean parameter to indicate whether the push should happen asynchronously.
To see whether the push has finished, or to inspect its status code (to spot a failure), use the `command_queue` property on the `Repository` object.
For example:

```python
from huggingface_hub import Repository

repo = Repository("<local_folder>", clone_from="<user>/<model_name>")

with repo.commit("Commit message", blocking=False):
    # Save data
    ...

last_command = repo.command_queue[-1]

# Status of the push command
last_command.status
# Will return the status code
#   -> -1 indicates the push is still ongoing
#   -> 0 indicates the push completed successfully
#   -> a non-zero code is the error code if an error occurred

# If there was an error, the stderr may be inspected
last_command.stderr

# Whether the command finished or is still ongoing
last_command.is_done

# Whether the command errored out
last_command.failed
```

When using `blocking=False`, the commands will be tracked and your script will exit only when all pushes are done, even if other errors happen in your script (a failed push counts as done).
- Non blocking git push #315 (@LysandreJik)
Notebook login (@sgugger)
The huggingface_hub library now has a `notebook_login` method which can be used to log in from notebooks with no access to the shell. In a notebook, log in as follows:

```python
from huggingface_hub import notebook_login

notebook_login()
```
Improvements and bugfixes
- added option to create private repo #319 (@philschmid)
- display git push warnings #326 (@elishowk)
- Allow specifying data with the Inference API wrapper #271 (@osanseviero)
- Add auth to snapshot download #340 (@lewtun)
v0.0.16: Progress bars, git credentials
The huggingface_hub version v0.0.16 introduces several quality-of-life improvements.
Progress bars in `Repository`
Progress bars are now visible with many git operations, such as pulling, cloning and pushing:

```python
>>> from huggingface_hub import Repository
>>> repo = Repository("local_folder", clone_from="huggingface/CodeBERTa-small-v1")
Cloning https://huggingface.co/huggingface/CodeBERTa-small-v1 into local empty directory.
Download file pytorch_model.bin:  45%|████████████████████████████▋ | 144M/321M [00:13<00:12, 14.7MB/s]
Download file flax_model.msgpack:  42%|██████████████████████████▌ | 134M/319M [00:13<00:13, 14.4MB/s]
```
Branching support
There is now branching support in `Repository`. The following will clone the `xxx` repository and check out the `new-branch` revision:

```python
>>> from huggingface_hub import Repository
>>> repo = Repository("local", clone_from="xxx", revision="new-branch")
```

If `new-branch` is an existing branch on the remote, it will be checked out. If it is another revision, such as a commit or a tag, that revision will also be checked out. If the revision does not exist, a branch will be created from the latest commit on the `main` branch.

Once the repository is instantiated, it is possible to manually check out revisions using the `git_checkout` method. If the revision already exists:

```python
>>> repo.git_checkout("main")
```

If a branch should be created from the current head in case it does not exist:

```python
>>> repo.git_checkout("brand-new-branch", create_branch_ok=True)
Revision `brand-new-branch` does not exist. Created and checked out branch `brand-new-branch`.
```

Finally, the `commit` context manager has a new `branch` parameter to specify to which branch the utility should push:

```python
>>> with repo.commit("New commit on branch brand-new-branch", branch="brand-new-branch"):
...     # Save any file or model here, it will be committed to that branch.
...     torch.save(model.state_dict(), "model.pt")
```
Git credentials
The login system has been redesigned to leverage `git-credential` instead of a token-based authentication system. It uses the `git-credential store` helper. If you're unaware of what this is, you may see the following when logging in with huggingface_hub:

```
        _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
        _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
        _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
        _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
        _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

Username:
Password:
Login successful
Your token has been saved to /root/.huggingface/token
Authenticated through git-credential store but this isn't the helper defined on your machine.
You will have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal to set it as the default

git config --global credential.helper store
```

Running the command `git config --global credential.helper store` will set this as the default way to handle credentials for git authentication. All repositories instantiated with the `Repository` utility will have this helper set by default, so no action is required on your part when leveraging it.
Improved logging
The logging system is now similar to the existing logging systems in `transformers` and `datasets`, based on a `logging` module that controls the entire library's logging level:

```python
>>> from huggingface_hub import logging
>>> logging.set_verbosity_error()
>>> logging.set_verbosity_info()
```
Bug fixes and improvements
- Add documentation to GitHub and the Hub docs about the Inference client wrapper #253 (@osanseviero)
- Have large files enabled by default when using `Repository` #219 (@LysandreJik)
- Clarify/specify/document model card metadata, `model-index`, and pipeline/task types #265 (@julien-c)
- [model_card][metadata] Actually, lets make dataset.name required #267 (@julien-c)
- Progress bars #261 (@LysandreJik)
- Add keras mixin #230 (@nateraw)
- Open source code related to the repo type (tag icon, display order, snippets) #273 (@osanseviero)
- Branch push to hub #276 (@LysandreJik)
- Git credentials #277 (@LysandreJik)
- Push to hub/commit with branches #282 (@LysandreJik)
- Better logging #262 (@LysandreJik)
- Remove custom language pack behavior #291 (@LysandreJik)
- Update Hub and huggingface_hub docs #293 (@osanseviero)
- Adding a handler #292 (@LysandreJik)
v0.0.15: Documentation, bug fixes and misc improvements
Improvements and bugfixes
- [Docs] Update link to Gradio documentation #206 (@abidlabs)
- Fix title typo (Cliet -> Client) #207 (@cakiki)
- add _from_pretrained hook #159 (@nateraw)
- Add `filename` option to `lfs_track` #212 (@LysandreJik)
- Repository fixes #213 (@LysandreJik)
- Repository documentation #214 (@LysandreJik)
- Add datasets filtering and sorting #194 (@lhoestq)
- doc: sync github to spaces #221 (@borisdayma)
- added batch transform documentation & model archive documentation #224 (@philschmid)
- Sync with hf internal #228 (@mishig25)
- Adding batching support for superb #215 (@Narsil)
- Adding SD for superb (speech-classification). #225 (@Narsil)
- Use Hugging Face fork for s3prl #229 (@lewtun)
- Mv `interfaces` -> `widgets/lib/interfaces` #227 (@mishig25)
- Tweak to prevent accidental sharing of token #226 (@julien-c)
- Fix CLI-based repo creation #234 (@osanseviero)
- Add proxify util function #235 (@mishig25)
v0.0.14: LFS Auto tracking, `dataset_info` and `list_datasets`, documentation
Datasets
Dataset repositories get better support, first by enabling full usage of the `Repository` class for dataset repositories:

```python
from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<dataset_id>", repo_type="dataset")
```

Datasets can now be retrieved from the Python runtime using the `list_datasets` method from the `HfApi` class:

```python
from huggingface_hub import HfApi

api = HfApi()
datasets = api.list_datasets()

len(datasets)
# 1048 publicly available dataset repositories at the time of writing
```
Information can be retrieved on specific datasets using the `dataset_info` method from the `HfApi` class:

```python
from huggingface_hub import HfApi

api = HfApi()
api.dataset_info("squad")
# DatasetInfo: {
#     id: squad
#     lastModified: 2021-07-07T13:18:53.595Z
#     tags: ['pretty_name:SQuAD', 'annotations_creators:crowdsourced', 'language_creators:crowdsourced', 'language_creators:found',
#     [...]
```
- Add dataset_info and list_datasets #164 (@lhoestq)
- Enable dataset repositories #151 (@LysandreJik)
Inference API wrapper client
Version v0.0.14 introduces a wrapper client for the Inference API. No need to craft custom `requests` calls anymore. See below for an example:

```python
from huggingface_hub import InferenceApi

api = InferenceApi("bert-base-uncased")
api(inputs="The [MASK] is great")
# [
#     {'sequence': 'the music is great', 'score': 0.03599703311920166, 'token': 2189, 'token_str': 'music'},
#     {'sequence': 'the price is great', 'score': 0.02146693877875805, 'token': 3976, 'token_str': 'price'},
#     {'sequence': 'the money is great', 'score': 0.01866752654314041, 'token': 2769, 'token_str': 'money'},
#     {'sequence': 'the fun is great', 'score': 0.01654735580086708, 'token': 4569, 'token_str': 'fun'},
#     {'sequence': 'the effect is great', 'score': 0.015102624893188477, 'token': 3466, 'token_str': 'effect'}
# ]
```
- Inference API wrapper client #65 (@osanseviero)
Auto-track with LFS
Version v0.0.14 introduces an auto-tracking mechanism with git-lfs for large files. Files larger than 10MB can be automatically tracked by using the `auto_track_large_files` method:

```python
from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<model_id>")

# Save large files in `local_directory`
repo.git_add()
repo.auto_track_large_files()
repo.git_commit("Add large files")
repo.git_push()
# No push rejected error anymore!
```
It is automatically used when leveraging the `commit` context manager:

```python
from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<model_id>")
with repo.commit("Add large files"):
    # Add large files
    ...
# No push rejected error anymore!
```
- Auto track with LFS #177 (@LysandreJik)
Documentation
- Update docs structure #145 (@Pierrci)
- Update links to docs #147 (@LysandreJik)
- Add new repo guide #153 (@osanseviero)
- Add documentation for endpoints #155 (@osanseviero)
- Document hf.co webhook publicly #156 (@julien-c)
- docs: ✏️ mention the Training metrics tab #193 (@severo)
- doc for Spaces #189 (@julien-c)
Breaking changes
Reminder: the huggingface_hub library follows semantic versioning and is under active development. Until the first major version (v1.0.0) is out, you should expect breaking changes, and we strongly recommend pinning the library to a specific version.
Two breaking changes are introduced with version v0.0.14.
The `whoami` return type changes from a tuple to a dictionary
- Allow obtaining Inference API tokens with whoami #157 (@osanseviero)

The `whoami` method changes its returned value from a tuple of `(<user>, [<organisations>])` to a dictionary containing much more information.
In versions v0.0.13 and below, here was the behavior of the `whoami` method from the `HfApi` class:

```python
from huggingface_hub import HfFolder, HfApi

api = HfApi()
api.whoami(HfFolder.get_token())
# ('<user>', ['<org_0>', '<org_1>'])
```
In version v0.0.14, this is updated to the following:

```python
from huggingface_hub import HfFolder, HfApi

api = HfApi()
api.whoami(HfFolder.get_token())
# {
#     'type': str,
#     'name': str,
#     'fullname': str,
#     'email': str,
#     'emailVerified': bool,
#     'apiToken': str,
#     'plan': str,
#     'avatarUrl': str,
#     'orgs': List[str]
# }
```
The `Repository`'s `use_auth_token` initialization parameter now defaults to `True`.
The `use_auth_token` initialization parameter of the `Repository` class now defaults to `True`. The behavior is unchanged if users are not logged in, in which case `Repository` remains agnostic to huggingface_hub authentication.
- Set use_auth_token to True by default #204 (@LysandreJik)
Improvements and bugfixes
- Add sklearn code snippet #133 (@osanseviero)
- Allow passing only model ID to clone when authenticated #150 (@LysandreJik)
- More robust endpoint with toggled staging endpoint #148 (@LysandreJik)
- Add config to list_models #152 (@osanseviero)
- Fix audio-to-audio widget and add icon #142 (@osanseviero)
- Upgrade spaCy to api 0.0.12 and remove allowlist #161 (@osanseviero)
- docs: fix webhook response format #162 (@severo)
- Update link in README.md #163 (@nateraw)
- Revert "docs: fix webhook response format (#162)" #165 (@severo)
- Add Keras docker image #117 (@osanseviero)
- Allow multiple models when testing a pipeline #124 (@osanseviero)
- scikit rebased #170 (@Narsil)
- Upgrading community frameworks to `audio-to-audio`. #94 (@Narsil)
- Add sagemaker docs #173 (@philschmid)
- Add Structured Data Classification as task #172 (@osanseviero)
- Fixing keras outputs (widgets was ignoring because of type mismatch, now testing for it) #176 (@Narsil)
- Updating spacy. #179 (@Narsil)
- Create initial superb docker image structure #181 (@osanseviero)
- Upgrading asteroid image. #175 (@Narsil)
- Removing tests on huggingface_hub for unrelated changes in api-inference-community #180 (@Narsil)
- Fixing audio-to-audio validation. #184 (@Narsil)
- rmdir `api-inference-community/src/sentence-transformers` #188 (@Pierrci)
- Allow generic inference for ASR for superb #185 (@osanseviero)
- Add timestamp to snapshot download tests #201 (@LysandreJik)
- No need for token to understand HF urls #203 (@LysandreJik)
- Remove `--no_renames` argument to list deleted files. #205 (@LysandreJik)
v0.0.13: Context Manager
Version v0.0.13 introduces a context manager to save files directly to the Hub. See below for some examples.

Example with a single file:

```python
import json

from huggingface_hub import Repository

repo = Repository("text-files", clone_from="<user>/text-files", use_auth_token=True)

with repo.commit("My first file."):
    with open("file.txt", "w+") as f:
        f.write(json.dumps({"key": "value"}))
```
Example with a `torch.save` statement:

```python
import torch

from huggingface_hub import Repository

model = torch.nn.Transformer()

repo = Repository("torch-files", clone_from="<user>/torch-files", use_auth_token=True)

with repo.commit("Adding my cool model!"):
    torch.save(model.state_dict(), "model.pt")
```
Example with a Flax/JAX serialization statement:

```python
from flax import linen as nn
from flax import serialization
from jax import random

from huggingface_hub import Repository

model = nn.Dense(features=5)
key1, key2 = random.split(random.PRNGKey(0))
x = random.normal(key1, (10,))
params = model.init(key2, x)

bytes_output = serialization.to_bytes(params)

repo = Repository("flax-model", clone_from="<user>/flax-model", use_auth_token=True)

with repo.commit("Adding my cool Flax model!"):
    with open("flax_model.msgpack", "wb") as f:
        f.write(bytes_output)
```
Patch release: Repository clones
Patches an issue when cloning a repository twice.
v0.0.11: Improved documentation, `hf_hub_download` and `Repository` power-up
Improved documentation
The huggingface_hub documentation is now available on hf.co/docs! Additionally, a new step-by-step guide to adding libraries is available.
- New documentation for 🤗 Hub #71 (@osanseviero)
- Step by step guide on adding Model Hub support to libraries #86 (@LysandreJik)
New method: `hf_hub_download`
A new method is introduced: `hf_hub_download`. It is the equivalent of doing `cached_download(hf_hub_url())`, in a single method.
- HF Hub download #137 (@LysandreJik)
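As a quick sketch (the repo and filename below are illustrative), the single call downloads the file through the cache and returns its local path:

```python
from huggingface_hub import hf_hub_download

# Equivalent to cached_download(hf_hub_url(...)), in one call.
path = hf_hub_download(repo_id="bert-base-uncased", filename="config.json")
print(path)
```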
`Repository` power-up
The `Repository` class is updated to behave more similarly to git. It is now impossible to clone a repository into a folder that already contains files.
The PyTorch mixin contributed by @vasudevgupta7 is slightly updated so that the `push_to_hub` method manages a repository as one would from the command line.
- Repository power-up #132 (@LysandreJik)
Improvement & Fixes
- Adding `audio-to-audio` task. #93 (@Narsil)
- When pipelines fail to load in framework code, for whatever reason #96 (@Narsil)
- Solve `rmtree` issue on windows #105 (@SBrandeis)
- Add identical_ok option to HfApi.upload_file method #102 (@SBrandeis)
- Solve compatibility issues when calling `subprocess.run` #104 (@SBrandeis)
- Open source Inference widgets + optimize for community contributions #87 (@julien-c)
- model `tags` can be `undefined` #107 (@Pierrci)
- Doc tweaks #109 (@julien-c)
- [huggingface_hub] Support for spaces #108 (@julien-c)
- speechbrain library tag + code snippet #73 (@osanseviero)
- Allow batching for feature-extraction #106 (@osanseviero)
- adding audio-to-audio widget. #95 (@Narsil)
- Add image to text (for image captioning) #114 (@osanseviero)
- Add formatting and upgrade Sentence Transformers api version for better error messages #119 (@osanseviero)
- Change videos in docs so they are played directly in our site #120 (@osanseviero)
- Fix inference API GitHub actions #125 (@osanseviero)
- Fixing sentence-transformers CACHE value for docker + functools (docker needs Py3.8) #123 (@Narsil)
- Load errors with flair should now be generating proper API errors. #121 (@Narsil)
- Simplify manage to autodetect task+framework if possible. #122 (@Narsil)
- Change sentence transformers source to original repo #128 (@osanseviero)
- Allow Python versions with letters in the minor version suffix #82 (@ulf1)
- Update `upload_file` docs #136 (@LysandreJik)
- Reformat repo README #130 (@osanseviero)
- Add config to model info #135 (@osanseviero)
- Add input validation for structured-data-classification #97 (@osanseviero)
v0.0.10: Merging `huggingface_hub` with `api-inference-community` and hub interfaces
v0.0.10 marks the merging of three components of the Hugging Face stack: the huggingface_hub repository is now the central platform to contribute new libraries to be supported on the hub.
It regroups three previously separated components:
- The `huggingface_hub` Python library, as the Python library to download, upload, and retrieve information from the hub.
- The `api-inference-community`, as the platform where libraries wishing for hub support may be added.
- The `interfaces`, as the definition for pipeline types as well as default widget inputs and definitions/UI elements for third-party libraries.
Future efforts will be focused on further easing the contribution of third-party libraries to the Hugging Face Hub.
Improvement & Fixes
- Add typing extensions to conda yaml file #49 (@LysandreJik)
- Alignment on modelcard metadata specification #39 (@LysandreJik)
- Bring interfaces from `widgets-server` #50 (@julien-c)
- Sentence similarity default widget and pipeline type #52 (@osanseviero)
- [interfaces] Expose configuration options for external libraries #51 (@julien-c)
- Adding `api-inference-community` to `huggingface_hub`. #48 (@Narsil)
- Add TensorFlowTTS as library + code snippet #55 (@osanseviero)
- Add protobuf as a dependency to handle tokenizers that require it: #58 (@Narsil)
- Update validation for NLP tasks #59 (@osanseviero)
- spaCy code snippet and language tag #57 (@osanseviero)
- SpaCy fixes #60 (@osanseviero)
- Allow changing repo visibility programmatically #61 (@osanseviero)
- Add Adapter Transformers snippet #62 (@osanseviero)
- Change order in spaCy snippet #66 (@osanseviero)
- Add validation to check all rows in table question answering have same length #67 (@osanseviero)
- added question-answering part for Bengali language #68 (@sagorbrur)
- Add spaCy to inference API #63 (@osanseviero)
- AllenNLP library tag + code snippet #72 (@osanseviero)
- Fix AllenNLP QA example #80 (@epwalsh)
- do not crash even if this config isn't set #81 (@julien-c)
- Mark model config as optional #83 (@Pierrci)
- Add repr() to ModelFile and RepoObj #75 (@lewtun)
- Refactor create_repo #84 (@SBrandeis)
v0.0.9: HTTP file uploads, multiple filter model selection
Support for large file uploads
Implementation of an endpoint to programmatically upload (large) files to any repo on the hub, without the need for git, using HTTP POST requests.
- [API] Support for the file upload endpoint #42 (@SBrandeis)
The `HfApi.list_models` method now allows multiple filters
Models may now be filtered using several filters at once.

Example usage:

```python
>>> from huggingface_hub import HfApi
>>> api = HfApi()

>>> # List all models
>>> api.list_models()

>>> # List only the text classification models
>>> api.list_models(filter="text-classification")

>>> # List only the Russian models compatible with PyTorch
>>> api.list_models(filter=("ru", "pytorch"))

>>> # List only the models trained on the "common_voice" dataset
>>> api.list_models(filter="dataset:common_voice")

>>> # List only the models from the AllenNLP library
>>> api.list_models(filter="allennlp")
```
- Document the `filter` argument #41 (@LysandreJik)
`ModelInfo` now has a readable representation
The `ModelInfo` class has been improved so that printing it displays information about the object.
- Include a readable repr for ModelInfo #32 (@muellerzr)
Improvements and bugfixes
- Fix conda by specifying python version + add tests to main branch #28 (@LysandreJik)
- Improve Mixin #34 (@LysandreJik)
- Enable `library_name` and `library_version` in `snapshot_download` #38 (@LysandreJik)
- [Windows support] Very long filenames #40 (@LysandreJik)
- Make error message more verbose when creating a repo #44 (@osanseviero)
- Open-source /docs #46 (@julien-c)