Document the HF_DATASETS_CACHE environment variable in the datasets cache documentation #7532

Open
wants to merge 1 commit into
base: main
from

Harry-Yang0518

This pull request updates the Datasets documentation to include the HF_DATASETS_CACHE environment variable. While the current documentation only mentions HF_HOME for overriding the default cache directory, HF_DATASETS_CACHE is also a supported and useful option for specifying a custom cache location for datasets stored in Arrow format.

This addition is based on the discussion in #7457, where users noted that the variable works but is not documented. The update adds a new section to cache.mdx that explains how to use HF_DATASETS_CACHE, with an example.
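
For illustration, a minimal sketch of how the variable might be set from Python. This is not the example added to cache.mdx: the helper name `set_datasets_cache` and the demo directory are hypothetical, and the sketch assumes that `datasets` reads HF_DATASETS_CACHE when it is first imported, so the variable has to be set beforehand.

```python
import os
import tempfile
from pathlib import Path


def set_datasets_cache(cache_dir):
    """Point HF_DATASETS_CACHE at cache_dir (hypothetical helper).

    Call this before `import datasets`; the library is assumed to read
    the variable at import time.
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    os.environ["HF_DATASETS_CACHE"] = str(path)
    return path


# Hypothetical location; on a shared machine this would typically be a
# scratch or project directory with enough free space.
cache_path = set_datasets_cache(Path(tempfile.gettempdir()) / "hf_datasets_cache_demo")

# Import the library only after the variable is set, for example:
# from datasets import load_dataset
```

The same effect can be had by exporting HF_DATASETS_CACHE in the shell before starting Python, which avoids any dependence on import order.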

This change aims to improve clarity and help users better manage their cache directories when working in shared environments or with limited local storage.

Closes #7457.

Development

Successfully merging this pull request may close these issues.

Document the HF_DATASETS_CACHE env variable