Describe the bug
When I load an already-downloaded dataset with num_proc=64, throughput is very high for the first 10-20 seconds, reaching 30-40K samples/s with 100% utilization on all cores, but it soon drops to <= 1000 samples/s with almost 0% utilization on most cores.
Steps to reproduce the bug
# download the dataset with the CLI (notebook cell)
!huggingface-cli download --repo-type dataset timm/imagenet-1k-wds --max-workers 32
from datasets import load_dataset
ds = load_dataset("timm/imagenet-1k-wds", num_proc=64)
Expected behavior
100% core utilization throughout.
Environment info
Azure A100-80GB, 16-core VM
Thank you for the quick reply. I dug a bit and realized that my disk's IOPS is also limited, which is causing this. I will check further and report whether the issue is on the hf datasets side or mine.
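For anyone who wants to check the same thing, here is a rough sketch of how one might measure raw read throughput on the directory holding the downloaded shards, independent of `datasets`. The function name `measure_read_throughput` and its parameters are made up for illustration and are not part of any library. It only approximates disk limits: repeated runs can be served from the OS page cache and look much faster than the disk really is.

```python
import sys
import time
from pathlib import Path


def measure_read_throughput(root, chunk_size=1 << 20, max_bytes=None):
    """Sequentially read files under `root` and report rough throughput.

    Hypothetical helper for illustration only. Results can be inflated
    by the OS page cache if the files were read recently.
    """
    total_bytes = 0
    read_calls = 0
    done = False
    start = time.perf_counter()
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # buffering=0 gives raw reads, so each f.read() is one read request
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total_bytes += len(chunk)
                read_calls += 1
                if max_bytes is not None and total_bytes >= max_bytes:
                    done = True
                    break
        if done:
            break
    elapsed = max(time.perf_counter() - start, 1e-9)
    return {
        "bytes": total_bytes,
        "seconds": elapsed,
        "mb_per_s": total_bytes / elapsed / 1e6,
        "reads_per_s": read_calls / elapsed,
    }


if __name__ == "__main__":
    # Usage: python read_check.py /path/to/downloaded/shards
    stats = measure_read_throughput(sys.argv[1], max_bytes=5 * 10**9)
    print(stats)
```

If the MB/s reported here is far below what 64 worker processes would need, the drop in core utilization is more likely a storage limit than a problem in the loader.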