Description
Right now we treat all the files of a dataset as parts of the same data, e.g.
single_subset_dataset/
├── train0.jsonl
├── train1.jsonl
└── train2.jsonl
But in cases like the following, each file is actually a different subset of the dataset and should be loaded separately:
many_subsets_dataset/
├── animals.jsonl
├── trees.jsonl
└── metadata.jsonl
It would be nice to detect those subsets automatically using a simple heuristic. For example, we could group files together if their path names are identical except for some digits?
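A minimal sketch of that heuristic, assuming plain string paths: strip every run of digits from each path and group the files that end up with the same key. `group_files_by_subset` is a hypothetical helper name used only for illustration, not an existing API.

```python
import re
from collections import defaultdict


def group_files_by_subset(paths):
    """Group paths whose names are identical once all digits are removed.

    Hypothetical helper: the grouping key is the path with every run of
    digits stripped, so "train0.jsonl" and "train1.jsonl" share the key
    "train.jsonl".
    """
    groups = defaultdict(list)
    for path in paths:
        key = re.sub(r"\d+", "", path)
        groups[key].append(path)
    return dict(groups)


single = group_files_by_subset(["train0.jsonl", "train1.jsonl", "train2.jsonl"])
many = group_files_by_subset(["animals.jsonl", "trees.jsonl", "metadata.jsonl"])
print(len(single), "group for single_subset_dataset")
print(len(many), "groups for many_subsets_dataset")
```

With this rule the three `train*.jsonl` files fall into one group, while `animals.jsonl`, `trees.jsonl` and `metadata.jsonl` each form their own. One caveat of stripping digits outright is that files whose names differ only by a meaningful number (for example a year) would also be merged, so a real implementation may need a stricter pattern.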