One subset per file in repo? #7066

Right now we consider all the files of a dataset to be the same data, e.g.
```
single_subset_dataset/
├── train0.jsonl
├── train1.jsonl
└── train2.jsonl
```
but in cases like this, each file is actually a different subset of the dataset and should be loaded separately:
```
many_subsets_dataset/
├── animals.jsonl
├── trees.jsonl
└── metadata.jsonl
```

It would be nice to detect those subsets automatically using a simple heuristic. For example, we could group files together if their path names are the same except for some digits?
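
To make the idea concrete, here is a rough sketch of that heuristic in plain Python. It is only an illustration: `group_files_by_subset` is a hypothetical helper, not an existing `datasets` function, and it assumes the subset key is simply the path with its extension and all runs of digits removed.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath


def group_files_by_subset(paths):
    """Group file paths that are identical once runs of digits are removed.

    Hypothetical helper for illustration only, not part of any library.
    """
    groups = defaultdict(list)
    for path in paths:
        p = PurePosixPath(path)
        # Drop the extension first so digits in it (e.g. ".mp3") are not touched.
        stem = str(p.with_suffix(""))
        # Remove digit runs, then any separator left dangling at the end.
        key = re.sub(r"\d+", "", stem).rstrip("-_.")
        groups[key].append(path)
    return {key: sorted(files) for key, files in groups.items()}


# The first layout collapses into one group, the second stays as three.
print(group_files_by_subset(["train0.jsonl", "train1.jsonl", "train2.jsonl"]))
print(group_files_by_subset(["animals.jsonl", "trees.jsonl", "metadata.jsonl"]))
```

One caveat with a rule this simple: files such as `data_2023.jsonl` and `data_2024.jsonl` would also be merged into a single group, which may or may not be what the dataset author intended.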
