Description
Describe the bug
I have multiple datasets (train + validation) saved as 50MB shards. For one dataset, the validation split is small enough to fit into a single shard, and this apparently causes problems when loading the dataset. I created the datasets as a DatasetDict, saved them as 50MB arrow files for streaming, and then loaded each dataset. Loading any of the other datasets, which all have more than one arrow file/shard, works fine.
The error indicates that the training set was loaded in arrow format (correct) and the validation set in json (incorrect). This seems to be because some of the metadata files are treated as dataset files.
```
Error loading /nfs/dataset_pt-uk: Couldn't infer the same data file format for all splits. Got {NamedSplit('train'): ('arrow', {}), NamedSplit('validation'): ('json', {})}
```
Concretely, there is a mismatch between the metadata created by DatasetDict.save_to_disk and the builder used by datasets.load_dataset (see src/datasets/data_files.py, line 107 at commit e71b0b1). The folder_based_builder lists all files, and with only one arrow file the json files (which are actually metadata) are in the majority, so the split's format is inferred as json; see the sketch below.
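A minimal sketch of that majority vote; the file names below mirror a typical save_to_disk split directory and are assumptions for illustration, not copied from my run:

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical listing of the single-shard validation directory
# (names follow the usual save_to_disk layout).
validation_files = [
    "validation/data-00000-of-00001.arrow",
    "validation/dataset_info.json",  # metadata, not data
    "validation/state.json",         # metadata, not data
]

# Majority vote over file extensions, analogous to what the
# folder_based_builder ends up doing when inferring the split format.
counts = Counter(PurePath(name).suffix.lstrip(".") for name in validation_files)
print(counts.most_common(1))  # [('json', 2)] -> json outvotes arrow 2 to 1
```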
Steps to reproduce the bug
Create a dataset with metadata, a single arrow file in the validation set, and multiple arrow files in the training set, following the description above. In my case, I saved the files via:
```python
dataset = DatasetDict({
    'train': train_dataset,
    'validation': val_dataset,
})
dataset.save_to_disk(output_path, max_shard_size="50MB")
```
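The error then appears when the saved directory is loaded with load_dataset rather than load_from_disk; a sketch of the failing call, assuming output_path is the directory from the snippet above:

```python
from datasets import load_dataset

# Goes through data-file format inference and fails, because the
# validation split is inferred as json (the metadata files outvote
# the single arrow shard).
dataset = load_dataset(output_path)
# raises: Couldn't infer the same data file format for all splits. ...
```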
Expected behavior
The dataset loads successfully, with both splits inferred as arrow.
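A possible workaround, sketched under the assumption that pointing the packaged "arrow" builder directly at the shards sidesteps the inference (not verified on the exact dataset above):

```python
from datasets import load_dataset

# Restrict data_files to the arrow shards so the json metadata
# files are never considered data files.
dataset = load_dataset(
    "arrow",
    data_files={
        "train": f"{output_path}/train/*.arrow",
        "validation": f"{output_path}/validation/*.arrow",
    },
)
```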
Environment info
- datasets version: 3.6.0
- Platform: Linux-6.14.0-22-generic-x86_64-with-glibc2.41
- Python version: 3.12.7
- huggingface_hub version: 0.31.1
- PyArrow version: 18.1.0
- Pandas version: 2.2.3
- fsspec version: 2024.6.1