Describe the bug
In `load_dataset_builder()`, `builder_kwargs` and `config_kwargs` can contain the same keywords, leading to `TypeError: type object got multiple values for keyword argument 'xyz'`.
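The error comes from plain Python behavior: when two dicts unpacked into the same call share a key, the call fails before the constructor even runs. The minimal sketch below reproduces this without `datasets`. `DummyBuilder`, `build`, and `reproduce` are hypothetical names used only for illustration; they assume the builder is called as `builder_cls(**builder_kwargs, **config_kwargs)`.

```python
class DummyBuilder:
    """Hypothetical stand-in for a DatasetBuilder subclass."""

    def __init__(self, base_path=None, **kwargs):
        self.base_path = base_path
        self.extra = kwargs


def build(builder_kwargs, config_kwargs):
    # Unpack both dicts into one call, as assumed for load_dataset_builder().
    return DummyBuilder(**builder_kwargs, **config_kwargs)


def reproduce():
    """Return the TypeError message raised when both dicts share a key."""
    builder_kwargs = {"base_path": "/path/from/module"}  # set by the library
    config_kwargs = {"base_path": "./sample_data"}       # passed by the user
    try:
        build(builder_kwargs, config_kwargs)
    except TypeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(reproduce())
```

On Python 3.8 the message starts with `type object`, as in this report; newer Python versions may word the prefix differently, but the `got multiple values for keyword argument 'base_path'` part is the same.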
I ran into this problem with the keyword `base_path`. It might happen with other kwargs as well. I think a quick fix would be:
```python
# Inside load_dataset_builder()
builder_cls = import_main_class(dataset_module.module_path)
builder_kwargs = dataset_module.builder_kwargs
data_files = builder_kwargs.pop("data_files", data_files)
config_name = builder_kwargs.pop("config_name", name)
hash = builder_kwargs.pop("hash")
# Proposed: remove base_path from builder_kwargs so it can be passed explicitly
base_path = builder_kwargs.pop("base_path")
```
and then pass `base_path` into `builder_cls` explicitly.
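A minimal sketch of the quick fix idea, using a hypothetical `DummyBuilder` class and a hypothetical `build_fixed` helper rather than the real `datasets` code. It pops `base_path` from both dicts and passes it exactly once. Letting the user-supplied value (from `config_kwargs`) take precedence is an assumption, not something the report specifies.

```python
class DummyBuilder:
    """Hypothetical stand-in for a DatasetBuilder subclass."""

    def __init__(self, base_path=None, **kwargs):
        self.base_path = base_path
        self.extra = kwargs


def build_fixed(builder_kwargs, config_kwargs):
    # Copy so the caller's dicts are left untouched.
    builder_kwargs = dict(builder_kwargs)
    config_kwargs = dict(config_kwargs)
    # Remove base_path from both sources; the user's value wins (assumption).
    base_path = builder_kwargs.pop("base_path", None)
    base_path = config_kwargs.pop("base_path", base_path)
    # base_path now reaches the constructor exactly once.
    return DummyBuilder(base_path=base_path, **builder_kwargs, **config_kwargs)


if __name__ == "__main__":
    builder = build_fixed({"base_path": "/path/from/module"}, {"base_path": "./sample_data"})
    print(builder.base_path)
```

This only handles `base_path`; any other key present in both dicts would still raise the same `TypeError`.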
Steps to reproduce the bug
```python
from datasets import load_dataset

load_dataset("rotten_tomatoes", base_path="./sample_data")
```
Expected results
The docs state: "`**config_kwargs`: Keyword arguments to be passed to the `BuilderConfig` and used in the `DatasetBuilder`."
So I would expect to be able to pass `base_path` into `load_dataset()`.
Actual results
`TypeError: type object got multiple values for keyword argument 'base_path'`
Environment info
- `datasets` version: 2.4.0
- Platform: macOS-12.5-arm64-arm-64bit
- Python version: 3.8.9
- PyArrow version: 9.0.0