Identical keywords in build_kwargs and config_kwargs lead to TypeError in load_dataset_builder() #4910

Open
@bablf

Describe the bug

In load_dataset_builder(), builder_kwargs and config_kwargs can contain the same keywords, which leads to TypeError("type object got multiple values for keyword argument 'xyz'").
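The mechanism can be reproduced without datasets at all. The following is a minimal sketch, assuming the builder class is called with both dictionaries unpacked in the same call; FakeBuilder and the dictionary contents are hypothetical stand-ins, not the library's code.

```python
class FakeBuilder:
    """Hypothetical stand-in for a dataset builder class."""

    def __init__(self, base_path=None, **config_kwargs):
        self.base_path = base_path
        self.config_kwargs = config_kwargs


# Assumed contents: the module supplies its own base_path, and the user
# passes another one through load_dataset(..., base_path=...).
builder_kwargs = {"base_path": "/module/default"}
config_kwargs = {"base_path": "./sample_data"}

error_message = None
try:
    # Unpacking two dicts that share a key into one call raises TypeError
    # before FakeBuilder.__init__ even runs.
    FakeBuilder(**builder_kwargs, **config_kwargs)
except TypeError as err:
    error_message = str(err)

print(error_message)
```

The exact prefix of the message differs between Python versions, but it always names the duplicated keyword.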

I ran into this problem with the keyword base_path. It might happen with other kwargs as well. I think a quick fix would be to pop base_path from builder_kwargs too:

builder_cls = import_main_class(dataset_module.module_path)
builder_kwargs = dataset_module.builder_kwargs
data_files = builder_kwargs.pop("data_files", data_files)
config_name = builder_kwargs.pop("config_name", name)
hash = builder_kwargs.pop("hash")
base_path = builder_kwargs.pop("base_path")

and then pass base_path into builder_cls.
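One caveat: if base_path is popped only from builder_kwargs and then passed explicitly, a base_path still present in config_kwargs would collide with the explicit argument. A minimal sketch of one way around that, assuming the user-supplied value should take precedence; FakeBuilder and build_with_base_path are hypothetical names, not part of datasets.

```python
class FakeBuilder:
    """Hypothetical stand-in for a dataset builder class."""

    def __init__(self, base_path=None, **config_kwargs):
        self.base_path = base_path
        self.config_kwargs = config_kwargs


def build_with_base_path(builder_kwargs, config_kwargs):
    # Work on copies so the caller's dictionaries are left untouched.
    builder_kwargs = dict(builder_kwargs)
    config_kwargs = dict(config_kwargs)

    # Remove base_path from both dicts; prefer the user's value
    # (config_kwargs) and fall back to the module's default.
    module_default = builder_kwargs.pop("base_path", None)
    base_path = config_kwargs.pop("base_path", module_default)

    # base_path now appears exactly once in the call.
    return FakeBuilder(base_path=base_path, **builder_kwargs, **config_kwargs)


user_builder = build_with_base_path(
    {"base_path": "/module/default"}, {"base_path": "./sample_data"}
)
default_builder = build_with_base_path({"base_path": "/module/default"}, {})
print(user_builder.base_path, default_builder.base_path)
```

Whether the user's value or the module's default should win is a design decision for the maintainers; the sketch only shows that the keyword has to be removed from both dictionaries before the call.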

Steps to reproduce the bug

from datasets import load_dataset
load_dataset("rotten_tomatoes", base_path="./sample_data")

Expected results

The docs state: **config_kwargs — Keyword arguments to be passed to the BuilderConfig and used in the DatasetBuilder.

So I would expect to be able to pass the base_path into load_dataset().

Actual results

TypeError("type object got multiple values for keyword argument 'base_path'")

Environment info

  • datasets version: 2.4.0
  • Platform: macOS-12.5-arm64-arm-64bit
  • Python version: 3.8.9
  • PyArrow version: 9.0.0

Labels

  • bug (Something isn't working)
  • good first issue (Good for newcomers)
