No way to run the public segments of a semi-private benchmark like vidore-v3 using the CLI #3562

@StupidBuluchacha

Describe the bug

Hi mteb team,

I was evaluating the vidore-v3 benchmark, which consists of 8 public and 2 private sets. The evaluation stopped because it could not access the private sets. I am not sure what caused this issue.

Below is my error:

Traceback (most recent call last):
  File "/mnt/workspace/miniconda3/envs/mieb/bin/mteb", line 7, in <module>
    sys.exit(main())
  File "/mnt/workspace/mteb/mteb/cli/build_cli.py", line 392, in main
    args.func(args)
  File "/mnt/workspace/mteb/mteb/cli/build_cli.py", line 84, in run
    mteb.evaluate(
  File "/mnt/workspace/mteb/mteb/evaluate.py", line 363, in evaluate
    _res = evaluate(
  File "/mnt/workspace/mteb/mteb/evaluate.py", line 453, in evaluate
    result = _evaluate_task(
  File "/mnt/workspace/mteb/mteb/evaluate.py", line 162, in _evaluate_task
    task.load_data()
  File "/mnt/workspace/mteb/mteb/abstasks/retrieval.py", line 273, in load_data
    _process_data(split, lang)
  File "/mnt/workspace/mteb/mteb/abstasks/retrieval.py", line 262, in _process_data
    self.dataset[hf_subset][split] = RetrievalDatasetLoader(
  File "/mnt/workspace/mteb/mteb/abstasks/retrieval_dataset_loaders.py", line 74, in __init__
    self.dataset_configs = get_dataset_config_names(self.hf_repo, self.revision)
  File "/mnt/workspace/miniconda3/envs/mieb/lib/python3.10/site-packages/datasets/inspect.py", line 161, in get_dataset_config_names
    dataset_module = dataset_module_factory(
  File "/mnt/workspace/miniconda3/envs/mieb/lib/python3.10/site-packages/datasets/load.py", line 1030, in dataset_module_factory
    raise e1 from None
  File "/mnt/workspace/miniconda3/envs/mieb/lib/python3.10/site-packages/datasets/load.py", line 985, in dataset_module_factory
    raise DatasetNotFoundError(f"Dataset '{path}' doesn't exist on the Hub or cannot be accessed.") from e
datasets.exceptions.DatasetNotFoundError: Dataset 'mteb-private/Vidore3TelecomRetrieval' doesn't exist on the Hub or cannot be accessed.

To reproduce

I ran the following command:
mteb run -b "ViDoRe(v3)" -m "jinaai/jina-embeddings-v4"
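
For reference, a possible workaround through the Python API is sketched below. It is not a confirmed fix. It assumes that each task exposes a `metadata.is_public` flag and that `mteb.get_benchmark`, `mteb.get_model`, and `mteb.evaluate` behave as in recent mteb releases; `select_public_tasks` and `run_public_only` are hypothetical helper names, not part of mteb.

```python
from types import SimpleNamespace


def select_public_tasks(tasks):
    """Keep only tasks whose metadata does not mark them as private.

    Assumption: each task exposes `task.metadata.is_public`; tasks that
    lack the attribute are treated as public.
    """
    return [
        task
        for task in tasks
        if getattr(getattr(task, "metadata", None), "is_public", True)
    ]


def run_public_only(benchmark_name, model_name):
    """Hypothetical helper: evaluate only the public tasks of a benchmark."""
    # Imported lazily so the filter above can be tried without mteb installed.
    import mteb

    benchmark = mteb.get_benchmark(benchmark_name)
    public_tasks = select_public_tasks(benchmark.tasks)
    model = mteb.get_model(model_name)
    return mteb.evaluate(model, public_tasks)


# Stand-in objects (made-up names) to show the filter without any download.
demo = [
    SimpleNamespace(metadata=SimpleNamespace(name="PublicTask", is_public=True)),
    SimpleNamespace(metadata=SimpleNamespace(name="PrivateTask", is_public=False)),
]
print([t.metadata.name for t in select_public_tasks(demo)])
```

Under those assumptions, `run_public_only("ViDoRe(v3)", "jinaai/jina-embeddings-v4")` would skip the `mteb-private/...` datasets instead of failing on them. This does not address the CLI itself, which is what this issue is about.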

Additional information

No response

Are you interested to contribute a fix for this bug?

No
