Which version of datasets should be used? I tried several versions and datasets raises an error with each of them #16

@Jianyi2004

09/26/2024 16:42:34 - ERROR - datasets.packaged_modules.json.json - Failed to read file '/data/zjy/databrew/try/llama2-lora-fine-tuning/data/alpaca_gpt4_data_zh.json' with error <class 'pyarrow.lib.ArrowInvalid'>: JSON parse error: Column() changed from object to array in row 0
[rank1]: Traceback (most recent call last):
[rank1]:   File "/data/zjy/anaconda3/envs/bonito/lib/python3.9/site-packages/datasets/packaged_modules/json/json.py", line 122, in _generate_tables
[rank1]:     pa_table = paj.read_json(
[rank1]:   File "pyarrow/_json.pyx", line 308, in pyarrow._json.read_json
[rank1]:   File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
[rank1]:   File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
[rank1]: pyarrow.lib.ArrowInvalid: JSON parse error: Column() changed from object to array in row 0

[rank1]: During handling of the above exception, another exception occurred:

[rank1]: Traceback (most recent call last):
[rank1]:   File "/data/zjy/databrew/try/llama2-lora-fine-tuning/finetune-lora.py", line 656, in <module>
[rank1]:     train()
[rank1]:   File "/data/zjy/databrew/try/llama2-lora-fine-tuning/finetune-lora.py", line 359, in train
[rank1]:     raw_datasets = _load_dataset(data_args, training_args, model_args)
[rank1]:   File "/data/zjy/databrew/try/llama2-lora-fine-tuning/finetune-lora.py", line 284, in _load_dataset
[rank1]:     raw_datasets = load_dataset(
[rank1]:   File "/data/zjy/anaconda3/envs/bonito/lib/python3.9/site-packages/datasets/load.py", line 1731, in load_dataset
[rank1]:     builder_instance.download_and_prepare(
[rank1]:   File "/data/zjy/anaconda3/envs/bonito/lib/python3.9/site-packages/datasets/builder.py", line 613, in download_and_prepare
[rank1]:     self._download_and_prepare(
[rank1]:   File "/data/zjy/anaconda3/envs/bonito/lib/python3.9/site-packages/datasets/builder.py", line 702, in _download_and_prepare
[rank1]:     self._prepare_split(split_generator, **prepare_split_kwargs)
[rank1]:   File "/data/zjy/anaconda3/envs/bonito/lib/python3.9/site-packages/datasets/builder.py", line 1164, in _prepare_split
[rank1]:     for key, table in logging.tqdm(
[rank1]:   File "/data/zjy/anaconda3/envs/bonito/lib/python3.9/site-packages/tqdm/std.py", line 1169, in __iter__
[rank1]:     for obj in iterable:
[rank1]:   File "/data/zjy/anaconda3/envs/bonito/lib/python3.9/site-packages/datasets/packaged_modules/json/json.py", line 150, in _generate_tables
[rank1]:     f"This JSON file contain the following fields: {str(list(dataset.keys()))}. "
[rank1]: AttributeError: 'list' object has no attribute 'keys'
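
The final `AttributeError` suggests that the fallback parser read the file as a top-level JSON array (a Python `list`), while the first error shows that pyarrow's line-oriented JSON reader rejected it. One possible workaround that does not depend on the datasets version is to rewrite the file as JSON Lines, one object per line. The following is a minimal sketch, assuming the file really is a JSON array of objects; `json_array_to_jsonl` is a hypothetical helper written for illustration and is not part of datasets or of this repository.

```python
import json


def json_array_to_jsonl(src: str, dst: str) -> int:
    """Rewrite a top-level JSON array of objects as JSON Lines.

    Hypothetical helper (not part of datasets). Returns the record count.
    """
    with open(src, encoding="utf-8") as f:
        records = json.load(f)

    # Assumption: the source file is a JSON array, as the traceback hints.
    if not isinstance(records, list):
        raise ValueError(
            f"expected a top-level JSON array, got {type(records).__name__}"
        )

    with open(dst, "w", encoding="utf-8") as f:
        for i, record in enumerate(records):
            if not isinstance(record, dict):
                raise ValueError(
                    f"record {i} is {type(record).__name__}, not an object"
                )
            # ensure_ascii=False keeps Chinese text readable instead of \u escapes.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

    return len(records)
```

The converted file could then be passed to `load_dataset("json", data_files=...)` in place of the original path. Whether this resolves the error depends on the actual file contents, which are not shown here.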
