09/26/2024 16:42:34 - ERROR - datasets.packaged_modules.json.json - Failed to read file '/data/zjy/databrew/try/llama2-lora-fine-tuning/data/alpaca_gpt4_data_zh.json' with error <class 'pyarrow.lib.ArrowInvalid'>: JSON parse error: Column() changed from object to array in row 0
[rank1]: Traceback (most recent call last):
[rank1]: File "/data/zjy/anaconda3/envs/bonito/lib/python3.9/site-packages/datasets/packaged_modules/json/json.py", line 122, in _generate_tables
[rank1]: pa_table = paj.read_json(
[rank1]: File "pyarrow/_json.pyx", line 308, in pyarrow._json.read_json
[rank1]: File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
[rank1]: File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
[rank1]: pyarrow.lib.ArrowInvalid: JSON parse error: Column() changed from object to array in row 0
[rank1]: During handling of the above exception, another exception occurred:
[rank1]: Traceback (most recent call last):
[rank1]: File "/data/zjy/databrew/try/llama2-lora-fine-tuning/finetune-lora.py", line 656, in <module>
[rank1]: train()
[rank1]: File "/data/zjy/databrew/try/llama2-lora-fine-tuning/finetune-lora.py", line 359, in train
[rank1]: raw_datasets = _load_dataset(data_args, training_args, model_args)
[rank1]: File "/data/zjy/databrew/try/llama2-lora-fine-tuning/finetune-lora.py", line 284, in _load_dataset
[rank1]: raw_datasets = load_dataset(
[rank1]: File "/data/zjy/anaconda3/envs/bonito/lib/python3.9/site-packages/datasets/load.py", line 1731, in load_dataset
[rank1]: builder_instance.download_and_prepare(
[rank1]: File "/data/zjy/anaconda3/envs/bonito/lib/python3.9/site-packages/datasets/builder.py", line 613, in download_and_prepare
[rank1]: self._download_and_prepare(
[rank1]: File "/data/zjy/anaconda3/envs/bonito/lib/python3.9/site-packages/datasets/builder.py", line 702, in _download_and_prepare
[rank1]: self._prepare_split(split_generator, **prepare_split_kwargs)
[rank1]: File "/data/zjy/anaconda3/envs/bonito/lib/python3.9/site-packages/datasets/builder.py", line 1164, in _prepare_split
[rank1]: for key, table in logging.tqdm(
[rank1]: File "/data/zjy/anaconda3/envs/bonito/lib/python3.9/site-packages/tqdm/std.py", line 1169, in __iter__
[rank1]: for obj in iterable:
[rank1]: File "/data/zjy/anaconda3/envs/bonito/lib/python3.9/site-packages/datasets/packaged_modules/json/json.py", line 150, in _generate_tables
[rank1]: f"This JSON file contain the following fields: {str(list(dataset.keys()))}. "
[rank1]: AttributeError: 'list' object has no attribute 'keys'
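
Reading the two tracebacks together: pyarrow's line-by-line JSON reader rejects the file, and the fallback in `datasets/packaged_modules/json/json.py` then fails because the parsed `dataset` is a `list`, not a `dict`. That is consistent with `alpaca_gpt4_data_zh.json` being a single top-level JSON array of records, although the file itself is not shown here. One possible workaround, offered as a minimal sketch under that assumption, is to rewrite the array as JSON Lines (one object per line) before calling `load_dataset`. The helper name `json_array_to_jsonl` and the output filename below are hypothetical and are not part of `datasets` or of `finetune-lora.py`.

```python
import json
from pathlib import Path


def json_array_to_jsonl(src, dst):
    """Rewrite a top-level JSON array of objects as JSON Lines.

    Hypothetical helper: assumes `src` holds one JSON array and fits in memory.
    Returns the number of records written.
    """
    with Path(src).open("r", encoding="utf-8") as f:
        records = json.load(f)

    # Fail early if the assumption about the file layout is wrong.
    if not isinstance(records, list):
        raise ValueError(
            f"expected a top-level JSON array, got {type(records).__name__}"
        )

    with Path(dst).open("w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps Chinese text as-is instead of \u escapes.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

    return len(records)
```

After converting, for example with `json_array_to_jsonl("data/alpaca_gpt4_data_zh.json", "data/alpaca_gpt4_data_zh.jsonl")`, the script could point `load_dataset("json", data_files=...)` at the new `.jsonl` file. Upgrading `datasets` may also be enough, since later releases appear to accept a top-level array in the fallback path, but that has not been verified against this environment.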