Description
I am experimenting with LoRA to fine-tune a model to process and analyze PDF files so that I can ask questions based on them. Essentially, I upload PDFs, the program splits them into chunks and "learns" from them, so I don't have to repeatedly upload files and it remembers the context from the files (I am building a Streamlit application), and then it generates a vector store for querying.
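For reference, `train_data` ends up as a list of dicts, each with a `"text"` field (that is what the tokenizer step in the function below expects). A minimal sketch of how the PDF chunks could be turned into that shape, using `pypdf` and a 1,000-character chunk size purely as placeholder choices, looks like this:

```python
# Illustrative only: how the PDF text might be chunked into train_data.
# pypdf and the 1,000-character chunk size are assumptions, not my exact code.
from pypdf import PdfReader

def build_train_data(pdf_paths, chunk_size=1000):
    train_data = []
    for path in pdf_paths:
        reader = PdfReader(path)
        # Concatenate the extracted text of every page
        full_text = "\n".join(page.extract_text() or "" for page in reader.pages)
        # Split into fixed-size character chunks
        for start in range(0, len(full_text), chunk_size):
            chunk = full_text[start:start + chunk_size].strip()
            if chunk:
                train_data.append({"text": chunk})
    return train_data
```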
Here is my fine-tuning function:
```python
def fine_tune_model_lora_with_suggestions(train_data):
    st.write("Starting high-performance fine-tuning with LoRA...")
    try:
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
        from peft import LoraConfig, get_peft_model
        from datasets import Dataset

        # Define model name
        model_name = "bigscience/bloom-7b1"

        # Load tokenizer
        tokenizer = AutoTokenizer.from_pretrained(model_name)

        # Load the model in 8-bit on GPU if CUDA is available, otherwise on CPU
        if torch.cuda.is_available():
            model = AutoModelForCausalLM.from_pretrained(
                model_name,
                load_in_8bit=True,                      # Enable 8-bit quantization
                device_map="auto",                      # Automatically map layers between GPU and CPU
                llm_int8_enable_fp32_cpu_offload=True,  # Offload some layers to CPU in FP32
                torch_dtype=torch.float16,              # Use FP16 for GPU-loaded layers
            )
        else:
            model = AutoModelForCausalLM.from_pretrained(model_name)

        # Apply LoRA configuration
        lora_config = LoraConfig(
            r=16,
            lora_alpha=32,
            lora_dropout=0.05,
            bias="none",
            task_type="CAUSAL_LM",
            target_modules=["query_key_value"],  # Specify target modules for LoRA
        )
        model = get_peft_model(model, lora_config)

        # Prepare dataset
        dataset = Dataset.from_list(train_data)

        def tokenize_function(examples):
            tokens = tokenizer(
                examples["text"],
                padding="max_length",
                truncation=True,
                max_length=512,
            )
            tokens["labels"] = tokens["input_ids"].copy()
            return tokens

        tokenized_dataset = dataset.map(tokenize_function, batched=True)

        # Training arguments
        training_args = TrainingArguments(
            per_device_train_batch_size=4,
            gradient_accumulation_steps=4,
            max_steps=200,
            learning_rate=2e-4,
            fp16=torch.cuda.is_available(),  # Enable FP16 only if CUDA is available
            logging_steps=10,
            output_dir="./outputs",
            save_steps=10,
            save_total_limit=2,
            report_to="none",
        )

        # Initialize Trainer
        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=tokenized_dataset,
        )

        # Train model
        trainer.train()

        # Save fine-tuned model
        model.save_pretrained("./fine_tuned_bloom_lora")
        tokenizer.save_pretrained("./fine_tuned_bloom_lora")

        st.write("Fine-tuning completed successfully.")
    except ImportError as e:
        st.error(f"Import Error: {e}")
    except Exception as e:
        st.error(f"Error during LoRA fine-tuning: {e}")
```
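Once fine-tuning finishes, I intend to reload the saved adapter for querying, roughly along these lines (just a sketch that assumes the same base model and the `./fine_tuned_bloom_lora` directory saved above; this part is not where the error occurs):

```python
# Sketch of reloading the fine-tuned LoRA adapter later for generation.
# Assumes the same base model and the ./fine_tuned_bloom_lora directory saved above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("./fine_tuned_bloom_lora")
model = PeftModel.from_pretrained(base, "./fine_tuned_bloom_lora")

# Hypothetical prompt, just to show the intended querying step
prompt = "Question about the uploaded PDFs: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```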
Just as a side note, I am running this code in Google Colab.
When I run my code in its entirety, I get this error: `Error during LoRA fine-tuning: Got unexpected arguments: {'num_items_in_batch': 8192}`
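In case the library versions matter for this error, this is the small snippet I can run in the Colab runtime to report them (I am not assuming any particular versions here):

```python
# Print the versions of the relevant libraries in the current runtime
import torch, transformers, peft, datasets, accelerate
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("peft:", peft.__version__)
print("datasets:", datasets.__version__)
print("accelerate:", accelerate.__version__)
```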
I would appreciate any help I could get on this! Thank you!