Description
🎯 Goal (What & Why)
Add LoRA (Low-Rank Adaptation) support to Fast-LLM for flexible and memory-efficient fine-tuning.
Motivations:
- Generic Low-Compute Fine-tuning: Enable standard LoRA use cases to reduce memory usage and improve fine-tuning accessibility.
- Token-Switched LoRA (Phi-4): Support the architecture used in Phi-4-Multimodal's token-switched LoRAs for modular multimodal capabilities, see https://huggingface.co/microsoft/Phi-4-multimodal-instruct/blob/main/phi_4_mm.tech_report.02252025.pdf
- LoRA-Infused SSM-Transformer Hybrid Architecture (Zamba-2): Provide compatibility with Zamba-2's architecture to enhance model extensibility, see https://arxiv.org/abs/2411.15242.
- LoRA MoEs: Integrate LoRA with Mixture-of-Experts (MoE) to support dynamic and efficient module switching, see @oleksost's paper https://arxiv.org/abs/2405.11157.
- LoRA RegMix: Run RegMix-style data-mixture experiments with low-compute LoRA fine-tuning rather than with smaller proxy models.
🚀 Execution Plan
Step 1: What is the smallest working version?
- Minimal Integration: Add optional LoRA layers to `Wq` and `Wv` of each transformer layer in Fast-LLM.
- Configuration Design: Implement a minimal `LoraConfig` similar to PEFT's `LoraConfig`, focusing only on the essential parameters:
  - `r` (`int`): LoRA attention dimension (the "rank").
  - `lora_alpha` (`int`): The alpha parameter for LoRA scaling.
- MVP Approach: Keep the implementation simple (see the sketch after this list):
  - LoRA layers are functionally always present, but they are lazily initialized with zeros (no-op) and remain inactive when their learning rate is set to `0`.
  - When exporting models to HF, store LoRA weights separately so that they can be used directly with `PeftModel.from_pretrained`, see https://huggingface.co/docs/peft/en/tutorial/peft_model_config#peft-models.
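A minimal sketch of what this MVP could look like, assuming a PyTorch-style module. The `LoraConfig` fields mirror the two parameters above; `LoRALinear`, `wrapped_layer`, and the initialization scheme are illustrative assumptions, not Fast-LLM's actual API.

```python
# Illustrative sketch only: class/field names are assumptions, not Fast-LLM's API.
import dataclasses

import torch
import torch.nn as nn


@dataclasses.dataclass
class LoraConfig:
    r: int = 8            # LoRA attention dimension (the "rank")
    lora_alpha: int = 16  # scaling numerator; effective scale is lora_alpha / r


class LoRALinear(nn.Module):
    """Wraps a frozen linear projection (e.g. Wq or Wv) with a low-rank update."""

    def __init__(self, wrapped_layer: nn.Linear, config: LoraConfig):
        super().__init__()
        self.wrapped_layer = wrapped_layer
        self.scaling = config.lora_alpha / config.r
        # B starts at zero, so B @ A is initially a no-op, matching the
        # "lazily initialized with zeros" MVP behaviour described above.
        self.lora_A = nn.Parameter(torch.randn(config.r, wrapped_layer.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(wrapped_layer.out_features, config.r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W x + (lora_alpha / r) * B A x
        return self.wrapped_layer(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

For export, saving the A/B matrices in PEFT's adapter checkpoint layout would let users load them with `PeftModel.from_pretrained(base_model, adapter_path)`, as linked above.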
Step 2: What additional optimizations are possible (later, out-of-scope for now)?
- Loading HF LoRA Models: Convert LoRA weights from HF hub to Fast-LLM LoRA weights.
- Advanced Configurations: Introduce more advanced LoRA configuration options from PEFT's `LoraConfig`, e.g. to define which weights get LoRA adapters (see the example after this list).
- Performance Optimization: Improve speed and memory efficiency. We shouldn't over-invest here, because LoRA is already fast and memory-efficient by design.
- Support for Complex Architectures: Extend LoRA to support token-switching (Phi-4) and MoEs, supplementing Fast-LLM's existing MoE approach.
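For reference, PEFT's `LoraConfig` already exposes the kind of advanced options mentioned above, such as selecting which weights receive adapters. The `target_modules` names below depend on the base model and are illustrative only.

```python
from peft import LoraConfig

# Example of PEFT's existing advanced options; module names are illustrative.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # which projections get LoRA adapters
    lora_dropout=0.05,
)
```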
📌 Acceptance Criteria (Must-Haves for Completion)
- LoRA layers must be functional and tested in Fast-LLM.
- The implementation must include clear documentation explaining the minimal viable setup and configurations.
- The PR must include a tutorial for LoRA-based fine-tuning.
- The PR must provide a performance/impact summary demonstrating memory savings and fine-tuning flexibility.
- No refactors unless directly necessary for feature completion.
🛠️ Project Management
- Assign the project to the Fast-LLM project.
- Set the `Estimate` field (in days) in the GitHub project.
- Use the `Size` field to categorize the PR size (Small/Medium/Large).
- Assign an owner when opening the issue.