# 🧠 MinionS Protocol - Cost-Efficient Local-Remote LLM Collaboration

This example demonstrates the **MinionS protocol**, an approach for cost-efficient collaboration between small on-device models and large cloud models. Based on research from Stanford's Hazy Research lab, MinionS achieves a **5.7× cost reduction** while retaining **97.9% of cloud model performance**.

> [!TIP]
> ✨ **Real Cost Savings**: In practice, tasks that consume ~30,000 tokens with remote-only processing use only ~7,500-15,000 tokens with MinionS - a **50-75% cost reduction**!

<p>
  <img src="https://github.com/HazyResearch/minions/raw/main/assets/Ollama_minionS_background.png"
       alt="MinionS Protocol Overview"
       width="600"
       style="border: 1px solid #ccc; border-radius: 8px;" />
</p>

## 🚀 Getting Started

### Requirements

+ **[Docker Desktop] 4.43.0+ or [Docker Engine]** installed.
+ **A laptop or workstation with a GPU** (e.g., a MacBook) for running open models locally. If you don't have a GPU, you can alternatively use **[Docker Offload]**.
+ If you're using [Docker Engine] on Linux or [Docker Desktop] on Windows, ensure that the [Docker Model Runner requirements] are met (specifically that GPU support is enabled) and the necessary drivers are installed.
+ If you're using Docker Engine on Linux, ensure you have [Docker Compose] 2.38.1 or later installed.
+ An [OpenAI API Key](https://platform.openai.com/api-keys) 🔑.

### Quick Start

1. **Clone the official MinionS repository and navigate to the Docker setup:**

```bash
git clone https://github.com/HazyResearch/minions.git
cd minions/apps/minions-docker
```

2. **Set your OpenAI API key:**

```bash
export OPENAI_API_KEY=sk-your-key-here
```

3. **Customize the model for better accuracy (recommended):**

Edit the `docker-compose.minions.yml` file to use qwen3 instead of llama3.2:

```yaml
models:
  worker:
    model: ai/qwen3 # Changed from ai/llama3.2 for better accuracy (8B vs 3B params)
    context_size: 10000
```

4. **Launch the MinionS protocol:**

```bash
docker compose -f docker-compose.minions.yml up --build
```

5. **Open your browser** and navigate to `http://localhost:8080` to access the interactive interface.

## 🧠 What is the MinionS Protocol?

The MinionS protocol enables **cost-efficient collaboration** between:
- **Local Model** (on-device): Handles document reading, context processing, and initial analysis
- **Remote Model** (cloud): Provides supervision, final reasoning, and quality assurance

### Key Innovation: Decomposition Strategy

Unlike simple chat protocols, MinionS uses a sophisticated **decompose-execute-aggregate** approach:

1. **Decompose**: Remote model breaks complex tasks into simple, parallel subtasks
2. **Execute**: Local model processes subtasks in parallel on document chunks
3. **Aggregate**: Remote model synthesizes results and provides final answers

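The three steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual `minions` API: `remote`, `local`, `chunk_document`, and `minions_round` are placeholder names standing in for the cloud model, the on-device model, and the orchestration logic.

```python
# Illustrative sketch of the decompose-execute-aggregate loop.
# `remote` and `local` stand in for the cloud and on-device models;
# these names are placeholders, not the actual minions API.

def chunk_document(document: str, chunk_size: int) -> list[str]:
    """Split the document into fixed-size chunks for the local model."""
    return [document[i:i + chunk_size]
            for i in range(0, len(document), chunk_size)]

def minions_round(task, document, remote, local, chunk_size=2000):
    # 1. Decompose: the remote model produces simple, parallel subtasks.
    subtasks = remote(f"Break this task into simple subtasks: {task}")
    # 2. Execute: the local model answers every subtask on every chunk.
    partials = [local(f"{subtask}\n---\n{chunk}")
                for subtask in subtasks
                for chunk in chunk_document(document, chunk_size)]
    # 3. Aggregate: the remote model synthesizes the final answer.
    return remote(f"Task: {task}\nPartial results: {partials}\nFinal answer:")
```

The structure makes the cost savings visible: the full document is only ever read by the local model, while the cloud model sees just the short decomposition prompt and the aggregated partial results.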
## 📊 Cost Analysis & Performance

### Academic Research Results

Based on the [Stanford research paper](https://arxiv.org/pdf/2502.15964), MinionS demonstrates:

| Protocol | Cost Reduction | Performance Recovery | Use Case |
|----------|---------------|---------------------|----------|
| **MinionS (8B local)** | **5.7× cheaper** | **97.9%** of remote performance | Production ready |
| **MinionS (3B local)** | **6.0× cheaper** | **93.4%** of remote performance | Resource constrained |
| Minion (simple chat) | 30.4× cheaper | 87.0% of remote performance | Basic tasks |

### Real-World Token Usage

**Research Paper Analysis Example:**
- **Task**: "What are the three evaluation datasets used in the paper?"
- **Remote-only**: ~30,064 tokens
- **MinionS**: ~7,500-15,388 tokens
- **Savings**: 50-75% token reduction

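The quoted savings range follows directly from those token counts; a quick back-of-the-envelope check:

```python
# Sanity-check the 50-75% savings range from the token counts above.
remote_only = 30_064                         # tokens, remote-only processing
minions_best, minions_worst = 7_500, 15_388  # observed MinionS range

best_savings = 1 - minions_best / remote_only    # ~0.75
worst_savings = 1 - minions_worst / remote_only  # ~0.49
print(f"Token reduction: {worst_savings:.0%}-{best_savings:.0%}")
```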
## 🎯 Interactive Demo: Compare Remote vs MinionS

The MinionS interface includes a **toggle feature** that lets you compare:

### Remote-Only Mode
- Processes the entire document with the cloud model
- Higher token usage and cost
- Baseline performance

### MinionS Mode
- Local model reads and processes document chunks
- Remote model provides supervision and final answers
- Dramatically reduced cloud costs
- Maintained quality

## 🎮 Step-by-Step Demo

### Example: Research Paper Analysis

1. **Start the system** following the Quick Start guide above

2. **Load the MinionS research paper** as your document:
   - Download: https://arxiv.org/pdf/2502.15964
   - Upload through the web interface

3. **Ask the example question**:
   ```
   Task: "What are the three evaluation datasets used in the paper?"
   Document Metadata: "Research Paper"
   ```

4. **Compare modes**:
   - **Remote-only**: Watch token usage (~30k tokens)
   - **MinionS**: See the dramatic reduction (~7.5-15k tokens)

5. **Expected answer**: "The three evaluation datasets are FinanceBench, LongHealth, and QASPER"

### Model Customization

**Recommended**: Upgrade from llama3.2 (3B) to qwen3 (8B) for better accuracy:

```yaml
# In docker-compose.minions.yml
models:
  worker:
    model: ai/qwen3 # 8B parameters - better accuracy
    # model: ai/llama3.2 # 3B parameters - faster download
    context_size: 10000
```

**Trade-offs**:
- **qwen3**: Slightly slower to download, significantly better accuracy
- **llama3.2**: Faster to pull, adequate for simple tasks

## 🤝 When to Use MinionS

### ✅ Ideal Use Cases
- **Document Analysis**: Financial reports, medical records, research papers
- **Long Context Tasks**: Multi-page document processing
- **Cost-Sensitive Applications**: High-volume document processing
- **Privacy-Conscious**: Keep sensitive data local while leveraging cloud intelligence

## 🧹 Cleanup

To stop and remove the containers:

```bash
cd minions/apps/minions-docker
docker compose -f docker-compose.minions.yml down -v
```

## 📚 Additional Resources

### Official MinionS Resources
- **Research Paper**: [Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models](https://arxiv.org/pdf/2502.15964)
- **GitHub Repository**: [HazyResearch/minions](https://github.com/HazyResearch/minions)
- **Docker Setup**: [minions-docker](https://github.com/HazyResearch/minions/tree/main/apps/minions-docker)

### Academic Citation
```bibtex
@article{narayan2025minions,
  title={Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models},
  author={Narayan, Avanika and Biderman, Dan and Eyuboglu, Sabri and May, Avner and Linderman, Scott and Zou, James and R{\'e}, Christopher},
  journal={arXiv preprint arXiv:2502.15964},
  year={2025}
}
```

## 🏆 Key Benefits Summary

- **💰 Cost Reduction**: 5.7× cheaper than remote-only processing
- **🎯 High Accuracy**: Maintains 97.9% of cloud model performance
- **🔧 Easy Customization**: Simple model swapping (llama3.2 → qwen3)

---

## 📎 Credits

- **Research**: [Stanford Hazy Research Lab](https://hazyresearch.stanford.edu/)
- **Authors**: Avanika Narayan, Dan Biderman, Sabri Eyuboglu, and team
- **Implementation**: [HazyResearch/minions](https://github.com/HazyResearch/minions)
- **Docker Integration**: Compose for Agents community

[Docker Compose]: https://github.com/docker/compose
[Docker Desktop]: https://www.docker.com/products/docker-desktop/
[Docker Engine]: https://docs.docker.com/engine/
[Docker Model Runner requirements]: https://docs.docker.com/ai/model-runner/
[Docker Offload]: https://www.docker.com/products/docker-offload/