# 🧠 MinionS Protocol - Cost-Efficient Local-Remote LLM Collaboration

This example demonstrates the **MinionS protocol**, an approach for cost-efficient collaboration between
small on-device models and large cloud models.
Based on research from Stanford's Hazy Research lab, MinionS achieves a **5.7× cost reduction**
while maintaining **97.9% of cloud model performance**.

## 🚀 Getting Started

### Requirements

+ **[Docker Desktop] 4.43.0+ or [Docker Engine]** installed.
+ **A laptop or workstation with a GPU** (e.g., a MacBook) for running open models locally. If you
  don't have a GPU, you can alternatively use **[Docker Offload]**.
+ If you're using [Docker Engine] on Linux or [Docker Desktop] on Windows, ensure that the
  [Docker Model Runner requirements] are met (specifically that GPU
  support is enabled) and the necessary drivers are installed.
+ If you're using Docker Engine on Linux, ensure you have [Docker Compose] 2.38.1 or later installed.
+ An [OpenAI API Key](https://platform.openai.com/api-keys) 🔑.

### Quick Start

1. **Clone the official MinionS repository and navigate to the Docker setup:**

   ```bash
   git clone https://github.com/HazyResearch/minions.git
   cd minions/apps/minions-docker
   ```

2. **Set your OpenAI API key:**

   ```bash
   export OPENAI_API_KEY=sk-your-key-here
   ```

3. **Customize the model for better accuracy (recommended):**

   Edit the `docker-compose.minions.yml` file to use qwen3 instead of llama3.2:

   ```yaml
   models:
     worker:
       model: ai/qwen3 # Changed from ai/llama3.2 for better accuracy (8B vs 3B params)
       context_size: 10000
   ```

4. **Launch the MinionS protocol:**

   ```bash
   docker compose -f docker-compose.minions.yml up --build
   ```

5. **Open your browser** and navigate to `http://localhost:8080` to access the interactive interface.

## 🧠 What is the MinionS Protocol?

The MinionS protocol enables **cost-efficient collaboration** between:

+ **Local Model** (on-device): Handles document reading, context processing, and initial analysis
+ **Remote Model** (cloud): Provides supervision, final reasoning, and quality assurance

### Key Innovation: Decomposition Strategy

Unlike simple chat protocols, MinionS uses a **decompose-execute-aggregate** approach:

1. **Decompose**: Remote model breaks complex tasks into simple, parallel subtasks
2. **Execute**: Local model processes subtasks in parallel on document chunks
3. **Aggregate**: Remote model synthesizes results and provides final answers
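
The three steps above can be sketched as a small driver loop. This is an illustrative sketch, not the actual `minions` API: the callables `remote_decompose`, `local_execute`, and `remote_aggregate` are hypothetical placeholders for the remote and local model calls.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(document, size=2000):
    """Split the document into fixed-size chunks for the local model."""
    return [document[i:i + size] for i in range(0, len(document), size)]

def run_minions(task, document, remote_decompose, local_execute, remote_aggregate):
    # 1. Decompose: the remote model turns the task into simple subtasks.
    subtasks = remote_decompose(task)
    # 2. Execute: the local model runs every (subtask, chunk) pair in parallel.
    jobs = [(s, c) for s in subtasks for c in chunk(document)]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda job: local_execute(*job), jobs))
    # 3. Aggregate: the remote model synthesizes the partial results.
    return remote_aggregate(task, results)
```

The key design point is that only the short task, the subtask list, and the small per-chunk results cross the network; the full document is read exclusively by the local model.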

## 📊 Cost Analysis & Performance

### Academic Research Results

Based on the [Stanford research paper](https://arxiv.org/pdf/2502.15964), MinionS demonstrates:

| Protocol | Cost Reduction | Performance Recovery | Use Case |
|----------|---------------|---------------------|----------|
| **MinionS (8B local)** | **5.7× cheaper** | **97.9%** of remote performance | Production ready |
| **MinionS (3B local)** | **6.0× cheaper** | **93.4%** of remote performance | Resource constrained |
| Minion (simple chat) | 30.4× cheaper | 87.0% of remote performance | Basic tasks |
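
To make the table concrete, here is what a 5.7× cost reduction means at volume. The dollar figures and volume are made-up inputs for illustration, not numbers from the paper:

```python
remote_cost_per_doc = 0.50  # hypothetical remote-only cost per document ($)
docs_per_month = 100_000    # hypothetical processing volume

remote_total = remote_cost_per_doc * docs_per_month  # $50,000/month baseline
minions_total = remote_total / 5.7                   # same workload, 5.7x cheaper
print(f"Remote-only: ${remote_total:,.0f}/mo  MinionS: ${minions_total:,.0f}/mo")
```

At this hypothetical volume the 5.7× factor turns a $50,000/month bill into roughly $8,800/month.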

### Real-World Token Usage

**Research Paper Analysis Example:**

+ **Task**: "What are the three evaluation datasets used in the paper?"
+ **Remote-only**: ~30,064 tokens
+ **MinionS**: ~7,500-15,388 tokens
+ **Savings**: 50-75% token reduction
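
A quick back-of-the-envelope check of those numbers, using the token counts quoted above:

```python
# Token counts quoted above for the example question.
remote_only = 30_064                       # remote-only baseline
minions_low, minions_high = 7_500, 15_388  # observed MinionS range

def reduction(baseline, used):
    """Fraction of the baseline's tokens saved."""
    return 1 - used / baseline

best = reduction(remote_only, minions_low)    # ~75% saved
worst = reduction(remote_only, minions_high)  # just under 50% saved
print(f"Savings: {worst:.0%}-{best:.0%}")
```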

## 🎯 Interactive Demo: Compare Remote vs MinionS

The MinionS interface includes a **toggle feature** that lets you compare:

### Remote-Only Mode

+ Processes the entire document with the cloud model
+ Higher token usage and cost
+ Baseline performance

### MinionS Mode

+ Local model reads and processes document chunks
+ Remote model provides supervision and final answers
+ Dramatically reduced cloud costs
+ Maintained quality

## 🎮 Step-by-Step Demo

### Example: Research Paper Analysis

1. **Start the system** following the Quick Start guide above

2. **Load the MinionS research paper** as your document:
   - Download: <https://arxiv.org/pdf/2502.15964>
   - Upload it through the web interface

3. **Ask the example question**:

   ```text
   Task: "What are the three evaluation datasets used in the paper?"
   Document Metadata: "Research Paper"
   ```

4. **Compare modes**:
   - **Remote-only**: Watch the token usage (~30k tokens)
   - **MinionS**: See the dramatic reduction (~7.5-15k tokens)

5. **Expected answer**: "The three evaluation datasets are FinanceBench, LongHealth, and QASPER"

### Model Customization

**Recommended**: Upgrade from llama3.2 (3B) to qwen3 (8B) for better accuracy:

```yaml
# In docker-compose.minions.yml
models:
  worker:
    model: ai/qwen3 # 8B parameters - better accuracy
    # model: ai/llama3.2 # 3B parameters - faster download
    context_size: 10000
```

**Trade-offs**:

+ **qwen3**: Slower to download, significantly better accuracy
+ **llama3.2**: Faster to pull, adequate for simple tasks

## 🤝 When to Use MinionS

### ✅ Ideal Use Cases

+ **Document Analysis**: Financial reports, medical records, research papers
+ **Long Context Tasks**: Multi-page document processing
+ **Cost-Sensitive Applications**: High-volume document processing
+ **Privacy-Conscious**: Keep sensitive data local while leveraging cloud intelligence

## 🧹 Cleanup

To stop and remove the containers and volumes:

```bash
cd minions/apps/minions-docker
docker compose -f docker-compose.minions.yml down -v
```

## 📚 Additional Resources

### Official MinionS Resources

+ **Research Paper**: [Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models](https://arxiv.org/pdf/2502.15964)
+ **GitHub Repository**: [HazyResearch/minions](https://github.com/HazyResearch/minions)
+ **Docker Setup**: [minions-docker](https://github.com/HazyResearch/minions/tree/main/apps/minions-docker)

### Academic Citation

```bibtex
@article{narayan2025minions,
  title={Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models},
  author={Narayan, Avanika and Biderman, Dan and Eyuboglu, Sabri and May, Avner and Linderman, Scott and Zou, James and R{\'e}, Christopher},
  journal={arXiv preprint arXiv:2502.15964},
  year={2025}
}
```

## 🏆 Key Benefits Summary

+ **💰 Cost Reduction**: 5.7× cheaper than remote-only processing
+ **🎯 High Accuracy**: Maintains 97.9% of cloud model performance
+ **🔧 Easy Customization**: Simple model swapping (llama3.2 → qwen3)

---

## 📎 Credits

+ **Research**: [Stanford Hazy Research Lab](https://hazyresearch.stanford.edu/)
+ **Authors**: Avanika Narayan, Dan Biderman, Sabri Eyuboglu, and team
+ **Implementation**: [HazyResearch/minions](https://github.com/HazyResearch/minions)
+ **Docker Integration**: Compose for Agents community

[Docker Compose]: https://github.com/docker/compose
[Docker Desktop]: https://www.docker.com/products/docker-desktop/
[Docker Engine]: https://docs.docker.com/engine/
[Docker Model Runner requirements]: https://docs.docker.com/ai/model-runner/
[Docker Offload]: https://www.docker.com/products/docker-offload/