MervinPraison · Dhivya-Bharathy · Jun 18, 2025 · Jun 18, 2025 · Jun 21, 2025 · gemini-code-assist
diff --git a/examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb b/examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb
@@ -0,0 +1,172 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "id": "29d21289",
+      "metadata": {
+        "id": "29d21289"
+      },
+      "source": [
+        "# DeepSeek R1 Qwen3 (8B) - GRPO Agent Demo"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DhivyaBharathy-web/PraisonAI/blob/main/examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb)\n"
+      ],
+      "metadata": {
+        "id": "yuOEagMH86WV"
+      },
+      "id": "yuOEagMH86WV"
+    },
+    {
+      "cell_type": "markdown",
+      "id": "0f798657",
+      "metadata": {
+        "id": "0f798657"
+      },
+      "source": [
+        "This notebook demonstrates the usage of DeepSeek's Qwen3-8B model with GRPO (Guided Reasoning Prompt Optimization) for interactive conversational reasoning tasks.\n",
+        "It is designed to simulate a lightweight agent-style reasoning capability in an accessible and interpretable way."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "80f3de9e",
+      "metadata": {
+        "id": "80f3de9e"
+      },
+      "source": [
+        "## Dependencies"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "8d1c7f6c",
+      "metadata": {
+        "id": "8d1c7f6c"
+      },
+      "outputs": [],
+      "source": [
+        "!pip install -q transformers accelerate"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "78603e7b",
+      "metadata": {
+        "id": "78603e7b"
+      },
+      "source": [
+        "## Tools"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "88e97fbc",
+      "metadata": {
+        "id": "88e97fbc"
+      },
+      "source": [
+        "- `transformers`: For model loading and interaction\n",
+        "- `AutoModelForCausalLM`, `AutoTokenizer`: Interfaces for DeepSeek's LLM"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "37d9bd54",
+      "metadata": {
+        "id": "37d9bd54"
+      },
+      "source": [
+        "## YAML Prompt"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "adf5cae5",
+      "metadata": {
+        "id": "adf5cae5"
+      },
+      "outputs": [],
+      "source": [
+        "\n",
+        "prompt:\n",
+        "  task: \"Reasoning over multi-step instructions\"\n",
+        "  context: \"User provides a math problem or logical question.\"\n",
+        "  model: \"deepseek-ai/deepseek-moe-16b-chat\"\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "6985f60c",
+      "metadata": {
+        "id": "6985f60c"
+      },
+      "source": [
+        "## Main"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "d74bf686",
+      "metadata": {
+        "id": "d74bf686"
+      },
+      "outputs": [],
+      "source": [
+        "\n",
+        "from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline\n",
+        "\n",
+        "model_id = \"deepseek-ai/deepseek-moe-16b-chat\"\n",
+        "tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
+        "model = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\")\n",
+        "\n",
+        "pipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)\n",
+        "\n",
+        "prompt = \"If a train travels 60 miles in 1.5 hours, what is its average speed?\"\n",
+        "output = pipe(prompt, max_new_tokens=60)[0]['generated_text']\n",
+        "print(\"🧠 Reasoned Output:\", output)\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "c856167f",
+      "metadata": {
+        "id": "c856167f"
+      },
+      "source": [
+        "## Output"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "41039ee8",
+      "metadata": {
+        "id": "41039ee8"
+      },
+      "source": [
+        "### 🖼️ Output Summary\n",
+        "\n",
+        "Prompt: *\"If a train travels 60 miles in 1.5 hours, what is its average speed?\"*\n",
+        "\n",
+        "🧠 Output: The model provides a clear reasoning process, such as:\n",
+        "\n",
+        "> \"To find the average speed, divide the total distance by total time: 60 / 1.5 = 40 mph.\"\n",
+        "\n",
+        "💡 This shows the model's ability to walk through logical steps using GRPO-enhanced reasoning."
+      ]
+    }
+  ],
+  "metadata": {
+    "colab": {
+      "provenance": []
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 5
+}