Skip to content

Add DeepSeek R1 Qwen3 (8B) - GRPO Model #672

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
172 changes: 172 additions & 0 deletions examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,172 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "29d21289",
"metadata": {
"id": "29d21289"
},
"source": [
"# DeepSeek R1 Qwen3 (8B) - GRPO Agent Demo"
]
},
{
"cell_type": "markdown",
"source": [
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DhivyaBharathy-web/PraisonAI/blob/main/examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb)\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The URL for the 'Open in Colab' badge points to https://colab.research.google.com/github/DhivyaBharathy-web/PraisonAI/.... Update the GitHub username in the URL to MervinPraison to ensure users are directed to the correct repository.

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MervinPraison/PraisonAI/blob/main/examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb)

],
"metadata": {
"id": "yuOEagMH86WV"
},
"id": "yuOEagMH86WV"
},
{
"cell_type": "markdown",
"id": "0f798657",
"metadata": {
"id": "0f798657"
},
"source": [
"This notebook demonstrates the usage of DeepSeek's Qwen3-8B model with GRPO (Guided Reasoning Prompt Optimization) for interactive conversational reasoning tasks.\n",
"It is designed to simulate a lightweight agent-style reasoning capability in an accessible and interpretable way."
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Model mismatch – description says “Qwen3-8B” but code loads “deepseek-moe-16b-chat”.
The notebook title/markdown explicitly introduces a fine-tuned Qwen3-8B GRPO agent, yet the model_id points at deepseek-ai/deepseek-moe-16b-chat. This 16 B MoE chat checkpoint is materially different (size, architecture, licensing) and will not reproduce results claimed for Qwen3-8B GRPO.

Either fix the markdown/filename or switch model_id (and Colab link) to the intended Qwen3-8B GRPO artifact.

- model_id = "deepseek-ai/deepseek-moe-16b-chat"
+ model_id = "deepseek-ai/deepseek-qwen3-8b-grpo"  # <- example id, adjust to the actual HF repo

Also applies to: 125-131

🤖 Prompt for AI Agents
In examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb around lines 30 to 32 and 125 to
131, the notebook description and title mention the Qwen3-8B model, but the code
loads the deepseek-ai/deepseek-moe-16b-chat model instead. To fix this, update
the model_id in the code and any related Colab links to point to the correct
Qwen3-8B GRPO model artifact that matches the description, or alternatively,
adjust the markdown and filename to accurately reflect the deepseek-moe-16b-chat
model being used.

},
{
"cell_type": "markdown",
"id": "80f3de9e",
"metadata": {
"id": "80f3de9e"
},
"source": [
"## Dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d1c7f6c",
"metadata": {
"id": "8d1c7f6c"
},
"outputs": [],
"source": [
"!pip install -q transformers accelerate"
]
},
{
"cell_type": "markdown",
"id": "78603e7b",
"metadata": {
"id": "78603e7b"
},
"source": [
"## Tools"
]
},
{
"cell_type": "markdown",
"id": "88e97fbc",
"metadata": {
"id": "88e97fbc"
},
"source": [
"- `transformers`: For model loading and interaction\n",
"- `AutoModelForCausalLM`, `AutoTokenizer`: Interfaces for DeepSeek's LLM"
]
},
{
"cell_type": "markdown",
"id": "37d9bd54",
"metadata": {
"id": "37d9bd54"
},
"source": [
"## YAML Prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "adf5cae5",
"metadata": {
"id": "adf5cae5"
},
"outputs": [],
"source": [
"\n",
"prompt:\n",
" task: \"Reasoning over multi-step instructions\"\n",
" context: \"User provides a math problem or logical question.\"\n",
" model: \"deepseek-ai/deepseek-moe-16b-chat\"\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The model ID deepseek-ai/deepseek-moe-16b-chat in this YAML example is inconsistent with the notebook's title and description, as well as the PR title, which all refer to an 8B Qwen3 model. Update this to reflect the intended 8B Qwen3 model ID.

  model: "your_deepseek_qwen3_8b_grpo_model_id" # Please replace with the correct 8B model ID

]
},
{
"cell_type": "markdown",
"id": "6985f60c",
"metadata": {
"id": "6985f60c"
},
"source": [
"## Main"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d74bf686",
"metadata": {
"id": "d74bf686"
},
"outputs": [],
"source": [
"\n",
"from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline\n",
"\n",
"model_id = \"deepseek-ai/deepseek-moe-16b-chat\"\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The model_id used here is "deepseek-ai/deepseek-moe-16b-chat", which corresponds to a 16B parameter model. This conflicts with the PR title, notebook title, and notebook description, all of which refer to an 8B model. Update the model_id to the correct Hugging Face ID for the intended "DeepSeek R1 Qwen3 (8B) - GRPO" model.

model_id = "your_deepseek_qwen3_8b_grpo_model_id"  # Please replace with the correct 8B model ID for DeepSeek Qwen3 GRPO

"tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
"model = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\")\n",
"\n",
"pipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)\n",
"\n",
"prompt = \"If a train travels 60 miles in 1.5 hours, what is its average speed?\"\n",
"output = pipe(prompt, max_new_tokens=60)[0]['generated_text']\n",
"print(\"🧠 Reasoned Output:\", output)\n"
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Large-model loading lacks memory-safe settings.
16 B (or even 8 B) checkpoints frequently exceed Colab’s 16 GB GPU. Add torch_dtype=torch.float16, device_map="auto", and consider trust_remote_code=True if model repo uses custom layers. Wrap generation in torch.no_grad() to avoid unnecessary gradients.

-from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
+from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
@@
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    device_map="auto",
+    torch_dtype="auto",  # fp16/bf16 if available
+)
@@
-output = pipe(prompt, max_new_tokens=60)[0]['generated_text']
+with torch.no_grad():
+    output = pipe(prompt, max_new_tokens=60, return_full_text=False)[0]["generated_text"]

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb around lines 123 to 134, the
model loading and generation code lacks memory-efficient settings for large
models. Fix this by adding torch_dtype=torch.float16 and device_map="auto"
parameters when loading the model, include trust_remote_code=True if the model
uses custom layers, and wrap the text generation call inside a torch.no_grad()
context to prevent gradient computation and reduce memory usage.

},
{
"cell_type": "markdown",
"id": "c856167f",
"metadata": {
"id": "c856167f"
},
"source": [
"## Output"
]
},
{
"cell_type": "markdown",
"id": "41039ee8",
"metadata": {
"id": "41039ee8"
},
"source": [
"### 🖼️ Output Summary\n",
"\n",
"Prompt: *\"If a train travels 60 miles in 1.5 hours, what is its average speed?\"*\n",
"\n",
"🧠 Output: The model provides a clear reasoning process, such as:\n",
"\n",
"> \"To find the average speed, divide the total distance by total time: 60 / 1.5 = 40 mph.\"\n",
"\n",
"💡 This shows the model's ability to walk through logical steps using GRPO-enhanced reasoning."
]
}
],
"metadata": {
"colab": {
"provenance": []
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Loading