-
-
Notifications
You must be signed in to change notification settings - Fork 691
Add DeepSeek R1 Qwen3 (8B) - GRPO Model #672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,172 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "29d21289", | ||
"metadata": { | ||
"id": "29d21289" | ||
}, | ||
"source": [ | ||
"# DeepSeek R1 Qwen3 (8B) - GRPO Agent Demo" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"[](https://colab.research.google.com/github/DhivyaBharathy-web/PraisonAI/blob/main/examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb)\n" | ||
], | ||
"metadata": { | ||
"id": "yuOEagMH86WV" | ||
}, | ||
"id": "yuOEagMH86WV" | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "0f798657", | ||
"metadata": { | ||
"id": "0f798657" | ||
}, | ||
"source": [ | ||
"This notebook demonstrates the usage of DeepSeek's Qwen3-8B model with GRPO (Guided Reasoning Prompt Optimization) for interactive conversational reasoning tasks.\n", | ||
"It is designed to simulate a lightweight agent-style reasoning capability in an accessible and interpretable way." | ||
] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Model mismatch – description says “Qwen3-8B” but code loads “deepseek-moe-16b-chat”. Either fix the markdown/filename or switch - model_id = "deepseek-ai/deepseek-moe-16b-chat"
+ model_id = "deepseek-ai/deepseek-qwen3-8b-grpo" # <- example id, adjust to the actual HF repo Also applies to: 125-131 🤖 Prompt for AI Agents
|
||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "80f3de9e", | ||
"metadata": { | ||
"id": "80f3de9e" | ||
}, | ||
"source": [ | ||
"## Dependencies" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "8d1c7f6c", | ||
"metadata": { | ||
"id": "8d1c7f6c" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"!pip install -q transformers accelerate" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "78603e7b", | ||
"metadata": { | ||
"id": "78603e7b" | ||
}, | ||
"source": [ | ||
"## Tools" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "88e97fbc", | ||
"metadata": { | ||
"id": "88e97fbc" | ||
}, | ||
"source": [ | ||
"- `transformers`: For model loading and interaction\n", | ||
"- `AutoModelForCausalLM`, `AutoTokenizer`: Interfaces for DeepSeek's LLM" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "37d9bd54", | ||
"metadata": { | ||
"id": "37d9bd54" | ||
}, | ||
"source": [ | ||
"## YAML Prompt" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "adf5cae5", | ||
"metadata": { | ||
"id": "adf5cae5" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"prompt:\n", | ||
" task: \"Reasoning over multi-step instructions\"\n", | ||
" context: \"User provides a math problem or logical question.\"\n", | ||
" model: \"deepseek-ai/deepseek-moe-16b-chat\"\n" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The model ID
|
||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "6985f60c", | ||
"metadata": { | ||
"id": "6985f60c" | ||
}, | ||
"source": [ | ||
"## Main" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "d74bf686", | ||
"metadata": { | ||
"id": "d74bf686" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline\n", | ||
"\n", | ||
"model_id = \"deepseek-ai/deepseek-moe-16b-chat\"\n", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The
|
||
"tokenizer = AutoTokenizer.from_pretrained(model_id)\n", | ||
"model = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\")\n", | ||
"\n", | ||
"pipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)\n", | ||
"\n", | ||
"prompt = \"If a train travels 60 miles in 1.5 hours, what is its average speed?\"\n", | ||
"output = pipe(prompt, max_new_tokens=60)[0]['generated_text']\n", | ||
"print(\"🧠 Reasoned Output:\", output)\n" | ||
] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Large-model loading lacks memory-safe settings. -from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
+from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
@@
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+ model_id,
+ device_map="auto",
+ torch_dtype="auto", # fp16/bf16 if available
+)
@@
-output = pipe(prompt, max_new_tokens=60)[0]['generated_text']
+with torch.no_grad():
+ output = pipe(prompt, max_new_tokens=60, return_full_text=False)[0]["generated_text"]
🤖 Prompt for AI Agents
|
||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c856167f", | ||
"metadata": { | ||
"id": "c856167f" | ||
}, | ||
"source": [ | ||
"## Output" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "41039ee8", | ||
"metadata": { | ||
"id": "41039ee8" | ||
}, | ||
"source": [ | ||
"### 🖼️ Output Summary\n", | ||
"\n", | ||
"Prompt: *\"If a train travels 60 miles in 1.5 hours, what is its average speed?\"*\n", | ||
"\n", | ||
"🧠 Output: The model provides a clear reasoning process, such as:\n", | ||
"\n", | ||
"> \"To find the average speed, divide the total distance by total time: 60 / 1.5 = 40 mph.\"\n", | ||
"\n", | ||
"💡 This shows the model's ability to walk through logical steps using GRPO-enhanced reasoning." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"colab": { | ||
"provenance": [] | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The URL for the 'Open in Colab' badge points to
https://colab.research.google.com/github/DhivyaBharathy-web/PraisonAI/...
. Update the GitHub username in the URL toMervinPraison
to ensure users are directed to the correct repository.