Add DeepSeek R1 Qwen3 (8B) - GRPO Model #672
Conversation
**Walkthrough**
The change updates the title in the first markdown cell of a Jupyter notebook, renaming it from "GRPO Agent Demo" to "GRPO Model," and adds a newline at the end of the file. No code, logic, or output content within the notebook has been modified.
Summary of Changes
Hello @DhivyaBharathy-web, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly expands the `examples/cookbooks` directory by introducing five new Jupyter notebooks. These notebooks serve as practical demonstrations for leveraging various large language models, including DeepSeek's Qwen3-8B (though the code uses DeepSeek-MoE-16B-Chat) and several models from the Microsoft Phi series (Phi-3.5 Mini, Phi-3 Medium, Phi-4, and Phi-4 14B). The examples cover general conversational inference and specifically highlight the application of Guided Reasoning Prompt Optimization (GRPO) for enhanced reasoning capabilities in LLM agents.
Highlights
- **New DeepSeek GRPO Cookbook:** Introduced a new Jupyter notebook (`DeepSeek_Qwen3_GRPO.ipynb`) demonstrating the usage of DeepSeek's Qwen3-8B model with Guided Reasoning Prompt Optimization (GRPO) for interactive conversational reasoning tasks. Note: the code in the notebook currently loads `deepseek-ai/deepseek-moe-16b-chat`.
- **New Phi-3.5 Mini Conversational Cookbook:** Added a new cookbook (`Phi_3_5_Mini_Conversational.ipynb`) showcasing lightweight conversational inference using the Phi-3.5 Mini model, suitable for smaller hardware and educational use cases.
- **New Phi-3 Medium Conversational Cookbook:** Included a new Jupyter notebook (`Phi_3_Medium_Conversational.ipynb`) demonstrating conversational inference with the Phi-3 Medium model, illustrating basic loading, prompting, and response generation.
- **New Phi-4 (14B) GRPO Cookbook:** Integrated a new cookbook (`Phi_4_14B_GRPO.ipynb`) that demonstrates inference using the Phi-4 14B parameter model with the GRPO optimization strategy for conversational tasks.
- **New Phi-4 Conversational Cookbook:** Added a new Jupyter notebook (`Phi_4_Conversational.ipynb`) for basic conversational inference with the Phi-4 model, demonstrating chat-style turn-based communication.
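The cookbooks summarised above all follow the same basic Hugging Face `transformers` flow. A minimal sketch of that pattern is shown below; the model id and prompt are illustrative placeholders rather than lines taken from any of the notebooks:

```python
# Minimal sketch of the shared inference pattern (placeholders, not notebook code).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "microsoft/Phi-3.5-mini-instruct"  # assumption: substitute the notebook's model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place weights on the available GPU/CPU automatically
    torch_dtype="auto",  # use fp16/bf16 when the hardware supports it
)

prompt = "Explain what a language model is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():  # inference only, no gradients needed
    outputs = model.generate(**inputs, max_new_tokens=60)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Each notebook varies the model id, prompt, and generation settings around this skeleton.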
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
**PR Reviewer Guide 🔍**
Here are some key observations to aid the review process:

**PR Code Suggestions ✨**
Explore these optional code suggestions:
**Codecov Report**
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@           Coverage Diff           @@
## main #672 +/- ##
=======================================
Coverage 14.50% 14.50%
=======================================
Files 25 25
Lines 2517 2517
Branches 357 357
=======================================
Hits 365 365
Misses 2136 2136
Partials 16 16
Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
Code Review
This pull request adds several Jupyter notebooks demonstrating various AI models. The notebooks are generally well-structured and provide useful examples. Key areas for improvement include correcting a critical model mismatch in the `DeepSeek_Qwen3_GRPO.ipynb` notebook, clarifying the use of GRPO in the `Phi_4_14B_GRPO.ipynb` notebook, updating Colab badge URLs, and adding resource consideration notes for larger models.
"prompt:\n", | ||
" task: \"Reasoning over multi-step instructions\"\n", | ||
" context: \"User provides a math problem or logical question.\"\n", | ||
" model: \"deepseek-ai/deepseek-moe-16b-chat\"\n" |
The model ID `deepseek-ai/deepseek-moe-16b-chat` in this YAML example is inconsistent with the notebook's title and description, as well as the PR title, which all refer to an 8B Qwen3 model. Update this to reflect the intended 8B Qwen3 model ID.
model: "your_deepseek_qwen3_8b_grpo_model_id" # Please replace with the correct 8B model ID
"\n", | ||
"from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline\n", | ||
"\n", | ||
"model_id = \"deepseek-ai/deepseek-moe-16b-chat\"\n", |
The `model_id` used here is `"deepseek-ai/deepseek-moe-16b-chat"`, which corresponds to a 16B parameter model. This conflicts with the PR title, notebook title, and notebook description, all of which refer to an 8B model. Update the `model_id` to the correct Hugging Face ID for the intended "DeepSeek R1 Qwen3 (8B) - GRPO" model.
model_id = "your_deepseek_qwen3_8b_grpo_model_id" # Please replace with the correct 8B model ID for DeepSeek Qwen3 GRPO
{
  "cell_type": "markdown",
  "source": [
    "[](https://colab.research.google.com/github/DhivyaBharathy-web/PraisonAI/blob/main/examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb)\n"
The URL for the 'Open in Colab' badge points to https://colab.research.google.com/github/DhivyaBharathy-web/PraisonAI/... . Update the GitHub username in the URL to MervinPraison to ensure users are directed to the correct repository.
[](https://colab.research.google.com/github/MervinPraison/PraisonAI/blob/main/examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb)
"source": [ | ||
"**Description:**\n", | ||
"\n", | ||
"This notebook demonstrates inference using the Phi-4 14B parameter model with GRPO optimization strategy." |
"!pip install transformers accelerate\n", | ||
"!pip install torch\n", | ||
"```" | ||
] |
Actionable comments posted: 4
♻️ Duplicate comments (2)

examples/cookbooks/Phi_3_Medium_Conversational.ipynb (1)

- 45-48: Same dependency pinning comment applies. Consider locking `transformers`, `accelerate`, `torch`, `torchvision` versions.

examples/cookbooks/Phi_4_Conversational.ipynb (1)

- 87-95: 4B model GPU safety & no-grad context. Same remarks on `device_map` + `torch.no_grad()` as above.
🧹 Nitpick comments (5)

examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb (1)

- 51-55: Un-pinned dependency versions jeopardise reproducibility. `!pip install -q transformers accelerate` will always pull the latest versions, which may introduce breaking API changes. Pin to a known-good minor version (or at minimum a `~=` spec).

examples/cookbooks/Phi_3_5_Mini_Conversational.ipynb (1)

- 44-47: Pin package versions for deterministic installs. Repeatable notebooks are easier to debug. Suggest pinning `transformers`, `accelerate`, and `torch` to tested versions (e.g., `transformers==4.41.1`, `torch==2.3.*`).

examples/cookbooks/Phi_4_Conversational.ipynb (1)

- 44-47: Pin versions to avoid future incompatibilities.

examples/cookbooks/Phi_4_14B_GRPO.ipynb (2)

- 42-47: Pinned versions critical for the 14B model + GRPO. Large models depend heavily on matching `transformers`/`accelerate` versions; please lock them.

- 88-96: 14B checkpoint unlikely to fit on a free Colab GPU. Load with 8-bit/4-bit quantisation (`BitsAndBytesConfig`), or instruct users to switch to a high-RAM runtime. Failing that, the cell will OOM.

-from transformers import AutoTokenizer, AutoModelForCausalLM
+from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
@@
-model = AutoModelForCausalLM.from_pretrained("microsoft/phi-4-14b")
+bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
+model = AutoModelForCausalLM.from_pretrained(
+    "microsoft/phi-4-14b",
+    quantization_config=bnb_config,
+    device_map="auto",
+    torch_dtype="auto",
+)
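Putting that suggestion together with the pinning comments above, a self-contained version of the quantised load might look like the sketch below. The model id is the one shown in the notebook's diff (it may need adjusting to a real Hugging Face repo), and the version pins are examples only, not tested against these notebooks:

```python
# Sketch: 4-bit loading for a large checkpoint, following the suggestion above.
# Example pins only (untested here):
#   !pip install "transformers==4.41.*" "accelerate==0.30.*" bitsandbytes
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "microsoft/phi-4-14b"  # id as it appears in the notebook diff; adjust if needed

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shard across GPU/CPU as capacity allows
)

inputs = tokenizer("Summarise GRPO in one sentence.", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

4-bit NF4 roughly quarters the weight memory, which is usually the difference between fitting and OOM-ing on a 16 GB Colab GPU.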
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb (1 hunks)
- examples/cookbooks/Phi_3_5_Mini_Conversational.ipynb (1 hunks)
- examples/cookbooks/Phi_3_Medium_Conversational.ipynb (1 hunks)
- examples/cookbooks/Phi_4_14B_GRPO.ipynb (1 hunks)
- examples/cookbooks/Phi_4_Conversational.ipynb (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: quick-test
- GitHub Check: test-examples
- GitHub Check: test-core (3.11)
- GitHub Check: performance-test
- GitHub Check: Run tests and collect coverage
"This notebook demonstrates the usage of DeepSeek's Qwen3-8B model with GRPO (Guided Reasoning Prompt Optimization) for interactive conversational reasoning tasks.\n", | ||
"It is designed to simulate a lightweight agent-style reasoning capability in an accessible and interpretable way." | ||
] |
Model mismatch – description says “Qwen3-8B” but code loads “deepseek-moe-16b-chat”.
The notebook title/markdown explicitly introduces a fine-tuned Qwen3-8B GRPO agent, yet the `model_id` points at `deepseek-ai/deepseek-moe-16b-chat`. This 16B MoE chat checkpoint is materially different (size, architecture, licensing) and will not reproduce results claimed for Qwen3-8B GRPO.
Either fix the markdown/filename or switch the `model_id` (and Colab link) to the intended Qwen3-8B GRPO artifact.
- model_id = "deepseek-ai/deepseek-moe-16b-chat"
+ model_id = "deepseek-ai/deepseek-qwen3-8b-grpo" # <- example id, adjust to the actual HF repo
Also applies to: 125-131
🤖 Prompt for AI Agents
In examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb around lines 30 to 32 and 125 to
131, the notebook description and title mention the Qwen3-8B model, but the code
loads the deepseek-ai/deepseek-moe-16b-chat model instead. To fix this, update
the model_id in the code and any related Colab links to point to the correct
Qwen3-8B GRPO model artifact that matches the description, or alternatively,
adjust the markdown and filename to accurately reflect the deepseek-moe-16b-chat
model being used.
"from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline\n", | ||
"\n", | ||
"model_id = \"deepseek-ai/deepseek-moe-16b-chat\"\n", | ||
"tokenizer = AutoTokenizer.from_pretrained(model_id)\n", | ||
"model = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\")\n", | ||
"\n", | ||
"pipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)\n", | ||
"\n", | ||
"prompt = \"If a train travels 60 miles in 1.5 hours, what is its average speed?\"\n", | ||
"output = pipe(prompt, max_new_tokens=60)[0]['generated_text']\n", | ||
"print(\"🧠 Reasoned Output:\", output)\n" | ||
] |
🛠️ Refactor suggestion

Large-model loading lacks memory-safe settings.
16B (or even 8B) checkpoints frequently exceed Colab’s 16 GB GPU. Add `torch_dtype=torch.float16`, `device_map="auto"`, and consider `trust_remote_code=True` if the model repo uses custom layers. Wrap generation in `torch.no_grad()` to avoid unnecessary gradients.
-from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
+from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
@@
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+ model_id,
+ device_map="auto",
+ torch_dtype="auto", # fp16/bf16 if available
+)
@@
-output = pipe(prompt, max_new_tokens=60)[0]['generated_text']
+with torch.no_grad():
+ output = pipe(prompt, max_new_tokens=60, return_full_text=False)[0]["generated_text"]
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb around lines 123 to 134, the
model loading and generation code lacks memory-efficient settings for large
models. Fix this by adding torch_dtype=torch.float16 and device_map="auto"
parameters when loading the model, include trust_remote_code=True if the model
uses custom layers, and wrap the text generation call inside a torch.no_grad()
context to prevent gradient computation and reduce memory usage.
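Folding the diff above into the notebook cell, the memory-safe variant might read roughly as follows. This is a sketch only: the model id is the one currently in the notebook (see the mismatch comment above), and `trust_remote_code=True` is only needed if the repository ships custom modeling code:

```python
# Sketch of the memory-safe DeepSeek cell described in the comment above.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "deepseek-ai/deepseek-moe-16b-chat"  # current notebook id; subject to the mismatch fix

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # spread weights across available devices
    torch_dtype="auto",      # fp16/bf16 where supported
    trust_remote_code=True,  # only if the repo defines custom layers
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

prompt = "If a train travels 60 miles in 1.5 hours, what is its average speed?"
with torch.no_grad():
    output = pipe(prompt, max_new_tokens=60, return_full_text=False)[0]["generated_text"]

print("🧠 Reasoned Output:", output)
```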
"from transformers import AutoTokenizer, AutoModelForCausalLM\n", | ||
"\n", | ||
"tokenizer = AutoTokenizer.from_pretrained(\"microsoft/phi-3.5-mini\")\n", | ||
"model = AutoModelForCausalLM.from_pretrained(\"microsoft/phi-3.5-mini\")\n", | ||
"\n", | ||
"inputs = tokenizer(\"What is an AI model?\", return_tensors=\"pt\")\n", | ||
"outputs = model.generate(**inputs, max_new_tokens=40)\n", | ||
"print(tokenizer.decode(outputs[0], skip_special_tokens=True))" | ||
] |
🛠️ Refactor suggestion

Add device placement & inference-mode guards.
`AutoModelForCausalLM.from_pretrained` defaults to CPU; on GPU Colab you’ll silently OOM when the model is moved during `generate`. Use `device_map="auto"` and wrap the generation under `torch.no_grad()`.
-model = AutoModelForCausalLM.from_pretrained("microsoft/phi-3.5-mini")
+model = AutoModelForCausalLM.from_pretrained(
+ "microsoft/phi-3.5-mini",
+ device_map="auto",
+ torch_dtype="auto",
+)
@@
-outputs = model.generate(**inputs, max_new_tokens=40)
+with torch.no_grad():
+ outputs = model.generate(**inputs, max_new_tokens=40)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
"from transformers import AutoTokenizer, AutoModelForCausalLM\n", | |
"\n", | |
"tokenizer = AutoTokenizer.from_pretrained(\"microsoft/phi-3.5-mini\")\n", | |
"model = AutoModelForCausalLM.from_pretrained(\"microsoft/phi-3.5-mini\")\n", | |
"\n", | |
"inputs = tokenizer(\"What is an AI model?\", return_tensors=\"pt\")\n", | |
"outputs = model.generate(**inputs, max_new_tokens=40)\n", | |
"print(tokenizer.decode(outputs[0], skip_special_tokens=True))" | |
] | |
from transformers import AutoTokenizer, AutoModelForCausalLM | |
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-3.5-mini") | |
model = AutoModelForCausalLM.from_pretrained( | |
"microsoft/phi-3.5-mini", | |
device_map="auto", | |
torch_dtype="auto", | |
) | |
inputs = tokenizer("What is an AI model?", return_tensors="pt") | |
with torch.no_grad(): | |
outputs = model.generate(**inputs, max_new_tokens=40) | |
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
🤖 Prompt for AI Agents
In examples/cookbooks/Phi_3_5_Mini_Conversational.ipynb around lines 87 to 95,
the model loading and generation code lacks device placement and inference mode
guards, which can cause out-of-memory errors on GPU. Fix this by adding
device_map="auto" to the from_pretrained call to automatically place the model
on the appropriate device, and wrap the model.generate call inside a
torch.no_grad() context to disable gradient calculations during inference.
"from transformers import AutoTokenizer, AutoModelForCausalLM\n", | ||
"import torch\n", | ||
"\n", | ||
"model = AutoModelForCausalLM.from_pretrained(\"microsoft/phi-3-medium\")\n", | ||
"tokenizer = AutoTokenizer.from_pretrained(\"microsoft/phi-3-medium\")\n", | ||
"\n", | ||
"prompt = \"What is the capital of France?\"\n", | ||
"inputs = tokenizer(prompt, return_tensors=\"pt\")\n", | ||
"outputs = model.generate(**inputs, max_new_tokens=20)\n", | ||
"print(tokenizer.decode(outputs[0], skip_special_tokens=True))" |
🛠️ Refactor suggestion

Loading the Phi-3 Medium model on GPU without `device_map` can exceed VRAM.
Add `device_map="auto"` and `torch_dtype="auto"`. Also guard with `torch.no_grad()`.
🤖 Prompt for AI Agents
In examples/cookbooks/Phi_3_Medium_Conversational.ipynb around lines 88 to 97,
the model loading and generation code does not specify device placement, which
can cause VRAM overflow on GPU. Fix this by adding device_map="auto" and
torch_dtype="auto" parameters to the from_pretrained call to enable automatic
device placement and dtype selection. Also, wrap the model.generate call inside
a torch.no_grad() context to prevent unnecessary gradient computation and reduce
memory usage.
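This comment has no committable suggestion attached, so a sketch of the adjusted Phi-3 Medium cell (same model id as the excerpt above) is given here for reference:

```python
# Sketch of the Phi-3 Medium cell with device placement and no-grad inference.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "microsoft/phi-3-medium"  # id as used in the notebook excerpt above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # automatic device placement
    torch_dtype="auto",  # pick fp16/bf16 when available
)

prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```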
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 17682666 | Triggered | Generic High Entropy Secret | f205e78 | src/praisonai-agents/test_posthog_import.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider
- following these best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
User description
Experience reasoning-powered text generation using DeepSeek's fine-tuned Qwen3-8B model with GRPO.
The notebook demonstrates controlled output using structured prompts for assistant-style responses.
A practical setup for building custom LLM agents that handle nuanced dialogue and instruction following.
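One way to express the "structured prompts for assistant-style responses" mentioned above is a chat template. The sketch below assumes a chat-tuned checkpoint whose tokenizer defines a template; the model id and messages are illustrative, not copied from the notebook:

```python
# Sketch: structured assistant-style prompting via a chat template (illustrative values).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "deepseek-ai/deepseek-moe-16b-chat"  # placeholder; see the model-id discussion above
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a careful reasoning assistant. Show your steps."},
    {"role": "user", "content": "If a train travels 60 miles in 1.5 hours, what is its average speed?"},
]
prompt_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt_text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```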
PR Type
Documentation
Description
• Add DeepSeek R1 Qwen3 GRPO model notebook
• Add four Phi model conversational examples
• Include Colab integration for all notebooks
• Demonstrate various model sizes and capabilities
Changes walkthrough 📝
**DeepSeek_Qwen3_GRPO.ipynb**: Add DeepSeek Qwen3 GRPO reasoning notebook
examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb
• Creates notebook demonstrating DeepSeek's Qwen3-8B model with GRPO
• Includes reasoning-powered text generation example
• Shows structured prompts for assistant-style responses
• Provides Colab integration badge

**Phi_3_5_Mini_Conversational.ipynb**: Add Phi-3.5 Mini conversational example
examples/cookbooks/Phi_3_5_Mini_Conversational.ipynb
• Creates lightweight inference example using Phi-3.5 Mini
• Demonstrates basic conversational AI capabilities
• Includes dependencies and tool descriptions
• Shows simple question-answer interaction

**Phi_3_Medium_Conversational.ipynb**: Add Phi-3 Medium conversational inference
examples/cookbooks/Phi_3_Medium_Conversational.ipynb
• Implements Phi-3 Medium model conversational inference
• Shows efficient pipeline usage for text generation
• Demonstrates basic loading and response generation
• Includes geography question example

**Phi_4_14B_GRPO.ipynb**: Add Phi-4 14B GRPO optimization example
examples/cookbooks/Phi_4_14B_GRPO.ipynb
• Creates Phi-4 14B parameter model with GRPO optimization
• Demonstrates healthcare AI consultation example
• Shows professional consultant system prompt usage
• Includes thoughtful AI application insights

**Phi_4_Conversational.ipynb**: Add Phi-4 conversational chat example
examples/cookbooks/Phi_4_Conversational.ipynb
• Implements basic Phi-4 conversational chat interface
• Shows turn-based communication capabilities
• Demonstrates tutor-style machine learning explanation
• Includes educational content generation example