
Add DeepSeek R1 Qwen3 (8B) - GRPO Model #672


Open

wants to merge 3 commits into main

Conversation

@Dhivya-Bharathy (Contributor) commented Jun 18, 2025

User description

Experience reasoning-powered text generation using DeepSeek's fine-tuned Qwen3-8B model with GRPO.
The notebook demonstrates controlled output using structured prompts for assistant-style responses.
It provides a practical setup for building custom LLM agents that handle nuanced dialogue and instruction following.
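
For orientation, a minimal sketch of the generation flow the notebook follows (taken from the notebook's own cells; note that the reviews below flag a mismatch between the model ID used here and the Qwen3-8B checkpoint named in the title):

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "deepseek-ai/deepseek-moe-16b-chat"  # as in the notebook; reviewers ask for the intended 8B GRPO checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Structured prompt for an assistant-style, reasoning-oriented response
prompt = "If a train travels 60 miles in 1.5 hours, what is its average speed?"
print(pipe(prompt, max_new_tokens=60)[0]["generated_text"])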


PR Type

Documentation


Description

• Add DeepSeek R1 Qwen3 GRPO model notebook
• Add four Phi model conversational examples
• Include Colab integration for all notebooks
• Demonstrate various model sizes and capabilities


Changes walkthrough 📝

Relevant files

Documentation

DeepSeek_Qwen3_GRPO.ipynb — Add DeepSeek Qwen3 GRPO reasoning notebook
examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb
• Creates a notebook demonstrating DeepSeek's Qwen3-8B model with GRPO
• Includes a reasoning-powered text generation example
• Shows structured prompts for assistant-style responses
• Provides a Colab integration badge
+172/-0

Phi_3_5_Mini_Conversational.ipynb — Add Phi-3.5 Mini conversational example
examples/cookbooks/Phi_3_5_Mini_Conversational.ipynb
• Creates a lightweight inference example using Phi-3.5 Mini
• Demonstrates basic conversational AI capabilities
• Includes dependencies and tool descriptions
• Shows a simple question-answer interaction
+120/-0

Phi_3_Medium_Conversational.ipynb — Add Phi-3 Medium conversational inference
examples/cookbooks/Phi_3_Medium_Conversational.ipynb
• Implements Phi-3 Medium model conversational inference
• Shows efficient pipeline usage for text generation
• Demonstrates basic loading and response generation
• Includes a geography question example
+122/-0

Phi_4_14B_GRPO.ipynb — Add Phi-4 14B GRPO optimization example
examples/cookbooks/Phi_4_14B_GRPO.ipynb
• Creates a Phi-4 14B parameter model example with GRPO optimization
• Demonstrates a healthcare AI consultation example
• Shows professional consultant system prompt usage
• Includes thoughtful AI application insights
+123/-0

Phi_4_Conversational.ipynb — Add Phi-4 conversational chat example
examples/cookbooks/Phi_4_Conversational.ipynb
• Implements a basic Phi-4 conversational chat interface
• Shows turn-based communication capabilities
• Demonstrates a tutor-style machine learning explanation
• Includes an educational content generation example
+124/-0

Summary by CodeRabbit

• Documentation
  • Updated the notebook title in the first markdown cell for improved clarity.

coderabbitai bot (Contributor) commented Jun 18, 2025

    Walkthrough

    The change updates the title in the first markdown cell of a Jupyter notebook, renaming it from "GRPO Agent Demo" to "GRPO Model," and adds a newline at the end of the file. No code, logic, or output content within the notebook has been modified.

    Changes

File(s): examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb
Change summary: Updated the notebook title in the first markdown cell; added a newline at the end of the file.

    Poem

    A title refreshed, a subtle tweak,
    The notebook’s name now less mystique.
    No code was touched, just words anew,
    A gentle change, as rabbits do.
    With every hop, we tidy more—
    Small edits open learning’s door!
    🐇✨


    📜 Recent review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL
    Plan: Pro

    📥 Commits

    Reviewing files that changed from the base of the PR and between f4a637f and f205e78.

    📒 Files selected for processing (1)
    • examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb (2 hunks)
    ✅ Files skipped from review due to trivial changes (1)
    • examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb
    ⏰ Context from checks skipped due to timeout of 90000ms (5)
    • GitHub Check: test-core (3.11)
    • GitHub Check: performance-test
    • GitHub Check: test-examples
    • GitHub Check: quick-test
    • GitHub Check: Run tests and collect coverage


gemini-code-assist bot (Contributor) left a comment

Summary of Changes

Hello @DhivyaBharathy-web, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

    This pull request significantly expands the examples/cookbooks directory by introducing five new Jupyter notebooks. These notebooks serve as practical demonstrations for leveraging various large language models, including DeepSeek's Qwen3-8B (though the code uses DeepSeek-MoE-16B-Chat) and several models from the Microsoft Phi series (Phi-3.5 Mini, Phi-3 Medium, Phi-4, and Phi-4 14B). The examples cover general conversational inference and specifically highlight the application of Guided Reasoning Prompt Optimization (GRPO) for enhanced reasoning capabilities in LLM agents.

    Highlights

    • New DeepSeek GRPO Cookbook: Introduced a new Jupyter notebook (DeepSeek_Qwen3_GRPO.ipynb) demonstrating the usage of DeepSeek's Qwen3-8B model with Guided Reasoning Prompt Optimization (GRPO) for interactive conversational reasoning tasks. Note: The code in the notebook currently loads deepseek-ai/deepseek-moe-16b-chat.
    • New Phi-3.5 Mini Conversational Cookbook: Added a new cookbook (Phi_3_5_Mini_Conversational.ipynb) showcasing lightweight conversational inference using the Phi-3.5 Mini model, suitable for smaller hardware and educational use cases.
    • New Phi-3 Medium Conversational Cookbook: Included a new Jupyter notebook (Phi_3_Medium_Conversational.ipynb) demonstrating conversational inference with the Phi-3 Medium model, illustrating basic loading, prompting, and response generation.
    • New Phi-4 (14B) GRPO Cookbook: Integrated a new cookbook (Phi_4_14B_GRPO.ipynb) that demonstrates inference using the Phi-4 14B parameter model with GRPO optimization strategy for conversational tasks.
    • New Phi-4 Conversational Cookbook: Added a new Jupyter notebook (Phi_4_Conversational.ipynb) for basic conversational inference with the Phi-4 model, demonstrating chat-style turn-based communication.

qodo-merge-pro bot left a comment

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Model Mismatch

The notebook title and description reference "DeepSeek R1 Qwen3 (8B)", but the model actually used in the code is "deepseek-ai/deepseek-moe-16b-chat", a 16B MoE model rather than the 8B Qwen3 model mentioned.

    "model_id = \"deepseek-ai/deepseek-moe-16b-chat\"\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
    
    Invalid Code

The YAML prompt section places unquoted YAML content in a code cell, which is invalid Python syntax and will cause execution errors when the cell is run.

      "\n",
      "prompt:\n",
      "  task: \"Reasoning over multi-step instructions\"\n",
      "  context: \"User provides a math problem or logical question.\"\n",
      "  model: \"deepseek-ai/deepseek-moe-16b-chat\"\n"
    ]
    
    Missing Model

The notebook references the "microsoft/phi-4-14b" model, which may not exist on Hugging Face Hub. The actual Phi-4 model identifier should be verified for availability.

    "tokenizer = AutoTokenizer.from_pretrained(\"microsoft/phi-4-14b\")\n",
    "model = AutoModelForCausalLM.from_pretrained(\"microsoft/phi-4-14b\")\n",
    

qodo-merge-pro bot commented Jun 18, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Possible issue
    Fix invalid YAML syntax

    The YAML prompt section contains invalid Python syntax and should be properly
    formatted as a string or comment. This code block will cause syntax errors when
    executed.

    examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb [97-100]

    -prompt:
    -  task: "Reasoning over multi-step instructions"
    -  context: "User provides a math problem or logical question."
    -  model: "deepseek-ai/deepseek-moe-16b-chat"
    +# YAML Prompt Configuration
    +# task: "Reasoning over multi-step instructions"
    +# context: "User provides a math problem or logical question."
    +# model: "deepseek-ai/deepseek-moe-16b-chat"

    [To ensure code accuracy, apply this suggestion manually]

    Suggestion importance[1-10]: 9


    Why: The suggestion correctly identifies that the YAML content is in a code cell, which is invalid Python syntax and would cause a SyntaxError. The proposed fix of commenting out the lines resolves the issue, making the notebook runnable.

    High
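
If the YAML should stay machine-readable rather than being commented out, an alternative (a sketch, assuming PyYAML is available in the notebook environment) is to keep it as a string and parse it:

import yaml  # PyYAML; assumed to be installed (pip install pyyaml)

prompt_config = yaml.safe_load("""
prompt:
  task: "Reasoning over multi-step instructions"
  context: "User provides a math problem or logical question."
  model: "deepseek-ai/deepseek-moe-16b-chat"
""")
print(prompt_config["prompt"]["task"])  # -> Reasoning over multi-step instructions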
    Verify model identifier exists

    The model ID "microsoft/phi-4-14b" may not exist on HuggingFace Hub. Verify the
    correct model identifier or use an available Phi-4 variant to prevent runtime
    errors.

    examples/cookbooks/Phi_4_14B_GRPO.ipynb [90-91]

    -tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4-14b")
    -model = AutoModelForCausalLM.from_pretrained("microsoft/phi-4-14b")
    +tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-medium-4k-instruct")  # Use verified model ID
    +model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-medium-4k-instruct")

    [To ensure code accuracy, apply this suggestion manually]

    Suggestion importance[1-10]: 9


    Why: The suggestion correctly identifies that the model microsoft/phi-4-14b is not a valid HuggingFace model identifier, which would cause a runtime error. While the proposed code uses a different model, the core suggestion to verify and use a correct model ID is critical for the notebook to function.

    High
    General
    Correct model ID mismatch

    The model ID references a different model than mentioned in the title and
    description. The notebook claims to demonstrate "DeepSeek R1 Qwen3 (8B)" but
    uses "deepseek-moe-16b-chat" which is inconsistent.

    examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb [125-127]

    -model_id = "deepseek-ai/deepseek-moe-16b-chat"
    +model_id = "deepseek-ai/deepseek-r1-qwen3-8b"  # or appropriate DeepSeek R1 Qwen3 model ID
     tokenizer = AutoTokenizer.from_pretrained(model_id)
     model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    [To ensure code accuracy, apply this suggestion manually]

    Suggestion importance[1-10]: 9


    Why: The suggestion correctly points out a significant inconsistency. The notebook's title and description refer to the "Qwen3 (8B)" model, but the code implements the "deepseek-moe-16b-chat" model. This is misleading and makes the example incorrect.

    High

    codecov bot commented Jun 18, 2025

    Codecov Report

    All modified and coverable lines are covered by tests ✅

    Project coverage is 14.50%. Comparing base (8ee013e) to head (f205e78).
    Report is 147 commits behind head on main.

    Additional details and impacted files
    @@           Coverage Diff           @@
    ##             main     #672   +/-   ##
    =======================================
      Coverage   14.50%   14.50%           
    =======================================
      Files          25       25           
      Lines        2517     2517           
      Branches      357      357           
    =======================================
      Hits          365      365           
      Misses       2136     2136           
      Partials       16       16           
    Flag Coverage Δ
    quick-validation 0.00% <ø> (ø)
    unit-tests 14.50% <ø> (ø)

    Flags with carried forward coverage won't be shown. Click here to find out more.

    ☔ View full report in Codecov by Sentry.

gemini-code-assist bot (Contributor) left a comment

    Code Review

    This pull request adds several Jupyter notebooks demonstrating various AI models. The notebooks are generally well-structured and provide useful examples. Key areas for improvement include correcting a critical model mismatch in the DeepSeek_Qwen3_GRPO.ipynb notebook, clarifying the use of GRPO in the Phi_4_14B_GRPO.ipynb notebook, updating Colab badge URLs, and adding resource consideration notes for larger models.

    "prompt:\n",
    " task: \"Reasoning over multi-step instructions\"\n",
    " context: \"User provides a math problem or logical question.\"\n",
    " model: \"deepseek-ai/deepseek-moe-16b-chat\"\n"

    critical

    The model ID deepseek-ai/deepseek-moe-16b-chat in this YAML example is inconsistent with the notebook's title and description, as well as the PR title, which all refer to an 8B Qwen3 model. Update this to reflect the intended 8B Qwen3 model ID.

      model: "your_deepseek_qwen3_8b_grpo_model_id" # Please replace with the correct 8B model ID
    

    "\n",
    "from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline\n",
    "\n",
    "model_id = \"deepseek-ai/deepseek-moe-16b-chat\"\n",

    critical

    The model_id used here is "deepseek-ai/deepseek-moe-16b-chat", which corresponds to a 16B parameter model. This conflicts with the PR title, notebook title, and notebook description, all of which refer to an 8B model. Update the model_id to the correct Hugging Face ID for the intended "DeepSeek R1 Qwen3 (8B) - GRPO" model.

    model_id = "your_deepseek_qwen3_8b_grpo_model_id"  # Please replace with the correct 8B model ID for DeepSeek Qwen3 GRPO
    

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DhivyaBharathy-web/PraisonAI/blob/main/examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb)

    medium

    The URL for the 'Open in Colab' badge points to https://colab.research.google.com/github/DhivyaBharathy-web/PraisonAI/.... Update the GitHub username in the URL to MervinPraison to ensure users are directed to the correct repository.

    [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MervinPraison/PraisonAI/blob/main/examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb)
    

    "source": [
    "**Description:**\n",
    "\n",
    "This notebook demonstrates inference using the Phi-4 14B parameter model with GRPO optimization strategy."

    medium

    The description mentions "GRPO optimization strategy," but the code doesn't clearly demonstrate how GRPO is applied. Clarify if GRPO is inherent to the model or if a specific technique is needed.

    "!pip install transformers accelerate\n",
    "!pip install torch\n",
    "```"
    ]

    medium

    The Phi-4 14B model is resource-intensive. Add a markdown cell after the dependencies installation to inform users about potential resource requirements (GPU, RAM).
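
One way to act on this, sketched below (not taken from the notebook; it only assumes torch is installed): a cell that reports GPU availability and VRAM before the checkpoint is loaded, so users on free-tier runtimes see the constraint up front.

import torch

# Report the runtime's GPU budget before attempting to load a 14B checkpoint.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No GPU detected; a 14B model will not fit in a CPU-only Colab session.")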

coderabbitai bot (Contributor) left a comment

    Actionable comments posted: 4

    ♻️ Duplicate comments (2)
    examples/cookbooks/Phi_3_Medium_Conversational.ipynb (1)

    45-48: Same dependency pinning comment applies.
    Consider locking transformers, accelerate, torch, torchvision versions.

    examples/cookbooks/Phi_4_Conversational.ipynb (1)

    87-95: 4-B model GPU safety & no-grad context.
    Same remarks on device_map + torch.no_grad() as above.

    🧹 Nitpick comments (5)
    examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb (1)

51-55: Un-pinned dependency versions jeopardise reproducibility.
!pip install -q transformers accelerate will always pull the latest versions, which may introduce breaking API changes. Pin to a known-good minor version (or at minimum a ~= spec); a pinned-install sketch follows this list of nitpicks.

    examples/cookbooks/Phi_3_5_Mini_Conversational.ipynb (1)

    44-47: Pin package versions for deterministic installs.
    Repeatable notebooks are easier to debug. Suggest pinning transformers, accelerate, and torch to tested versions (e.g., transformers==4.41.1, torch==2.3.*).

    examples/cookbooks/Phi_4_Conversational.ipynb (1)

    44-47: Pin versions to avoid future incompatibilities.

    examples/cookbooks/Phi_4_14B_GRPO.ipynb (2)

    42-47: Pinned versions critical for 14 B model + GRPO.
    Large models depend heavily on matching transformers / accelerate versions; please lock them.


    88-96: 14 B checkpoint unlikely to fit on free Colab GPU.
    Load with 8-bit/4-bit quantisation (BitsAndBytesConfig), or instruct users to switch to a high-RAM runtime. Failing that, the cell will OOM.

    -from transformers import AutoTokenizer, AutoModelForCausalLM
    +from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
    @@
    -model = AutoModelForCausalLM.from_pretrained("microsoft/phi-4-14b")
    +bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
    +model = AutoModelForCausalLM.from_pretrained(
    +    "microsoft/phi-4-14b",
    +    quantization_config=bnb_config,
    +    device_map="auto",
    +    torch_dtype="auto",
    +)
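
For the version-pinning nitpicks above, one possible pinned install cell is sketched below; the version numbers are illustrative only and should be replaced with versions actually tested against these notebooks:

# Illustrative pins; verify against the notebooks before committing.
!pip install -q "transformers~=4.41" "accelerate~=0.30" "torch~=2.3" "torchvision~=0.18"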
    📜 Review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL
    Plan: Pro

    📥 Commits

    Reviewing files that changed from the base of the PR and between db28e0a and fe63b97.

    📒 Files selected for processing (5)
    • examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb (1 hunks)
    • examples/cookbooks/Phi_3_5_Mini_Conversational.ipynb (1 hunks)
    • examples/cookbooks/Phi_3_Medium_Conversational.ipynb (1 hunks)
    • examples/cookbooks/Phi_4_14B_GRPO.ipynb (1 hunks)
    • examples/cookbooks/Phi_4_Conversational.ipynb (1 hunks)
    ⏰ Context from checks skipped due to timeout of 90000ms (5)
    • GitHub Check: quick-test
    • GitHub Check: test-examples
    • GitHub Check: test-core (3.11)
    • GitHub Check: performance-test
    • GitHub Check: Run tests and collect coverage

    Comment on lines 30 to 32
    "This notebook demonstrates the usage of DeepSeek's Qwen3-8B model with GRPO (Guided Reasoning Prompt Optimization) for interactive conversational reasoning tasks.\n",
    "It is designed to simulate a lightweight agent-style reasoning capability in an accessible and interpretable way."
    ]

    ⚠️ Potential issue

    Model mismatch – description says “Qwen3-8B” but code loads “deepseek-moe-16b-chat”.
    The notebook title/markdown explicitly introduces a fine-tuned Qwen3-8B GRPO agent, yet the model_id points at deepseek-ai/deepseek-moe-16b-chat. This 16 B MoE chat checkpoint is materially different (size, architecture, licensing) and will not reproduce results claimed for Qwen3-8B GRPO.

    Either fix the markdown/filename or switch model_id (and Colab link) to the intended Qwen3-8B GRPO artifact.

    - model_id = "deepseek-ai/deepseek-moe-16b-chat"
    + model_id = "deepseek-ai/deepseek-qwen3-8b-grpo"  # <- example id, adjust to the actual HF repo

    Also applies to: 125-131

    🤖 Prompt for AI Agents
    In examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb around lines 30 to 32 and 125 to
    131, the notebook description and title mention the Qwen3-8B model, but the code
    loads the deepseek-ai/deepseek-moe-16b-chat model instead. To fix this, update
    the model_id in the code and any related Colab links to point to the correct
    Qwen3-8B GRPO model artifact that matches the description, or alternatively,
    adjust the markdown and filename to accurately reflect the deepseek-moe-16b-chat
    model being used.
    

    Comment on lines 123 to 134
    "from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline\n",
    "\n",
    "model_id = \"deepseek-ai/deepseek-moe-16b-chat\"\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
    "model = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\")\n",
    "\n",
    "pipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)\n",
    "\n",
    "prompt = \"If a train travels 60 miles in 1.5 hours, what is its average speed?\"\n",
    "output = pipe(prompt, max_new_tokens=60)[0]['generated_text']\n",
    "print(\"🧠 Reasoned Output:\", output)\n"
    ]

    🛠️ Refactor suggestion

    Large-model loading lacks memory-safe settings.
    16 B (or even 8 B) checkpoints frequently exceed Colab’s 16 GB GPU. Add torch_dtype=torch.float16, device_map="auto", and consider trust_remote_code=True if model repo uses custom layers. Wrap generation in torch.no_grad() to avoid unnecessary gradients.

    -from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
    +from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
    @@
    -tokenizer = AutoTokenizer.from_pretrained(model_id)
    -model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    +tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    +model = AutoModelForCausalLM.from_pretrained(
    +    model_id,
    +    device_map="auto",
    +    torch_dtype="auto",  # fp16/bf16 if available
    +)
    @@
    -output = pipe(prompt, max_new_tokens=60)[0]['generated_text']
    +with torch.no_grad():
    +    output = pipe(prompt, max_new_tokens=60, return_full_text=False)[0]["generated_text"]

    Committable suggestion skipped: line range outside the PR's diff.

    🤖 Prompt for AI Agents
    In examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb around lines 123 to 134, the
    model loading and generation code lacks memory-efficient settings for large
    models. Fix this by adding torch_dtype=torch.float16 and device_map="auto"
    parameters when loading the model, include trust_remote_code=True if the model
    uses custom layers, and wrap the text generation call inside a torch.no_grad()
    context to prevent gradient computation and reduce memory usage.
    

    Comment on lines 87 to 95
    "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"microsoft/phi-3.5-mini\")\n",
    "model = AutoModelForCausalLM.from_pretrained(\"microsoft/phi-3.5-mini\")\n",
    "\n",
    "inputs = tokenizer(\"What is an AI model?\", return_tensors=\"pt\")\n",
    "outputs = model.generate(**inputs, max_new_tokens=40)\n",
    "print(tokenizer.decode(outputs[0], skip_special_tokens=True))"
    ]

    🛠️ Refactor suggestion

    Add device placement & inference-mode guards.
    AutoModelForCausalLM.from_pretrained defaults to CPU; on GPU Colab you’ll silently OOM when the model is moved during generate. Use device_map="auto" and wrap the generation under torch.no_grad().

    -model = AutoModelForCausalLM.from_pretrained("microsoft/phi-3.5-mini")
    +model = AutoModelForCausalLM.from_pretrained(
    +    "microsoft/phi-3.5-mini",
    +    device_map="auto",
    +    torch_dtype="auto",
    +)
    @@
    -outputs = model.generate(**inputs, max_new_tokens=40)
    +with torch.no_grad():
    +    outputs = model.generate(**inputs, max_new_tokens=40)
    📝 Committable suggestion

    ‼️ IMPORTANT
    Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch  # needed for torch.no_grad()

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-3.5-mini")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-3.5-mini",
    device_map="auto",
    torch_dtype="auto",
)

inputs = tokenizer("What is an AI model?", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    🤖 Prompt for AI Agents
    In examples/cookbooks/Phi_3_5_Mini_Conversational.ipynb around lines 87 to 95,
    the model loading and generation code lacks device placement and inference mode
    guards, which can cause out-of-memory errors on GPU. Fix this by adding
    device_map="auto" to the from_pretrained call to automatically place the model
    on the appropriate device, and wrap the model.generate call inside a
    torch.no_grad() context to disable gradient calculations during inference.
    

    Comment on lines 88 to 97
    "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
    "import torch\n",
    "\n",
    "model = AutoModelForCausalLM.from_pretrained(\"microsoft/phi-3-medium\")\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"microsoft/phi-3-medium\")\n",
    "\n",
    "prompt = \"What is the capital of France?\"\n",
    "inputs = tokenizer(prompt, return_tensors=\"pt\")\n",
    "outputs = model.generate(**inputs, max_new_tokens=20)\n",
    "print(tokenizer.decode(outputs[0], skip_special_tokens=True))"

    🛠️ Refactor suggestion

    Loading 3-B model on GPU without device_map can exceed VRAM.
    Add device_map="auto" and torch_dtype="auto". Also guard with torch.no_grad().

    🤖 Prompt for AI Agents
    In examples/cookbooks/Phi_3_Medium_Conversational.ipynb around lines 88 to 97,
    the model loading and generation code does not specify device placement, which
    can cause VRAM overflow on GPU. Fix this by adding device_map="auto" and
    torch_dtype="auto" parameters to the from_pretrained call to enable automatic
    device placement and dtype selection. Also, wrap the model.generate call inside
    a torch.no_grad() context to prevent unnecessary gradient computation and reduce
    memory usage.
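
A concrete sketch of that fix, mirroring the notebook's cell (the model ID is kept as written there; whether it resolves on the Hub is a separate concern raised elsewhere in this review):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-3-medium",   # model ID as written in the notebook
    device_map="auto",          # spread weights across available devices
    torch_dtype="auto",         # use fp16/bf16 where supported
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-3-medium")

prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():           # inference only; skip gradient bookkeeping
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))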
    

    gitguardian bot commented Jun 21, 2025

    ⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

    Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

    Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
    Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secret in your pull request

GitGuardian id: 17682666
Status: Triggered
Secret: Generic High Entropy Secret
Commit: f205e78
Filename: src/praisonai-agents/test_posthog_import.py
    🛠 Guidelines to remediate hardcoded secrets
    1. Understand the implications of revoking this secret by investigating where it is used in your code.
    2. Replace and store your secret safely. Learn here the best practices.
    3. Revoke and rotate this secret.
    4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
