Add MemoryPal Search Agent Notebook #695
Conversation
Walkthrough
Three new Jupyter notebook examples have been added, each demonstrating an AI-powered agent for a specific use case: a Chilean government services assistant using Firecrawl and translation, a PraisonAI agent with DuckDuckGo search integration for job trends, and a cybersecurity agent for automated CVE PoC validation. Each notebook defines new classes and functions to support its respective workflow.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Chatbot (Chile Assistant)
participant FirecrawlTool
participant GoogleTranslator
User->>Chatbot (Chile Assistant): Enter question (in English)
Chatbot (Chile Assistant)->>GoogleTranslator: Translate to Spanish
Chatbot (Chile Assistant)->>FirecrawlTool: Search Chilean gov services (in Spanish)
FirecrawlTool-->>Chatbot (Chile Assistant): Return results (in Spanish)
Chatbot (Chile Assistant)->>GoogleTranslator: Translate results to English
GoogleTranslator-->>Chatbot (Chile Assistant): Translated results
Chatbot (Chile Assistant)-->>User: Display answer (in English)
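A minimal sketch of the loop this diagram describes, assuming the notebook's firecrawl_tool instance and its search() method; the helper name answer_in_english and the fallback message are illustrative, not taken from the notebook.

from deep_translator import GoogleTranslator

def answer_in_english(user_question: str, firecrawl_tool) -> str:
    # Query in Spanish for best results against Chilean government pages
    spanish_query = GoogleTranslator(source='auto', target='es').translate(user_question)
    spanish_answer = firecrawl_tool.search(spanish_query)
    if not spanish_answer or not str(spanish_answer).strip():
        return "Sorry, I couldn't find an answer. Please try rephrasing your question."
    # Translate the Spanish results back to English for the user
    return GoogleTranslator(source='auto', target='en').translate(str(spanish_answer))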
sequenceDiagram
participant User
participant PraisonAI Agent
participant DuckDuckGo Search Tool
User->>PraisonAI Agent: Query (e.g., "AI job trends in 2025")
PraisonAI Agent->>DuckDuckGo Search Tool: Search query
DuckDuckGo Search Tool-->>PraisonAI Agent: Search results
PraisonAI Agent-->>User: Summarized response with references
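A minimal sketch of this flow; the internet_search_tool body follows the notebook's DuckDuckGo usage, while the praisonaiagents Agent constructor arguments and start() call are assumptions about the PraisonAI API, not confirmed by this excerpt.

from duckduckgo_search import DDGS
from praisonaiagents import Agent  # assumed import path

def internet_search_tool(query: str):
    # Return the top DuckDuckGo hits as structured results
    results = []
    ddgs = DDGS()
    for result in ddgs.text(keywords=query, max_results=5):
        results.append({
            'title': result.get('title', ''),
            'url': result.get('href', ''),
            'snippet': result.get('body', '')
        })
    return results

# Hypothetical wiring: register the tool and ask about job trends
agent = Agent(instructions="Answer using web search and cite references.",
              tools=[internet_search_tool])
agent.start("AI job trends in 2025")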
sequenceDiagram
participant User
participant run_pocky_for_cve
participant AttackIntentAgent
participant ValidationAgent
User->>run_pocky_for_cve: Provide CVE ID
run_pocky_for_cve->>AttackIntentAgent: Extract attack intent from CVE description
AttackIntentAgent-->>run_pocky_for_cve: Attack intent
run_pocky_for_cve->>ValidationAgent: Validate PoC sample
ValidationAgent-->>run_pocky_for_cve: Validation result
run_pocky_for_cve-->>User: Display validation outcome
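A minimal sketch of the orchestration above, reusing the notebook's dummy AttackIntentAgent and ValidationAgent classes (quoted later in this review); the placeholder description and PoC strings mirror the notebook's demo data.

import json

def run_pocky_for_cve(cve_id: str) -> bool:
    # Demo data; the notebook notes these should come from real Exa/OpenAI searches
    description = f"Description for {cve_id}"
    poc_sample = f"PoC code for {cve_id}"

    # Step 1: extract the attack intent from the CVE description
    attack_intent = AttackIntentAgent(description).run()

    # Step 2: validate the PoC sample against the extracted intent
    payload = json.dumps({"attack_intent": attack_intent, "poc_sample": poc_sample})
    is_valid = ValidationAgent(payload).run()

    print(f"{cve_id}: PoC {'validated' if is_valid else 'not validated'}")
    return is_valid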
Summary of Changes
Hello @Dhivya-Bharathy, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces three new Google Colab notebooks to the examples/cookbooks directory. These notebooks demonstrate the capabilities of PraisonAI agents in different scenarios: assisting with Chilean government services, performing internet searches, and validating cybersecurity Proof of Concept (PoC) exploits. Each notebook provides a ready-to-use example with clear instructions and dependencies.
Highlights
- New Cookbook: Chile Government Services Assistant: Adds a Google Colab notebook demonstrating an AI chatbot for answering questions about Chilean government services, utilizing the Firecrawl API and a conversational approach. It includes translation to Spanish for querying and back to English for the user.
- New Cookbook: MemoryPal Search Agent: Introduces a Google Colab notebook showcasing PraisonAI's Agent with a custom internet search tool (DuckDuckGo). It provides a minimal example for real-time web search integration.
- New Cookbook: Pocky Cybersecurity PoC Agent: Adds a Google Colab notebook for automated CVE PoC search and validation. It simulates fetching descriptions and PoCs, extracting attack intents, and validating PoC samples.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #695 +/- ##
=======================================
Coverage 14.50% 14.50%
=======================================
Files 25 25
Lines 2517 2517
Branches 357 357
=======================================
Hits 365 365
Misses 2136 2136
Partials 16 16
Flags with carried forward coverage won't be shown.
☔ View full report in Codecov by Sentry.
Code Review
This pull request introduces three new Google Colab notebooks: a Chilean government services chatbot, a MemoryPal search agent, and a Pocky Cybersecurity PoC agent. The notebooks provide clear, step-by-step examples of integrating AI agents with external tools like Firecrawl and DuckDuckGo search. My review focuses on improving efficiency, robustness, and maintainability, particularly concerning API key handling, object instantiation, and error checking mechanisms.
" spanish_answer = firecrawl_tool.search(spanish_query)\n", | ||
"\n", | ||
" # Only translate if we got a real answer\n", | ||
" if spanish_answer and isinstance(spanish_answer, str) and spanish_answer.strip() and \"Error\" not in spanish_answer:\n", |
Checking for errors by testing that the string "Error" is not in spanish_answer is brittle. If the firecrawl_tool.search method's error message changes, or if valid content happens to contain the word "Error", this logic could break. A more robust approach would be for firecrawl_tool.search to raise a specific exception on failure, or to return a distinct error object/enum, which can then be handled explicitly.
if spanish_answer and isinstance(spanish_answer, str) and spanish_answer.strip() and not spanish_answer.startswith("Error:"):
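A sketch of the exception-based alternative the comment suggests; FirecrawlSearchError is a hypothetical exception, not part of the Firecrawl SDK, and the surrounding names (firecrawl_tool, spanish_query, translate_to_english) come from the notebook.

class FirecrawlSearchError(Exception):
    """Raised by FirecrawlTool.search when the API call fails (hypothetical)."""

# FirecrawlTool.search would raise FirecrawlSearchError instead of returning an error string.
try:
    spanish_answer = firecrawl_tool.search(spanish_query)
except FirecrawlSearchError as exc:
    print(f"Tomás: Sorry, the search failed ({exc}). Please try again.")
else:
    if spanish_answer and spanish_answer.strip():
        print(f"Tomás: {translate_to_english(spanish_answer)}")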
"id": "rW8ltqCICV8o" | ||
}, | ||
"outputs": [], | ||
"source": [ |
"os.environ['FIRECRAWL_API_KEY'] = \"your api key here\"\n", | ||
"os.environ['OPENAI_API_KEY'] = \"your api key here\"" |
Hardcoding API keys directly in the notebook is generally discouraged, as it can lead to accidental exposure if the notebook is shared or committed to a public repository. While this is a common pattern for Colab notebooks, for better security and user experience, consider using input() or getpass to prompt the user for the key, or loading it from a .env file using python-dotenv (which is already installed).
os.environ['FIRECRAWL_API_KEY'] = input("Enter your Firecrawl API key: ")
os.environ['OPENAI_API_KEY'] = input("Enter your OpenAI API key: ")
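For the python-dotenv route mentioned above, a minimal sketch assuming a local .env file that defines FIRECRAWL_API_KEY and OPENAI_API_KEY; getpass is shown as a non-echoing alternative to input().

from getpass import getpass
import os
from dotenv import load_dotenv

# Option 1: load keys from a .env file kept out of version control
load_dotenv()

# Option 2: prompt without echoing the key to the notebook output
if not os.environ.get('FIRECRAWL_API_KEY'):
    os.environ['FIRECRAWL_API_KEY'] = getpass("Firecrawl API key: ")
if not os.environ.get('OPENAI_API_KEY'):
    os.environ['OPENAI_API_KEY'] = getpass("OpenAI API key: ")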
" try:\n", | ||
" return GoogleTranslator(source='auto', target='es').translate(text)\n", |
Creating a new GoogleTranslator instance inside the translate_to_spanish function on every call can be inefficient, especially if this function is called frequently. It's more efficient to create the translator instance once (e.g., globally or as a class member) and reuse it.
translator_es = GoogleTranslator(source='auto', target='es')

def translate_to_spanish(text):
    try:
        return translator_es.translate(text)
"def translate_to_english(text):\n", | ||
" try:\n", |
Creating a new GoogleTranslator instance inside the translate_to_english function on every call can be inefficient, especially if this function is called frequently. It's more efficient to create the translator instance once (e.g., globally or as a class member) and reuse it.
translator_en = GoogleTranslator(source='auto', target='en')

def translate_to_english(text):
    try:
        # Remove Markdown images and None values before translation
        text = str(text).replace("None", "")
        text = re.sub(r'!\[.*?\]\(.*?\)', '', text)
        return translator_en.translate(text)
" lang=\"es\", # Always search in Spanish for best results\n", | ||
" scrape_options=ScrapeOptions(formats=[\"markdown\", \"links\"])\n", | ||
" )\n", | ||
" if search_result and hasattr(search_result, 'data') and search_result.data:\n", |
The check hasattr(search_result, 'data') is often redundant if search_result is expected to be either an object with a data attribute or None. In Python, if search_result and search_result.data: would typically suffice, as accessing search_result.data on an unexpected object would raise an AttributeError, which is caught by the outer try-except block. While not strictly incorrect, it adds verbosity.
if search_result and search_result.data:
"import os\n", | ||
"\n", | ||
"# Enter your OpenAI API key here\n", | ||
"os.environ['OPENAI_API_KEY'] = 'Enter your api key' # <-- Replace with your OpenAI API key" |
Hardcoding API keys directly in the notebook is generally discouraged, as it can lead to accidental exposure if the notebook is shared or committed to a public repository. While this is a common pattern for Colab notebooks, for better security and user experience, consider using input() or getpass to prompt the user for the key, or loading it from a .env file using python-dotenv.
os.environ['OPENAI_API_KEY'] = input("Enter your OpenAI API key: ") # <-- Replace with your OpenAI API key
"def internet_search_tool(query: str):\n", | ||
" results = []\n", | ||
" ddgs = DDGS()\n", | ||
" for result in ddgs.text(keywords=query, max_results=5):\n", |
Creating a new DDGS instance inside the internet_search_tool function on every call can be inefficient, especially if this function is called multiple times. For better performance, consider creating the DDGS instance once (e.g., globally, or as a class member if part of a larger class) and reusing it.
ddgs_instance = DDGS()

def internet_search_tool(query: str):
    results = []
    for result in ddgs_instance.text(keywords=query, max_results=5):
"# Set your API keys here (replace with your actual keys)\n", | ||
"os.environ[\"EXA_API_KEY\"] = \"your api key\"\n", |
Hardcoding API keys directly in the notebook is generally discouraged, as it can lead to accidental exposure if the notebook is shared or committed to a public repository. While this is a common pattern for Colab notebooks, for better security and user experience, consider using input() or getpass to prompt the user for the key, or loading it from a .env file using python-dotenv.
os.environ["EXA_API_KEY"] = input("Enter your Exa API key: ")
os.environ["OPENAI_API_KEY"] = input("Enter your OpenAI API key: ")
" self.input_json = input_json\n", | ||
" def run(self):\n", | ||
" # Dummy validation logic for notebook demo\n", | ||
" data = json.loads(self.input_json)\n", |
The json.loads(self.input_json) call in ValidationAgent.run assumes that self.input_json will always be a valid JSON string. If self.input_json is not a string or contains malformed JSON, this will raise a json.JSONDecodeError at runtime. For a dummy class this might be acceptable, but in a more robust implementation a try-except block should be used to handle potential decoding errors gracefully.
try:
    data = json.loads(self.input_json)
except json.JSONDecodeError:
    # Handle invalid JSON input, e.g., log error and return False
    return False
Actionable comments posted: 1
🧹 Nitpick comments (6)
examples/cookbooks/Pocky_Cybersecurity_PoC_Agent.ipynb (2)
137-161: Excellent prompt design, but not integrated with the validation agent.
The YAML prompt is comprehensive and well-structured for PoC validation. However, the ValidationAgent class doesn't actually use this prompt. Consider integrating the prompt with the ValidationAgent:

 class ValidationAgent:
-    def __init__(self, input_json):
+    def __init__(self, input_json, prompt=validation_prompt):
         self.input_json = input_json
+        self.prompt = prompt
+        self.client = OpenAI()
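A sketch of how run() might use the injected prompt, following the OpenAI chat-completions pattern shown elsewhere in this review; the model name and the yes/no verdict parsing are assumptions, not part of the notebook.

    def run(self):
        # Ask the model to judge the PoC payload against the validation prompt
        response = self.client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model choice
            messages=[
                {"role": "system", "content": self.prompt},
                {"role": "user", "content": self.input_json},
            ],
        )
        verdict = response.choices[0].message.content.strip().lower()
        return verdict.startswith("yes")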
172-231: Good workflow structure but needs clearer demo labeling.
The function demonstrates the intended workflow well, but consider adding clearer indicators that this is a demo implementation. Add a comment to clarify the demo nature:

 def run_pocky_for_cve(cve_id):
+    # DEMO IMPLEMENTATION - Replace with real CVE data fetching
     # Example: Simulate fetching a description and PoC (replace with real logic)
     description = f"Description for {cve_id} (replace with real Exa/OpenAI search)"
     poc_sample = f"PoC code for {cve_id} (replace with real PoC search)"
examples/cookbooks/MemoryPal_Search_Agent.ipynb (1)
75-96: Well-implemented search tool with good structure.
The internet_search_tool function is cleanly implemented with appropriate result limiting and structured data return. Consider adding error handling for network issues:

 def internet_search_tool(query: str):
+    try:
     results = []
     ddgs = DDGS()
     for result in ddgs.text(keywords=query, max_results=5):
         results.append({
             'title': result.get('title', ''),
             'url': result.get('href', ''),
             'snippet': result.get('body', '')
         })
     return results
+    except Exception as e:
+        return [{'title': 'Search Error', 'url': '', 'snippet': f'Error performing search: {str(e)}'}]

examples/cookbooks/Chile_Government_Services_Assistant.ipynb (3)
40-72: Comprehensive dependencies with secure API key handling.
The package selection covers all necessary functionality. Flask might not be needed for this notebook implementation. Consider removing flask if not used:

-!pip install flask firecrawl praisonaiagents google-genai python-dotenv deep-translator
+!pip install firecrawl praisonaiagents google-genai python-dotenv deep-translator
122-170: Well-designed FirecrawlTool class with excellent filtering and validation.
The class implementation includes proper API key validation, query validation, appropriate URL filtering for Chilean government services, and comprehensive error handling. Consider adding type hints for better code documentation:

-    def search(self, search: str) -> str:
+    def search(self, search: str) -> str | None:

Also consider making the URL filtering more configurable:

+    def __init__(self, api_key, instruction: str, template: str, allowed_domains=None):
         if not api_key:
             raise ValueError("Firecrawl API key not provided.")
         self.app = FirecrawlApp(api_key=api_key)
         self.instruction = instruction
         self.template = template
+        self.allowed_domains = allowed_domains or ["https://www.chileatiende.gob.cl/fichas"]
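If allowed_domains is adopted, search() could filter result links against it; a sketch assuming each entry in search_result.data exposes a url field, which this excerpt does not confirm.

    def _is_allowed(self, url: str) -> bool:
        # Keep only links under the configured government domains
        return any(url.startswith(domain) for domain in self.allowed_domains)

    # Inside search(), after a successful API call:
    # pages = [item for item in search_result.data if self._is_allowed(item.get("url", ""))]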
238-284: Excellent interactive chat loop with robust translation workflow.
The implementation provides a smooth user experience with proper exit handling, translation workflows, and informative error messages. Consider adding input validation to prevent empty queries:

 while True:
     user_input = input("\nYou: ")
+    if not user_input.strip():
+        continue
     if user_input.lower() in ["exit", "quit"]:
         print("Tomás: It was a pleasure to help you. Goodbye!")
         break
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- examples/cookbooks/Chile_Government_Services_Assistant.ipynb (1 hunks)
- examples/cookbooks/MemoryPal_Search_Agent.ipynb (1 hunks)
- examples/cookbooks/Pocky_Cybersecurity_PoC_Agent.ipynb (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: test-core (3.11)
- GitHub Check: quick-test
- GitHub Check: Run tests and collect coverage
🔇 Additional comments (11)
examples/cookbooks/Pocky_Cybersecurity_PoC_Agent.ipynb (3)
1-42: LGTM! Well-structured notebook with clear documentation.
The notebook metadata, title, and feature description are well-organized and clearly communicate the tool's purpose.

53-62: Dependencies look appropriate for the use case.
The package selection covers the necessary tools for web scraping, API interactions, and agent functionality.

73-87: Good security practice using environment variables.
Using placeholder strings for demo purposes is appropriate, and the environment variable approach for API keys follows security best practices.
examples/cookbooks/MemoryPal_Search_Agent.ipynb (5)
1-42: Clear documentation and presentation.
The notebook title, description, and Colab integration are well-implemented.

32-42: Appropriate minimal dependencies.
The package selection is focused and includes only necessary components for the search functionality.

52-65: Secure API key handling for demo purposes.
Using environment variables and placeholder text is the appropriate approach for a demo notebook.

106-147: Well-structured YAML configuration for PraisonAI.
The agent configuration properly defines roles, tools, and tasks with appropriate structure and safe parsing.

177-482: Excellent complete implementation demonstrating PraisonAI integration.
The agent creation, tool registration, and execution are properly implemented. The output shows successful tool integration with real search results and formatted responses.
examples/cookbooks/Chile_Government_Services_Assistant.ipynb (3)
1-49: Clear documentation and professional presentation.
The notebook provides good context and a clear explanation of the Chilean government services assistant use case.

83-111: Excellent translation functions with robust error handling.
The translation functions include proper error handling, text preprocessing to remove problematic content, and graceful fallback to the original text on failure.

181-227: Clear template structure and proper initialization.
The Firecrawl template is well-organized with appropriate placeholders, and the tool initialization correctly uses environment variables.
"cell_type": "code", | ||
"source": [ | ||
"import json\n", | ||
"from openai import OpenAI\n", | ||
"from exa_py import Exa\n", | ||
"\n", | ||
"# Dummy/Minimal agent classes for notebook demo\n", | ||
"class ValidationAgent:\n", | ||
" def __init__(self, input_json):\n", | ||
" self.input_json = input_json\n", | ||
" def run(self):\n", | ||
" # Dummy validation logic for notebook demo\n", | ||
" data = json.loads(self.input_json)\n", | ||
" return True if \"attack_intent\" in data and \"poc_sample\" in data else False\n", | ||
"\n", | ||
"class AttackIntentAgent:\n", | ||
" def __init__(self, description):\n", | ||
" self.description = description\n", | ||
" def run(self):\n", | ||
" # Dummy intent extraction for notebook demo\n", | ||
" return f\"Intent for: {self.description[:50]}...\"" | ||
], | ||
"metadata": { | ||
"id": "GYfAJLXOsbga" | ||
}, | ||
"execution_count": 3, | ||
"outputs": [] | ||
}, |
🛠️ Refactor suggestion
Incomplete implementation - dummy classes don't demonstrate real functionality.
The current agent classes are placeholder implementations that don't utilize the imported OpenAI and Exa libraries. Consider implementing basic functionality to demonstrate the actual workflow.
Example improvement for AttackIntentAgent:

 class AttackIntentAgent:
     def __init__(self, description):
         self.description = description
+        self.client = OpenAI()
+
     def run(self):
-        # Dummy intent extraction for notebook demo
-        return f"Intent for: {self.description[:50]}..."
+        try:
+            response = self.client.chat.completions.create(
+                model="gpt-3.5-turbo",
+                messages=[
+                    {"role": "system", "content": "Extract attack intent from CVE description."},
+                    {"role": "user", "content": self.description}
+                ]
+            )
+            return response.choices[0].message.content
+        except Exception as e:
+            return f"Error extracting intent: {e}"
🤖 Prompt for AI Agents
In examples/cookbooks/Pocky_Cybersecurity_PoC_Agent.ipynb around lines 98 to
125, the ValidationAgent and AttackIntentAgent classes are currently dummy
placeholders and do not use the imported OpenAI and Exa libraries. To fix this,
implement basic functionality in these classes that leverages OpenAI for intent
extraction and Exa for data processing, demonstrating a real workflow. For
example, modify AttackIntentAgent to call OpenAI's API with the description to
generate an intent, and update ValidationAgent to perform validation using Exa
or OpenAI outputs instead of simple JSON checks.