-
Notifications
You must be signed in to change notification settings - Fork 2.1k
feat: frontend refactor + DR #5225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some nits
and generate_final_answer | ||
and response_part.answer_piece | ||
): | ||
if chat_message_id is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be checked outside the for loop
) | ||
|
||
if ( | ||
hasattr(response_part, "answer_piece") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should check isinstance(response_part, (OnyxAnswerPiece, AgentAnswerPiece, ...))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
writer, | ||
) | ||
|
||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment explaining what this case is used for
write_custom_event( | ||
ind, | ||
MessageDelta( | ||
content=response_part.answer_piece, type="message_delta" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
message_delta should be a global constant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Removed all message_delta as they are defaults of the class anyway with only one allowed value
f"{tool_name}: {tool.cost}" for tool_name, tool in available_tools.items() | ||
) | ||
|
||
tool_differentiations: list[str] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turn this into a list comp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
focussing on providing the citations and providing some answer facts. But the \ | ||
main content should be in the cited documents for each sub-question. | ||
- Pay close attention to whether the sub-answers mention whether the topic of interest \ | ||
was explicitly mentioned! If not you cannot reliably use that information to construct your answer, \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not you -> delete "not"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
FINAL_ANSWER_PROMPT_WITHOUT_SUB_ANSWERS = PromptTemplate( | ||
f""" | ||
You are great at answering a user question based \ | ||
a list of documents that were retrieved in response to subh-questions, and possibly also \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subh -> sub
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
they MUST NOT be part of the rewritten search query... take it out in that case! \ | ||
Particularly look for expressions like '...in our Google docs...', '...in our \ | ||
Google calls', etc., in which case the source type is 'google_drive' or 'gong' \ | ||
should not be included in the rewritten query! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add "which" or "and" to the start of the line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed wording a bit, though differently!
You are great at 1) determining whether a question can be answered \ | ||
by you directly using your knowledge alone and the chat history (if any), and 2) actually \ | ||
answering the question/request, \ | ||
if the request DOES NOT require or would strongly benefit from ANY external tool \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or -> nor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
""" | ||
|
||
|
||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be commented out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
40 issues found across 297 files
Note: This PR contains a large number of files. cubic only reviews up to 150 files per PR, so some files may not have been reviewed.
React with 👍 or 👎 to teach cubic. You can also tag @cubic-dev-ai
to give feedback, ask questions, or re-run the review.
return ( | ||
"...\n" | ||
if len(chat_history) > len(past_messages) | ||
else "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing concatenation in ternary return causes syntax error or drops chat history content.
Prompt for AI agents
Address the following comment on backend/onyx/agents/agent_search/dr/utils.py at line 192:
<comment>Missing concatenation in ternary return causes syntax error or drops chat history content.</comment>
<file context>
@@ -0,0 +1,333 @@
+import re
+
+from langchain.schema.messages import BaseMessage
+from langchain.schema.messages import HumanMessage
+from sqlalchemy.orm import Session
+
+from onyx.agents.agent_search.dr.enums import ResearchAnswerPurpose
+from onyx.agents.agent_search.dr.enums import ResearchType
+from onyx.agents.agent_search.dr.models import AggregatedDRContext
</file context>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
claims, | ||
) = extract_document_citations(answer_string, claims) | ||
cited_documents = { | ||
citation_number: retrieved_docs[citation_number - 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out-of-bounds citation indexing may raise IndexError in DEEP path
Prompt for AI agents
Address the following comment on backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_2_act.py at line 223:
<comment>Out-of-bounds citation indexing may raise IndexError in DEEP path</comment>
<file context>
@@ -0,0 +1,258 @@
+import re
+from datetime import datetime
+from typing import cast
+
+from langchain_core.runnables import RunnableConfig
+from langgraph.types import StreamWriter
+
+from onyx.agents.agent_search.dr.enums import ResearchType
+from onyx.agents.agent_search.dr.models import BaseSearchProcessingResponse
</file context>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if not state.available_tools: | ||
raise ValueError("available_tools is not set") | ||
|
||
search_tool_info = state.available_tools[state.tools_used[-1]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential IndexError/KeyError accessing last used tool without guards
Prompt for AI agents
Address the following comment on backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_2_act.py at line 69:
<comment>Potential IndexError/KeyError accessing last used tool without guards</comment>
<file context>
@@ -0,0 +1,258 @@
+import re
+from datetime import datetime
+from typing import cast
+
+from langchain_core.runnables import RunnableConfig
+from langgraph.types import StreamWriter
+
+from onyx.agents.agent_search.dr.enums import ResearchType
+from onyx.agents.agent_search.dr.models import BaseSearchProcessingResponse
</file context>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
from onyx.db.persona import upsert_persona | ||
from onyx.db.prompts import get_default_prompt | ||
from onyx.tools.built_in_tools import get_search_tool | ||
from onyx.tools.built_in_tools import get_builtin_tool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect SQLAlchemy filter uses is
(Python identity) instead of SQL expression, making the default-exists check always false.
Prompt for AI agents
Address the following comment on backend/onyx/db/slack_channel_config.py at line 19:
<comment>Incorrect SQLAlchemy filter uses `is` (Python identity) instead of SQL expression, making the default-exists check always false.</comment>
<file context>
@@ -16,7 +16,8 @@
from onyx.db.persona import mark_persona_as_deleted
from onyx.db.persona import upsert_persona
from onyx.db.prompts import get_default_prompt
-from onyx.tools.built_in_tools import get_search_tool
+from onyx.tools.built_in_tools import get_builtin_tool
+from onyx.tools.tool_implementations.search.search_tool import SearchTool
from onyx.utils.errors import EERequiredError
</file context>
llm: LLM, | ||
force_run: bool = False, | ||
) -> dict[str, Any] | None: | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KnowledgeGraphTool is registered as built-in but all execution methods raise ValueError; if surfaced, selecting or evaluating it will crash both tool-calling and non-tool-calling flows.
Prompt for AI agents
Address the following comment on backend/onyx/tools/tool_implementations/knowledge_graph/knowledge_graph_tool.py at line 69:
<comment>KnowledgeGraphTool is registered as built-in but all execution methods raise ValueError; if surfaced, selecting or evaluating it will crash both tool-calling and non-tool-calling flows.</comment>
<file context>
@@ -0,0 +1,106 @@
+from collections.abc import Generator
+from typing import Any
+
+from onyx.chat.prompt_builder.answer_prompt_builder import AnswerPromptBuilder
+from onyx.llm.interfaces import LLM
+from onyx.llm.models import PreviousMessage
+from onyx.tools.message import ToolCallSummary
+from onyx.tools.models import ToolResponse
+from onyx.tools.tool import Tool
</file context>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave for now
# TODO: extra review regarding coding style | ||
@property | ||
def llm_name(self) -> str: | ||
return self.display_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
llm_name returns display_name, which is not guaranteed unique and can collide when used as a dict key; prefer a stable unique identifier (e.g., name) to avoid overwriting tools.
Prompt for AI agents
Address the following comment on backend/onyx/tools/tool.py at line 48:
<comment>llm_name returns display_name, which is not guaranteed unique and can collide when used as a dict key; prefer a stable unique identifier (e.g., name) to avoid overwriting tools.</comment>
<file context>
@@ -35,6 +40,13 @@ def description(self) -> str:
def display_name(self) -> str:
raise NotImplementedError
+ # Added to make tools work better with LLMs in prompts. Should be unique
+ # TODO: looks at ways how to best ensure uniqueness.
+ # TODO: extra review regarding coding style
+ @property
+ def llm_name(self) -> str:
+ return self.display_name
</file context>
return self.display_name | |
return self.name |
graph.add_node(DRPath.ORCHESTRATOR, orchestrator) | ||
|
||
basic_search_graph = dr_basic_search_graph_builder().compile() | ||
graph.add_node(DRPath.INTERNAL_SEARCH, basic_search_graph) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compiled subgraphs with SubAgentMainState are added as nodes to a parent graph with MainState, causing state schema mismatch at runtime
Prompt for AI agents
Address the following comment on backend/onyx/agents/agent_search/dr/graph_builder.py at line 50:
<comment>Compiled subgraphs with SubAgentMainState are added as nodes to a parent graph with MainState, causing state schema mismatch at runtime</comment>
<file context>
@@ -0,0 +1,88 @@
+from langgraph.graph import END
+from langgraph.graph import START
+from langgraph.graph import StateGraph
+
+from onyx.agents.agent_search.dr.conditional_edges import completeness_router
+from onyx.agents.agent_search.dr.conditional_edges import decision_router
+from onyx.agents.agent_search.dr.enums import DRPath
+from onyx.agents.agent_search.dr.nodes.dr_a0_clarification import clarifier
+from onyx.agents.agent_search.dr.nodes.dr_a1_orchestrator import orchestrator
</file context>
sa.ForeignKey("chat_message.id", ondelete="CASCADE"), | ||
nullable=False, | ||
), | ||
sa.Column("iteration_nr", sa.Integer(), nullable=False), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sub-steps are not linked to a specific iteration via a foreign key, allowing inconsistent data. Add an iteration_id FK (or a composite FK using a unique constraint) to enforce referential integrity.
Prompt for AI agents
Address the following comment on backend/alembic/versions/5ae8240accb3_add_research_agent_database_tables_and_.py at line 42:
<comment>Sub-steps are not linked to a specific iteration via a foreign key, allowing inconsistent data. Add an iteration_id FK (or a composite FK using a unique constraint) to enforce referential integrity.</comment>
<file context>
@@ -0,0 +1,102 @@
+"""add research agent database tables and chat message research fields
+
+Revision ID: 5ae8240accb3
+Revises: b558f51620b4
+Create Date: 2025-08-06 14:29:24.691388
+
+"""
+
+from alembic import op
</file context>
@@ -395,19 +298,14 @@ export function ChatInputBar({ | |||
|
|||
const handleKeyDown = (e: React.KeyboardEvent<HTMLTextAreaElement>) => { | |||
if ( | |||
((showSuggestions && assistantTagOptions.length > 0) || showPrompts) && | |||
(showSuggestions || showPrompts) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enter/Tab handling intercepts keystrokes when showSuggestions is true but no assistant suggestions are rendered, blocking sends after typing a trailing @mention.
(Based on your team's feedback about keeping UX regressions out of refactors and ensuring removed features don't leave dead-end UI states.)
Prompt for AI agents
Address the following comment on web/src/app/chat/components/input/ChatInputBar.tsx at line 301:
<comment>Enter/Tab handling intercepts keystrokes when showSuggestions is true but no assistant suggestions are rendered, blocking sends after typing a trailing @mention.
(Based on your team's feedback about keeping UX regressions out of refactors and ensuring removed features don't leave dead-end UI states.)</comment>
<file context>
@@ -395,19 +298,14 @@ export function ChatInputBar({
const handleKeyDown = (e: React.KeyboardEvent<HTMLTextAreaElement>) => {
if (
- ((showSuggestions && assistantTagOptions.length > 0) || showPrompts) &&
+ (showSuggestions || showPrompts) &&
(e.key === "Tab" || e.key == "Enter")
) {
</file context>
(showSuggestions || showPrompts) && | |
showPrompts && |
GENERAL_DR_ANSWER_PROMPT = PromptTemplate( | ||
f"""\ | ||
Below you see a user question and potentially an earlier chat history that can be referred to \ | ||
for context. Also, today is {datetime.now().strftime("%Y-%m-%d")}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current date is interpolated at import time; should be provided at render time to avoid stale values.
Prompt for AI agents
Address the following comment on backend/onyx/prompts/dr_prompts.py at line 1202:
<comment>Current date is interpolated at import time; should be provided at render time to avoid stale values.</comment>
<file context>
@@ -0,0 +1,1378 @@
+from datetime import datetime
+
+from onyx.agents.agent_search.dr.constants import MAX_DR_PARALLEL_SEARCH
+from onyx.agents.agent_search.dr.enums import DRPath
+from onyx.agents.agent_search.dr.enums import ResearchType
+from onyx.prompts.prompt_template import PromptTemplate
+
+
+# Standards
</file context>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* squash: combine all DR commits into one Co-authored-by: Joachim Rahmfeld <joachim@onyx.app> Co-authored-by: Rei Meguro <rmeguro@umich.edu> * Fixes * show KG in Assistant only if available * KG only usable for KG Beta (for now) * base file upload * raise error if uploaded context is too long * improvements * More improvements * Fix citations * better decision making * improved decision-making in Orchestrator * generic_internal tools * Small tweak * tool use improvements * add on * More image gen stuff * fixes * Small color improvements * Markdown utils * fixed end conditions (incl early exit for image generation) * remove agent search + image fixes * Okta tool support for reload * Some cleanup * Stream back search tool results as they come * tool forcing * fixed no-Tool-Assistant * Support anthropic tool calling * Support anthropic models better * More stuff * prompt fixes and search step numbers * Fix hook ordering issue * internal search fix * Improve citation look * Small UI improvements * Improvements * Improve dot * Small chat fixes * Small UI tweaks * Small improvements * Remove un-used code * Fix * Remove test_answer.py for now * Fix * improvements * Add foreign keys * early forcing * Fix tests * Fix tests --------- Co-authored-by: Joachim Rahmfeld <joachim@onyx.app> Co-authored-by: Rei Meguro <rmeguro@umich.edu> Co-authored-by: joachim-danswer <joachim@danswer.ai>
Description
Fixes https://linear.app/danswer/issue/DAN-2269/deep-research-ui
Fixes https://linear.app/danswer/issue/DAN-2270/deep-research-backend
How Has This Been Tested?
Tested locally
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.
Summary by cubic
Adds Deep Research v2 with a clarifier→orchestrator→closer workflow and packetized streaming, plus DB support for research iterations and answer purpose. Updates KB search, tools, and chat piping to the new streaming model.
New Features
Migration