Fix #90: Implement Query Rewrite Agent for comprehensive preprocessing

- Handles relative date disambiguation (e.g., 'last month' to actual dates) and question decomposition in a single preprocessing step before cache lookup
- Replaces the previous question_decomposition_agent with the more capable query_rewrite_agent
- Updates documentation to reflect the current processing flow (#100)
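The commit resolves relative dates such as 'last month' into absolute dates before anything else happens. In the system itself the LLM does this through the Query Rewrite Agent prompt; the sketch below is only a deterministic illustration of the intended mapping. It assumes "today" is passed in explicitly (playing the role of `{{ current_datetime }}` in the prompt) and that 'last 3 months' means the three full calendar months before the current one. `resolve_relative_date`, `rewrite_question` and `PHRASES` are hypothetical helpers, not part of the repository.

```python
import re
from datetime import date, timedelta

# Hypothetical helpers, NOT part of the repository. In the real system the LLM
# performs this resolution via the Query Rewrite Agent prompt.
def resolve_relative_date(phrase: str, today: date) -> tuple[date, date]:
    """Return an inclusive (start, end) date range for a few relative phrases."""
    phrase = phrase.strip().lower()
    first_of_this_month = today.replace(day=1)
    if phrase == "yesterday":
        day = today - timedelta(days=1)
        return day, day
    if phrase == "last month":
        end = first_of_this_month - timedelta(days=1)  # last day of previous month
        return end.replace(day=1), end
    if phrase == "last 3 months":
        # Assumption: the three full calendar months before the current month.
        months = today.year * 12 + (today.month - 1) - 3
        start = date(months // 12, months % 12 + 1, 1)
        return start, first_of_this_month - timedelta(days=1)
    if phrase == "this year":
        return date(today.year, 1, 1), date(today.year, 12, 31)
    raise ValueError(f"Unsupported relative date phrase: {phrase!r}")

# Longer phrases first so they are matched before any shorter overlapping phrase.
PHRASES = ("last 3 months", "last month", "this year", "yesterday")

def rewrite_question(question: str, today: date) -> str:
    """Replace supported relative phrases with explicit YYYY-MM-DD ranges."""
    for phrase in PHRASES:
        start, end = resolve_relative_date(phrase, today)
        if start == end:
            replacement = f"on {start.isoformat()}"
        else:
            replacement = f"between {start.isoformat()} and {end.isoformat()}"
        pattern = re.compile(rf"\b{re.escape(phrase)}\b", re.IGNORECASE)
        question = pattern.sub(replacement, question)
    return question

print(rewrite_question("What were total sales last month?", date(2024, 12, 15)))
# → What were total sales between 2024-11-01 and 2024-11-30?
```

The README's example maps "last month" to "November 2024"; the sketch emits an explicit ISO range instead, following the prompt's rule to keep dates in YYYY-MM-DD.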
text_2_sql/autogen/README.md
Lines changed: 13 additions & 8 deletions
@@ -8,7 +8,7 @@ The implementation is written for [AutoGen](https://github.com/microsoft/autogen

 ## Full Logical Flow for Agentic Vector Based Approach

-The following diagram shows the logical flow within mutlti agent system. In an ideal scenario, the questions will follow the _Pre-Fetched Cache Results Path** which leads to the quickest answer generation. In cases where the question is not known, the group chat selector will fall back to the other agents accordingly and generate the SQL query using the LLMs. The cache is then updated with the newly generated query and schemas.
+The following diagram shows the logical flow within the multi-agent system. The flow begins with query rewriting to preprocess questions: this includes resolving relative dates (e.g., "last month" to "November 2024") and breaking down complex queries into simpler components. For each preprocessed question, if the query cache is enabled, the system checks the cache for previously asked similar questions. In an ideal scenario, the preprocessed questions will be found in the cache, leading to the quickest answer generation. In cases where the question is not known, the group chat selector will fall back to the other agents accordingly and generate the SQL query using the LLMs. The cache is then updated with the newly generated query and schemas.

 Unlike the previous approaches, **gpt-4o-mini** can be used, as each agent's prompt is small and focuses on a single simple task.

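The flow in the updated paragraph (rewrite first, then a per-question cache check, falling back to SQL generation on a miss and updating the cache afterwards) can be sketched in plain Python. Everything here is hypothetical: `QueryCache` and `answer_question` are illustrative names, the `rewrite` and `generate_sql` callables stand in for the Query Rewrite Agent and the SQL-generating agents, and `difflib` string similarity is swapped in for whatever similarity search the real shared cache performs. The 0.9 threshold is an arbitrary assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, Optional

@dataclass
class QueryCache:
    """Hypothetical in-memory stand-in for the shared query cache."""
    threshold: float = 0.9  # assumed similarity cut-off for a cache hit
    entries: dict[str, str] = field(default_factory=dict)  # question -> SQL

    def lookup(self, question: str) -> Optional[str]:
        """Return the SQL of the most similar cached question, if similar enough."""
        best_sql, best_score = None, 0.0
        for cached_question, sql in self.entries.items():
            score = SequenceMatcher(None, question.lower(), cached_question.lower()).ratio()
            if score > best_score:
                best_sql, best_score = sql, score
        return best_sql if best_score >= self.threshold else None

    def update(self, question: str, sql: str) -> None:
        self.entries[question] = sql

def answer_question(
    question: str,
    rewrite: Callable[[str], list[str]],
    generate_sql: Callable[[str], str],
    cache: Optional[QueryCache] = None,
) -> list[tuple[str, str, bool]]:
    """Rewrite once, then process each sub-question sequentially.

    Returns (sub_question, sql, cache_hit) for every sub-question.
    """
    results = []
    for sub_question in rewrite(question):
        sql = cache.lookup(sub_question) if cache is not None else None
        cache_hit = sql is not None
        if not cache_hit:
            sql = generate_sql(sub_question)  # cache miss or cache disabled
            if cache is not None:
                cache.update(sub_question, sql)
        results.append((sub_question, sql, cache_hit))
    return results
```

Calling `answer_question` twice with the same rewritten sub-questions generates SQL on the first call and hits the cache on the second, which is why rewriting before the lookup (so that "last month" always becomes the same explicit dates) helps the cache.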
@@ -24,26 +24,31 @@ As the query cache is shared between users (no data is stored in the cache), a n

 ## Agents

-This approach builds on the the Vector Based SQL Plugin approach, but adds a agentic approach to the solution.
+This approach builds on the Vector Based SQL Plugin approach, but adds an agentic layer to the solution.

 This agentic system contains the following agents:

-- **Query Cache Agent:** Responsible for checking the cache for previously asked questions.
-- **Query Decomposition Agent:** Responsible for decomposing complex questions, into sub questions that can be answered with SQL.
-- **Schema Selection Agent:** Responsible for extracting key terms from the question and checking the index store for the queries.
+- **Query Rewrite Agent:** The first agent in the flow, responsible for two key preprocessing tasks:
+  1. Resolving relative date references (e.g., "last month") to absolute dates
+  2. Decomposing complex questions into simpler sub-questions
+  This preprocessing happens before the cache lookup to maximize cache effectiveness.
+- **Query Cache Agent:** Responsible for checking the cache for previously asked questions. After preprocessing, each sub-question is checked against the cache if caching is enabled.
+- **Schema Selection Agent:** Responsible for extracting key terms from the question and checking the index store for the queries. This agent is used when a cache miss occurs.
 - **SQL Query Generation Agent:** Responsible for using the previously extracted schemas and generated SQL queries to answer the question. This agent can request more schemas if needed, and it also runs the query.
 - **SQL Query Verification Agent:** Responsible for verifying that the SQL query and its results will answer the question.
 - **Answer Generation Agent:** Responsible for taking the database results and generating the final answer for the user.

-The combination of this agent allows the system to answer complex questions, whilst staying under the token limits when including the database schemas. The query cache ensures that previously asked questions, can be answered quickly to avoid degrading user experience.
+The combination of these agents allows the system to answer complex questions whilst staying under the token limits when including the database schemas. The query cache ensures that previously asked questions can be answered quickly to avoid degrading user experience.

 All agents can be found in `/agents/`.

 ## agentic_text_2_sql.py

-This is the main entry point for the agentic system. In here, the `Selector Group Chat`is configured with the termination conditions to orchestrate the agents within the system.
+This is the main entry point for the agentic system. In here, the system is configured with the following processing flow:

-A customer transition selector is used to automatically transition between agents dependent on the last one that was used. In some cases, this choice is delegated to an LLM to decide on the most appropriate action. This mixed approach allows for speed when needed (e.g. always calling Query Cache Agent first), but will allow the system to react dynamically to the events.
+The preprocessed questions from the Query Rewrite Agent are processed sequentially through the rest of the agent pipeline. A custom transition selector automatically transitions between agents depending on the last one that was used. The flow starts with the Query Rewrite Agent for preprocessing, followed by a cache check for each sub-question if caching is enabled. In some cases, the choice of the next agent is delegated to an LLM to decide on the most appropriate action. This mixed approach allows for speed when needed (e.g. cache hits for known questions) while still letting the system react dynamically to events.
+
+Note: Future development aims to implement independent processing, where each preprocessed question would run in its own isolated context to prevent confusion between different parts of complex queries.
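The custom transition selector can be pictured as a lookup on the last speaker that returns `None` whenever the decision should be left to the LLM. This is a hypothetical sketch: the agent name strings, the `cache_enabled`/`cache_hit` flags and the exact routing (for example, sending a cache hit straight to answer generation) are assumptions for illustration, not the repository's actual rules. As far as we understand, returning `None` to defer to the model mirrors the `selector_func` hook of AutoGen's `SelectorGroupChat`, but the sketch does not import AutoGen.

```python
from typing import Optional

# Hypothetical agent names, loosely following the agents listed in the README.
QUERY_REWRITE = "query_rewrite_agent"
QUERY_CACHE = "query_cache_agent"
SCHEMA_SELECTION = "schema_selection_agent"
SQL_GENERATION = "sql_query_generation_agent"
ANSWER_GENERATION = "answer_generation_agent"

def select_next_agent(
    last_speaker: str, cache_enabled: bool, cache_hit: bool = False
) -> Optional[str]:
    """Pick the next agent deterministically, or return None to let an LLM decide."""
    if last_speaker == "user":
        return QUERY_REWRITE  # preprocessing always runs first
    if last_speaker == QUERY_REWRITE:
        return QUERY_CACHE if cache_enabled else SCHEMA_SELECTION
    if last_speaker == QUERY_CACHE:
        # Assumption: a hit with pre-fetched results goes straight to answering.
        return ANSWER_GENERATION if cache_hit else SCHEMA_SELECTION
    if last_speaker == SCHEMA_SELECTION:
        return SQL_GENERATION
    # Otherwise (e.g. after SQL generation or verification) defer to the LLM,
    # which decides whether more schemas, verification, or an answer is needed.
    return None
```

The deterministic branches cover the transitions that never need judgment, which keeps the common path fast; only the ambiguous steps pay the cost of an extra model call.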
+  "An agent that preprocesses user questions by decomposing complex queries and resolving relative dates. This preprocessing happens before cache lookup to maximize cache utility."
+system_message:
+  "You are a helpful AI Assistant that specializes in preprocessing user questions for SQL query generation. You have two main responsibilities:
+
+  1. Decompose complex questions into simpler parts
+  2. Resolve any relative date references to absolute dates
+
+  Current date/time is: {{ current_datetime }}
+
+  For date resolution:
+  - Use the current date/time above as reference point
+  - Replace relative dates like 'last month', 'this year', 'previous quarter' with absolute dates
+  - Maintain consistency in date formats (YYYY-MM-DD)
+
+  Examples of date resolution (assuming current date is {{ current_datetime }}):
+  - 'last month' -> specific month name and year
+  - 'this year' -> {{ current_datetime.year }}
+  - 'last 3 months' -> specific date range
+  - 'yesterday' -> specific date
+
+  Rules:
+  1. ALWAYS resolve relative dates before decomposing questions
+  2. If a question contains multiple parts AND relative dates, resolve dates first, then decompose
+  3. Each decomposed question should be self-contained and not depend on context from other parts
+  4. Do not reference the original question in decomposed parts
+  5. Ensure each decomposed question includes its full context
+
+  Output Format:
+  Return an array of rewritten questions in valid, loadable JSON:
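The prompt is cut off right after the output-format instruction, so the exact JSON shape is not shown here. Assuming the agent returns a flat JSON array of question strings, a caller could validate the reply roughly as follows before handing each question to the rest of the pipeline. `parse_rewritten_questions` is a hypothetical helper, not part of the repository.

```python
import json

def parse_rewritten_questions(raw: str) -> list[str]:
    """Parse the agent's reply, assuming it is a flat JSON array of strings."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(data, list):
        raise ValueError("Expected a JSON array of rewritten questions")
    questions = []
    for item in data:
        # Reject non-strings and empty entries instead of passing them downstream.
        if not isinstance(item, str) or not item.strip():
            raise ValueError(f"Invalid rewritten question: {item!r}")
        questions.append(item.strip())
    return questions

reply = (
    '["What were total sales between 2024-11-01 and 2024-11-30?", '
    '"Which product sold the most units between 2024-11-01 and 2024-11-30?"]'
)
for question in parse_rewritten_questions(reply):
    print(question)
```

Failing loudly on a malformed reply keeps a bad rewrite from silently producing cache misses or malformed SQL further down the pipeline.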