`text_2_sql/README.md` (40 additions, 15 deletions)
To solve these issues, a Multi-Shot approach is developed.
Three different iterations are presented and code provided for:
- **Iteration 2:** A brief description of each available entity is injected into the prompt. This limits the number of tokens used and avoids filling the prompt with confusing schema information.
- **Iteration 3:** Indexing the entity definitions in a vector database, such as AI Search, and querying it to retrieve the most relevant entities for the key terms from the query.
- **Iteration 4:** Keeping an index of commonly asked questions and which schema / SQL query they resolve to; this index is generated by the LLM when it encounters a question that has not been previously asked. Additionally, indexing the entity definitions in a vector database, such as AI Search _(same as Iteration 3)_. The question index is queried first to see if a similar SQL query can be obtained _(if there is a high probability of an exact SQL query match, the results can be pre-fetched)_. If not, the plugin falls back to the schema index, querying it to retrieve the most relevant entities for the key terms from the query.
All approaches limit the number of tokens used and avoid filling the prompt with confusing schema information.
For the query cache enabled approach, AI Search is used as a vector based cache.
### Full Logical Flow for Vector Based Approach
The following diagram shows the logical flow within the Vector Based plugin. In an ideal scenario, the questions will follow the **Pre-Fetched Cache Results Path**, which leads to the quickest answer generation. In cases where the question is not known, the plugin will fall back to the other paths accordingly.
As the query cache is shared between users (no data is stored in the cache), a new user can benefit from the pre-mapped question and schema resolution in the index.
**Database results were deliberately not stored within the cache. Storing them would have removed one of the key benefits of the Text2SQL plugin: the ability to get near real-time information inside a RAG application. Instead, the query is stored so that the most recent results can be obtained quickly. Additionally, this retains the ability to apply Row or Column Level Security.**
_(Diagram: full logical flow for the Vector Based approach)_
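A minimal sketch of this cache-first flow is shown below. The stub bodies and the confidence threshold are placeholders for illustration; the real plugin methods (`fetch_queries_from_cache()`, `run_sql_query()`) are described later in this README.

```python
# Illustrative sketch of the logical flow: cache lookup first, pre-fetched
# results on a confident match, schema-index fallback otherwise. The stub
# bodies and the threshold value are placeholders, not the repository code.
import asyncio

CACHE_SCORE_THRESHOLD = 0.95  # assumed confidence cut-off for pre-fetching

async def fetch_queries_from_cache(question: str) -> dict | None:
    return None  # placeholder: would query the AI Search query cache index

async def fetch_schemas_from_index(question: str) -> list[dict]:
    return []  # placeholder: would query the AI Search schema index

async def run_sql_query(query: str) -> str:
    return "[]"  # placeholder: would execute the SQL and dump rows to JSON

async def answer_question(question: str) -> str:
    cached = await fetch_queries_from_cache(question)
    if cached is not None and cached["score"] >= CACHE_SCORE_THRESHOLD:
        # Pre-Fetched Cache Results Path: re-run the stored SQL query so the
        # LLM can answer directly, without generating a new query.
        results = await run_sql_query(cached["sql_query"])
        return f"LLM answers from pre-fetched results: {results}"
    # Fallback paths: retrieve relevant schemas and generate fresh SQL.
    schemas = await fetch_schemas_from_index(question)
    return f"LLM generates SQL over {len(schemas)} candidate schemas"

print(asyncio.run(answer_question("Give me the total number of orders in 2008?")))
```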
### Comparison of Iterations
| | Iteration 1 | Iteration 2 | Iteration 3 | Iteration 4 |
|---|---|---|---|---|
| | Consumes a significant number of tokens as number of entities increases. | As number of entities increases, token usage will grow but at a lesser rate than Iteration 1. | | AI Search adds additional cost to the solution. |
| | LLM struggled to differentiate which table to choose with the large amount of information passed. | | | |
### Complete Execution Time Comparison for Approaches
To compare the difference in complete execution time, the following questions were tested 25 times each for 4 different approaches.
Approaches:
- Prompt-Based Multi-Shot (Iteration 2)
- Vector-Based Multi-Shot (Iteration 3)
- Vector-Based Multi-Shot with Query Cache (Iteration 4)
- Vector-Based Multi-Shot with Pre-Run Query Cache (Iteration 4)
Questions:
- Give me the total number of orders in 2008?
- Which country had the highest number of orders in June 2008?
The graph below shows the response times for the experiment on a Known Question Set (i.e. the cache has already been populated with the query mapping by the LLM). gpt-4o was used as the completion LLM for this experiment. The response time is the complete execution time, including:
- Prompt Preparation
- Question Understanding
- Cache Index Requests _(if applicable)_
- SQL Query Execution
- Interpretation and generation of answer in the correct format
_(Graph: complete execution time per approach on the known question set)_
The vector-based cache approaches consistently outperform those that just use a Prompt-Based or Vector-Based approach by a significant margin. Given that it is highly likely the same Text2SQL questions will be repeated often, storing the question-SQL mapping leads to **significant performance increases** that are beneficial, despite the initial additional latency (between 1 and 2 seconds in testing) when a question is asked for the first time.
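For reference, a harness along these lines (in the spirit of `./time_comparison_script.py`, whose exact implementation may differ) can produce such per-approach timings; the approach callable here is a placeholder.

```python
# Minimal timing harness sketch: run each question repeatedly per approach
# and record the complete execution time. The approach callable is a
# placeholder standing in for the real plugin pipelines.
import statistics
import time

def time_approach(run, question: str, repeats: int = 25) -> float:
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run(question)  # complete execution: prompt prep through final answer
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

placeholder = lambda question: time.sleep(0.01)  # stands in for a real approach
print(f"Median: {time_approach(placeholder, 'Give me the total number of orders in 2008?'):.3f}s")
```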
## Sample Output
### What is the top performing product by quantity of units sold?
The top-performing product by quantity of units sold is the **Classic Vest, S**.
- `./rag_with_vector_based_text_2_sql_query_cache.ipynb` provides an example of how to utilise the Vector Based Text2SQL plugin, alongside the query cache, to query the database.
- `./rag_with_ai_search_and_text_2_sql.ipynb` provides an example of how to use the Text2SQL and an AISearch plugin in parallel to automatically retrieve data from the most relevant source to answer the query.
  - This setup is useful for a production application as the SQL Database is unlikely to be able to answer all the questions a user may ask.
- `./time_comparison_script.py` provides a utility script for performing time-based comparisons between the different approaches.
## Data Dictionary
The data dictionary is stored in `./data_dictionary/entities.json`.
A full data dictionary must be built for all the views / tables you wish to expose to the LLM. The metadata provided directly influences the accuracy of the Text2SQL component.
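For illustration only, an entry might be shaped as below. The **EntityName**, **Entity**, **Description** and **Columns/Name** fields mirror those referenced by the vector search configuration later in this README; the `Definition` field and all sample values are assumptions, not the repository's actual `entities.json` content.

```python
# Illustrative shape of a data dictionary entry, written as a Python literal.
# EntityName, Entity, Description and Columns/Name are the fields referenced
# by the search configuration below; Definition and all values are assumed.
sample_entity = {
    "EntityName": "Sales Order Detail",
    "Entity": "SalesLT.SalesOrderDetail",
    "Description": "Individual product line items that make up a sales order.",
    "Columns": [
        {"Name": "OrderQty", "Definition": "Quantity of the product ordered."},
        {"Name": "UnitPrice", "Definition": "Selling price of a single unit."},
    ],
}
```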
## Prompt Based SQL Plugin (Iteration 2)
This approach works well for a small number of entities (tested on up to 20 entities with hundreds of columns). It performed well in testing; with correct metadata, we achieved 100% accuracy on the test set.
Whilst a simple and high-performing approach, the downside of this approach is the increase in the number of tokens as the number of entities increases. Additionally, we found that the LLM started to get "confused" about which columns belong to which entities as the number of entities increased.
### prompt_based_sql_plugin.py
The **target_engine** is passed to the prompt, along with **engine_specific_rules**.
This method is called by the Semantic Kernel framework automatically, when instructed to do so by the LLM, to fetch the full schema definitions for a given entity. This returns a JSON string of the chosen entity which allows the LLM to understand the column definitions and their associated metadata. This can be called in parallel for multiple entities.
#### run_sql_query()
This method is called by the Semantic Kernel framework automatically, when instructed to do so by the LLM, to run a SQL query against the given database. It returns a JSON string containing a row-wise dump of the results returned. These results are then interpreted to answer the question.
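A minimal sketch of what such a method could look like, assuming a pyodbc connection and a `SQL_CONNECTION_STRING` environment variable; the class name and connection handling are illustrative, not the repository's exact implementation.

```python
# Sketch of a run_sql_query() plugin method. The pyodbc connection, the
# environment variable and the class name are illustrative assumptions.
import json
import os

import pyodbc
from semantic_kernel.functions import kernel_function

class PromptBasedSQLPlugin:
    @kernel_function(description="Runs an SQL query against the database.")
    def run_sql_query(self, query: str) -> str:
        # Execute the LLM-generated query and dump the rows to JSON so the
        # LLM can interpret the results when forming its answer.
        with pyodbc.connect(os.environ["SQL_CONNECTION_STRING"]) as connection:
            cursor = connection.cursor()
            cursor.execute(query)
            columns = [column[0] for column in cursor.description]
            rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
        return json.dumps(rows, default=str)
```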
223
+
207
224
## Vector Based SQL Plugin (Iterations 3 & 4)
This approach allows the system to scale without significantly increasing the number of tokens used within the system prompt. Indexing and running an AI Search instance consumes additional cost, compared to the prompt based approach.
The search text passed is vectorised against the entity level **Description** columns. A hybrid Semantic Reranking search is applied against the **EntityName**, **Entity**, **Columns/Name** fields.
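A sketch of how such a hybrid query could be issued with the `azure-search-documents` SDK follows. The endpoint and index names, the `DescriptionEmbedding` vector field, the semantic configuration name and the `embed()` helper are assumptions; only the searchable field names come from this README.

```python
# Sketch of the hybrid (vector + keyword) search with Semantic Reranking.
# Index/field/configuration names and the embed() helper are assumptions.
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

def embed(text: str) -> list[float]:
    raise NotImplementedError("placeholder: call your embedding model here")

schema_client = SearchClient(
    endpoint=os.environ["AI_SEARCH_ENDPOINT"],
    index_name="text2sql-schema-store",  # assumed index name
    credential=AzureKeyCredential(os.environ["AI_SEARCH_KEY"]),
)

results = schema_client.search(
    search_text="total orders per country",  # keyword part of the hybrid search
    vector_queries=[
        # Vector part: matched against the entity-level Description embeddings.
        VectorizedQuery(
            vector=embed("total orders per country"),
            k_nearest_neighbors=5,
            fields="DescriptionEmbedding",  # assumed vector field name
        )
    ],
    query_type="semantic",  # enables Semantic Reranking
    semantic_configuration_name="schema-semantic-config",  # assumed
    search_fields=["EntityName", "Entity", "Columns/Name"],
    top=3,
)
for doc in results:
    print(doc["Entity"], doc["@search.reranker_score"])
```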
#### fetch_queries_from_cache()
The vector based approach with query cache uses the `fetch_queries_from_cache()` method to fetch the most relevant previous query and inject it into the prompt before the initial LLM call. The use of Auto-Function Calling here is avoided to reduce the response time, as the cache index will always be used first.
If the score of the top result is higher than the defined threshold, the query will be executed against the target data source and the results included in the prompt. This allows us to prompt the LLM to evaluate whether it can use these results to answer the question, **without further SQL Query generation**, to speed up the process.
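Continuing the sketches above (reusing the assumed `embed()` and `run_sql_query()` helpers), the lookup and threshold check could look like the following; the cache index name, field names and the 0.95 cut-off are assumptions.

```python
# Sketch of the cache lookup and pre-fetch threshold check. The index and
# field names, the client setup and the 0.95 cut-off are assumptions.
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

PRE_FETCH_THRESHOLD = 0.95  # assumed score cut-off for pre-running the query

cache_client = SearchClient(
    endpoint=os.environ["AI_SEARCH_ENDPOINT"],
    index_name="text2sql-query-cache",  # assumed cache index name
    credential=AzureKeyCredential(os.environ["AI_SEARCH_KEY"]),
)

def fetch_queries_from_cache(question: str) -> dict | None:
    results = cache_client.search(
        search_text=question,
        vector_queries=[VectorizedQuery(
            vector=embed(question),  # embed() as sketched above
            k_nearest_neighbors=1,
            fields="QuestionEmbedding",  # assumed vector field
        )],
        top=1,
    )
    top = next(iter(results), None)
    if top is None:
        return None
    injection = {"question": top["Question"], "sql_query": top["SqlQuery"]}
    if top["@search.score"] >= PRE_FETCH_THRESHOLD:
        # Confident match: pre-run the cached SQL so the LLM can answer
        # from the results without generating a new query.
        injection["pre_fetched_results"] = run_sql_query(top["SqlQuery"])
    return injection
```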
#### run_sql_query()
This method is called by the Semantic Kernel framework automatically, when instructed to do so by the LLM, to run a SQL query against the given database. It returns a JSON string containing a row-wise dump of the results returned. These results are then interpreted to answer the question.
Additionally, if any of the cache functionality is enabled, this method will update the query cache index based on the SQL query run and the schemas used in execution.
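A sketch of that update step, reusing the cache client and `embed()` helper from the sketches above; the document shape and the use of `merge_or_upload_documents` are illustrative assumptions.

```python
# Sketch of the cache update after a successful run: upsert the question ->
# SQL query mapping plus the schemas used. Field names are assumptions.
import hashlib

def update_query_cache(question: str, sql_query: str, schemas: list[str]) -> None:
    cache_client.merge_or_upload_documents(documents=[{
        "Id": hashlib.sha256(question.encode()).hexdigest(),  # stable per-question key
        "Question": question,
        "QuestionEmbedding": embed(question),
        "SqlQuery": sql_query,
        "Schemas": schemas,
    }])
```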