
Commit 87145e6

Update the readme
1 parent dadf6c7 commit 87145e6

2 files changed: +72, -18 lines


text_2_sql/README.md

Lines changed: 40 additions & 15 deletions
@@ -39,7 +39,7 @@ To solve these issues, a Multi-Shot approach is developed. Below is the iteratio
Three different iterations are presented, with code provided for each:

- **Iteration 2:** A brief description of the available entities is injected into the prompt. This limits the number of tokens used and avoids filling the prompt with confusing schema information.
- **Iteration 3:** Indexing the entity definitions in a vector database, such as AI Search, and querying it to retrieve the most relevant entities for the key terms from the query.
-- **Iteration 4:** Keeping an index of commonly asked questions and which schema / SQL query they resolve to. Additionally, indexing the entity definitions in a vector database, such as AI Search _(same as Iteration 3)_. First querying this index to see if a similar SQL query can be obtained _(if high probability of exact SQL query match, the results can be pre-fetched)_. If not, falling back to the schema index, and querying it to retrieve the most relevant entities for the key terms from the query.
+- **Iteration 4:** Keeping an index of commonly asked questions and which schema / SQL query they resolve to - this index is generated by the LLM when it encounters a question that has not been previously asked. Additionally, indexing the entity definitions in a vector database, such as AI Search _(same as Iteration 3)_. First querying this index to see if a similar SQL query can be obtained _(if high probability of an exact SQL query match, the results can be pre-fetched)_. If not, falling back to the schema index, and querying it to retrieve the most relevant entities for the key terms from the query.

All approaches limit the number of tokens used and avoid filling the prompt with confusing schema information.
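
To make the iterations concrete, here is a minimal sketch of the Iteration 4 control flow. The signatures and the threshold value are illustrative simplifications of the plugin methods described later in this README (`fetch_queries_from_cache()`, `fetch_schemas_from_store()`), not the exact implementation:

```python
# Simplified sketch of the Iteration 4 flow. Signatures and the threshold
# value are illustrative; the real plugin methods are described later in
# this README (fetch_queries_from_cache, fetch_schemas_from_store).
from dataclasses import dataclass

CACHE_SCORE_THRESHOLD = 0.9  # assumed cut-off for a high-probability match


@dataclass
class CachedQuery:
    sql_query: str
    score: float  # relevance score from the vector search


def fetch_queries_from_cache(question: str) -> CachedQuery | None:
    ...  # stub: vector search over the question -> SQL query cache index


def fetch_schemas_from_store(question: str) -> list[dict]:
    ...  # stub: vector search over the entity definition index (Iteration 3)


def generate_sql_with_llm(question: str, schemas: list[dict]) -> str:
    ...  # stub: LLM call with the retrieved schemas injected into the prompt


def resolve_question(question: str) -> str:
    """Cache-first resolution: reuse a known query, else fall back to schemas."""
    cached = fetch_queries_from_cache(question)
    if cached is not None and cached.score >= CACHE_SCORE_THRESHOLD:
        # High-probability match: the cached SQL can be (pre-)run directly.
        return cached.sql_query
    # Cache miss: retrieve the most relevant entity schemas and generate SQL.
    return generate_sql_with_llm(question, fetch_schemas_from_store(question))
```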

@@ -49,6 +49,12 @@ For the query cache enabled approach, AI Search is used as a vector based cache,

### Full Logical Flow for Vector Based Approach

+The following diagram shows the logical flow within the Vector Based plugin. In an ideal scenario, the questions will follow the **Pre-Fetched Cache Results Path**, which leads to the quickest answer generation. In cases where the question is not known, the plugin will fall back to the other paths accordingly.
+
+As the query cache is shared between users (no data is stored in the cache), a new user can benefit from the pre-mapped question and schema resolution in the index.
+
+**Database results were deliberately not stored within the cache. Storing them would have removed one of the key benefits of the Text2SQL plugin: the ability to get near real-time information inside a RAG application. Instead, the query is stored so that the most recent results can be obtained quickly. Additionally, this retains the ability to apply Row or Column Level Security.**
+
![Vector Based with Query Cache Logical Flow.](./images/Text2SQL%20Query%20Cache.png "Vector Based with Query Cache Logical Flow")

### Comparison of Iterations
@@ -63,11 +69,12 @@ For the query cache enabled approach, AI Search is used as a vector based cache,
| | Consumes a significant number of tokens as number of entities increases. | As number of entities increases, token usage will grow but at a lesser rate than Iteration 1. | | AI Search adds additional cost to the solution. |
| | LLM struggled to differentiate which table to choose with the large amount of information passed. | | |

-### Timing Comparison for Test Question Set
+### Complete Execution Time Comparison for Approaches

-To compare the different in complete execution time, the following questions were tested 25 times each for 3 different modes.
+To compare the difference in complete execution time, the following questions were tested 25 times each for 4 different approaches.

-Modes:
+Approaches:
+- Prompt-based Multi-Shot (Iteration 2)
- Vector-Based Multi-Shot (Iteration 3)
- Vector-Based Multi-Shot with Query Cache (Iteration 4)
- Vector-Based Multi-shot with Pre Run Query Cache (Iteration 4)
@@ -77,6 +84,18 @@ Questions:
- Give me the total number of orders in 2008?
- Which country had the highest number of orders in June 2008?

+The graph below shows the response times from the experiment on a Known Question Set (i.e. the cache has already been populated with the query mapping by the LLM). gpt-4o was used as the completion LLM for this experiment. The response time is the complete execution time, including:
+
+- Prompt Preparation
+- Question Understanding
+- Cache Index Requests _(if applicable)_
+- SQL Query Execution
+- Interpretation and generation of the answer in the correct format
+
+![Response Time Distribution](./images/Known%20Question%20Response%20Time.png "Response Time Distribution By Approach")
+
+The vector-based cache approaches consistently outperform those that use just a Prompt-Based or Vector-Based approach by a significant margin. Given that it is highly likely the same Text2SQL questions will be repeated often, storing the question-SQL mapping leads to **significant performance increases** that are beneficial, despite the initial additional latency (1 to 2 seconds from testing) when a question is asked for the first time.

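For reference, a minimal sketch of how a comparison like this can be scripted. The repository's `./time_comparison_script.py` is the actual utility; `answer_question` below is a hypothetical stand-in for invoking one approach end to end:

```python
# Simplified timing sketch. answer_question is a hypothetical stand-in for
# running one approach end to end (prompt preparation through final answer).
import statistics
import time
from typing import Callable

QUESTIONS = [
    "Give me the total number of orders in 2008?",
    "Which country had the highest number of orders in June 2008?",
]
RUNS_PER_QUESTION = 25


def time_approach(answer_question: Callable[[str], str], approach: str) -> None:
    timings = []
    for question in QUESTIONS:
        for _ in range(RUNS_PER_QUESTION):
            start = time.perf_counter()
            answer_question(question)
            timings.append(time.perf_counter() - start)
    print(
        f"{approach}: median {statistics.median(timings):.2f}s, "
        f"mean {statistics.mean(timings):.2f}s over {len(timings)} runs"
    )
```
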
## Sample Output

### What is the top performing product by quantity of units sold?
@@ -128,7 +147,7 @@ The top-performing product by quantity of units sold is the **Classic Vest, S**
- `./rag_with_vector_based_text_2_sql_query_cache.ipynb` provides an example of how to utilise the Vector Based Text2SQL plugin, alongside the query cache, to query the database.
- `./rag_with_ai_search_and_text_2_sql.ipynb` provides an example of how to use the Text2SQL and an AISearch plugin in parallel to automatically retrieve data from the most relevant source to answer the query.
  - This setup is useful for a production application as the SQL Database is unlikely to be able to answer all the questions a user may ask.
-- `./time_comparison_scripy.py` provides a utility script for performing time based comparisons between the different approaches.
+- `./time_comparison_script.py` provides a utility script for performing time based comparisons between the different approaches.

## Data Dictionary

@@ -176,17 +195,11 @@ The data dictionary is stored in `./data_dictionary/entities.json`. Below is a s

A full data dictionary must be built for all the views / tables you wish to expose to the LLM. The metadata provided directly influences the accuracy of the Text2SQL component.

-## Common Plugin Components
-
-#### run_sql_query()
-
-This method is called by the Semantic Kernel framework automatically, when instructed to do so by the LLM, to run a SQL query against the given database. It returns a JSON string containing a row wise dump of the results returned. These results are then interpreted to answer the question.
-
## Prompt Based SQL Plugin (Iteration 2)

-This approach works well for a small number of entities (test on up to 20 entities with hundreds of columns). It performed well on the testing, with correct metadata, we achieved 100% accuracy on the test set.
+This approach works well for a small number of entities (tested on up to 20 entities with hundreds of columns). It performed well in testing; with correct metadata, we achieved 100% accuracy on the test set.

-Whilst a simple and high performing approach, the downside of this approach is the increase in number of tokens as the number of entities increases.
+Whilst a simple and high-performing approach, its downside is the increase in the number of tokens used as the number of entities increases. Additionally, we found that the LLM started to get "confused" about which columns belong to which entities as the number of entities increased.
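
As an illustration of this brief-description injection, a sketch of how such a system prompt can be assembled. The template wording and `build_system_prompt` are assumptions; the `Entity` and `Description` fields mirror the data dictionary:

```python
# Hypothetical sketch of Iteration 2's prompt construction: only brief entity
# descriptions are injected, keeping token usage low.
def build_system_prompt(entities: list[dict], target_engine: str) -> str:
    entity_lines = "\n".join(
        f"- {entity['Entity']}: {entity['Description']}" for entity in entities
    )
    return (
        f"You are an expert in writing {target_engine} SQL queries.\n"
        "Fetch the full schema definition for an entity before querying it.\n"
        f"Available entities:\n{entity_lines}"
    )
```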

### prompt_based_sql_plugin.py

@@ -204,6 +217,10 @@ The **target_engine** is passed to the prompt, along with **engine_specific_rule

This method is called by the Semantic Kernel framework automatically, when instructed to do so by the LLM, to fetch the full schema definitions for a given entity. This returns a JSON string of the chosen entity which allows the LLM to understand the column definitions and their associated metadata. This can be called in parallel for multiple entities.
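
A minimal sketch of what such a schema-fetch function can look like when exposed through Semantic Kernel's auto function calling. The method name, file layout, and dictionary keys are assumptions for illustration:

```python
# Illustrative only: the method name, file layout and dictionary keys are
# assumptions based on the data dictionary description above.
import json

from semantic_kernel.functions import kernel_function


class SchemaPluginSketch:
    def __init__(self, data_dictionary_path: str = "./data_dictionary/entities.json"):
        with open(data_dictionary_path, encoding="utf-8") as handle:
            # Index entities by name for quick lookup.
            self.entities = {entity["Entity"]: entity for entity in json.load(handle)}

    @kernel_function(description="Fetch the full schema definition for an entity.")
    def get_entity_schema(self, entity: str) -> str:
        # A JSON string lets the LLM read column definitions and metadata.
        return json.dumps(self.entities.get(entity, {}))
```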

+#### run_sql_query()
+
+This method is called by the Semantic Kernel framework automatically, when instructed to do so by the LLM, to run a SQL query against the given database. It returns a JSON string containing a row wise dump of the results returned. These results are then interpreted to answer the question.
+
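A minimal sketch of what such a method can look like, shown here with `pyodbc` and a hypothetical connection-string variable (the real plugin's connection handling may differ):

```python
# Illustrative only: a row-wise JSON dump of query results, as described
# above. pyodbc and the connection-string environment variable are
# assumptions, not necessarily what the plugin uses.
import json
import os

import pyodbc
from semantic_kernel.functions import kernel_function


class SQLPluginSketch:
    @kernel_function(description="Run a SQL query against the database.")
    def run_sql_query(self, sql_query: str) -> str:
        connection = pyodbc.connect(os.environ["Text2Sql__DatabaseConnectionString"])
        try:
            cursor = connection.cursor()
            cursor.execute(sql_query)
            columns = [column[0] for column in cursor.description]
            rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
        finally:
            connection.close()
        # default=str handles dates and decimals in the row-wise dump.
        return json.dumps(rows, default=str)
```
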
## Vector Based SQL Plugin (Iterations 3 & 4)

This approach allows the system to scale without significantly increasing the number of tokens used within the system prompt. Indexing and running an AI Search instance consumes additional cost, compared to the prompt based approach.
@@ -234,9 +251,17 @@ This method is called by the Semantic Kernel framework automatically, when instr

The search text passed is vectorised against the entity level **Description** columns. A hybrid Semantic Reranking search is applied against the **EntityName**, **Entity**, **Columns/Name** fields.
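
As an illustration, a hybrid query with semantic reranking over those fields might look like the following with the `azure-search-documents` SDK. The endpoint, key, index, and configuration names are placeholders, and an integrated vectorizer on the index is assumed:

```python
# Illustrative hybrid (keyword + vector) search with semantic reranking over
# the fields named above. Endpoint, key, index and configuration names are
# placeholders; the index is assumed to have an integrated vectorizer.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

schema_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="text2sql-schema-store",  # placeholder index name
    credential=AzureKeyCredential("<api-key>"),
)

results = schema_client.search(
    search_text="total number of orders",  # keyword half of the hybrid query
    vector_queries=[
        # Vector half: matched against the entity-level Description embeddings.
        VectorizableTextQuery(
            text="total number of orders", fields="DescriptionEmbedding"
        )
    ],
    search_fields=["EntityName", "Entity", "Columns/Name"],
    query_type="semantic",
    semantic_configuration_name="schema-semantic-config",  # placeholder
    top=3,
)
```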

-#### run_ai_search_query()
+#### fetch_queries_from_cache()
+
+The vector-based approach with the query cache uses the `fetch_queries_from_cache()` method to fetch the most relevant previous query and injects it into the prompt before the initial LLM call. The use of Auto-Function Calling here is avoided to reduce the response time, as the cache index will always be used first.
+
+If the score of the top result is higher than the defined threshold, the query will be executed against the target data source and the results included in the prompt. This allows us to prompt the LLM to evaluate whether it can use these results to answer the question, **without further SQL query generation**, to speed up the process.
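
A sketch of this pre-run behaviour follows. The threshold value, result keys, and prompt wording are all assumptions; `query_execution` mirrors the plugin method of the same name:

```python
# Hypothetical sketch of the pre-run query cache: if the top cached query
# scores above a threshold, execute it and inject query + results into the
# prompt so the LLM can answer without generating new SQL.
PRE_RUN_SCORE_THRESHOLD = 2.5  # assumed semantic reranker score cut-off


async def search_query_cache(question: str) -> dict | None:
    ...  # stub: vector search over the question -> SQL query cache index


async def query_execution(sql_query: str) -> list[dict]:
    ...  # stub: run the SQL query against the target data source


async def build_cache_prompt_section(question: str) -> str:
    top_match = await search_query_cache(question)
    if top_match is None:
        return ""
    section = f"A similar question was previously answered with:\n{top_match['SqlQuery']}"
    if top_match["@search.reranker_score"] >= PRE_RUN_SCORE_THRESHOLD:
        results = await query_execution(top_match["SqlQuery"])
        section += (
            "\nPre-fetched results (answer from these if sufficient, "
            f"without generating further SQL):\n{results}"
        )
    return section
```
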
+
+#### run_sql_query()
+
+This method is called by the Semantic Kernel framework automatically, when instructed to do so by the LLM, to run a SQL query against the given database. It returns a JSON string containing a row wise dump of the results returned. These results are then interpreted to answer the question.

-The vector based with query cache uses the `run_ai_search_query()` method to fetch the most relevant previous query and injects it into the prompt before the initial LLM call. The use of Auto-Function Calling here is avoided to reduce the response time as the cache index will always be used first.
+Additionally, if any of the cache functionality is enabled, this method will update the query cache index based on the SQL query run, and the schemas used in execution.

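As an illustration of that write-back, a sketch using the `azure-search-documents` SDK. The index name, key derivation, and document fields are assumptions:

```python
# Hypothetical write-back of the question -> SQL/schema mapping after a
# successful run. Index name, key derivation and field names are assumptions.
import hashlib

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

cache_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="text2sql-query-cache",  # placeholder index name
    credential=AzureKeyCredential("<api-key>"),
)


def update_query_cache(question: str, sql_query: str, schemas: list[dict]) -> None:
    cache_client.merge_or_upload_documents(
        documents=[
            {
                "Id": hashlib.sha256(question.encode()).hexdigest(),
                "Question": question,
                "SqlQuery": sql_query,
                "Schemas": schemas,
                # Database results are deliberately not stored (see above).
            }
        ]
    )
```
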
## Tips for good Text2SQL performance.

text_2_sql/plugins/vector_based_sql_plugin/vector_based_sql_plugin.py

Lines changed: 32 additions & 3 deletions
@@ -34,6 +34,7 @@ def __init__(self, target_engine: str = "Microsoft TSQL Server"):
        self.set_mode()

    def set_mode(self):
+        """Set the mode of the plugin based on the environment variables."""
        self.use_query_cache = (
            os.environ.get("Text2Sql__UseQueryCache", "False").lower() == "true"
        )
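
For reference, these two toggles are plain environment variables and can be set before the plugin is constructed (an illustrative snippet; the variable names come from the code above):

```python
import os

# The values are compared as lowercase strings in set_mode(), so "True"
# (or "true") enables the behaviour.
os.environ["Text2Sql__UseQueryCache"] = "True"      # enable the query cache
os.environ["Text2Sql__PreRunQueryCache"] = "True"   # pre-run high-scoring cached queries
```
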
@@ -42,7 +43,16 @@ def set_mode(self):
            os.environ.get("Text2Sql__PreRunQueryCache", "False").lower() == "true"
        )

-    def filter_schemas_against_statement(self, sql_statement):
+    def filter_schemas_against_statement(self, sql_statement: str) -> list[dict]:
+        """Filter the schemas against the SQL statement to find the matching entities.
+
+        Args:
+        ----
+            sql_statement (str): The SQL statement to filter the schemas against.
+
+        Returns:
+        -------
+            list[dict]: The list of matching entities."""
        matching_entities = []

        logging.info("SQL Statement: %s", sql_statement)
@@ -88,7 +98,16 @@ async def query_execution(self, sql_query: str) -> list[dict]:
        logging.debug("Results: %s", results)
        return results

-    async def fetch_schemas_from_store(self, search: str):
+    async def fetch_schemas_from_store(self, search: str) -> list[dict]:
+        """Fetch the schemas from the store based on the search term.
+
+        Args:
+        ----
+            search (str): The search term to use to fetch the schemas.
+
+        Returns:
+        -------
+            list[dict]: The list of schemas fetched from the store."""
        schemas = await run_ai_search_query(
            search,
            ["DescriptionEmbedding"],
@@ -107,7 +126,17 @@ async def fetch_schemas_from_store(self, search: str):

        return schemas

-    async def fetch_queries_from_cache(self, question: str):
+    async def fetch_queries_from_cache(self, question: str) -> str:
+        """Fetch the queries from the cache based on the question.
+
+        Args:
+        ----
+            question (str): The question to use to fetch the queries.
+
+        Returns:
+        -------
+            str: The formatted string of the queries fetched from the cache. This is injected into the prompt.
+        """
        if not self.use_query_cache:
            return None
