Commit 84bb42e

Update README
1 parent 3bbf04f commit 84bb42e

6 files changed: +48 -9 lines changed


adi_function_app/GETTING_STARTED.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
# Getting Started with Document Intelligence Function App

To get started, perform the following steps:

1. Set up Azure OpenAI in your subscription with **gpt-4o-mini** & an embedding model, alongside a Python Function App, AI Search and a storage account.
2. Clone this repository and deploy the AI Search rag documents indexes from `deploy_ai_search`.
3. Run `uv sync` within the adi_function_app directory to install dependencies.
4. Configure the environment variables of the function app based on the provided sample.
5. Package your Azure Function and upload it to your Function App.
6. Upload a document for indexing or send a direct HTTP request to the Azure Function (see the sketch after this list).
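As a rough illustration of step 6, the snippet below sends a direct HTTP request to the deployed function. The route, payload shape and environment variable names are assumptions for illustration (the payload mimics the Azure AI Search custom skill request format mentioned in the README); check the function code for the actual contract.

```python
import os

import requests

# Assumed endpoint and key -- both environment variable names are illustrative.
FUNCTION_URL = os.environ["FUNCTION_APP_URL"]  # e.g. https://<your-app>.azurewebsites.net/api/<function-route>
FUNCTION_KEY = os.environ["FUNCTION_APP_KEY"]  # function-level key from the Azure portal

# Hypothetical custom-skill style payload pointing at a blob to index.
payload = {
    "values": [
        {
            "recordId": "0",
            "data": {"source": "https://<storage-account>.blob.core.windows.net/documents/sample.pdf"},
        }
    ]
}

response = requests.post(
    FUNCTION_URL,
    json=payload,
    headers={"x-functions-key": FUNCTION_KEY},
    timeout=300,
)
response.raise_for_status()
print(response.json())
```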

adi_function_app/README.md

Lines changed: 3 additions & 0 deletions
@@ -42,6 +42,9 @@ The properties returned from the ADI Custom Skill and Chunking are then used to
 - Keyphrase extraction
 - Vectorisation

+> [!NOTE]
+> See `GETTING_STARTED.md` for a step-by-step guide on how to use the accelerator.
+
 ## Sample Output

 Using the [Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone](https://arxiv.org/pdf/2404.14219) as an example, the following output can be obtained for page 7:

text_2_sql/GETTING_STARTED.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
# Getting Started with Agentic Text2SQL Component

To get started, perform the following steps:

1. Set up Azure OpenAI in your subscription with **gpt-4o-mini** & an embedding model, alongside a SQL Server sample database, AI Search and a storage account.
2. Clone this repository and deploy the AI Search text2sql indexes from `deploy_ai_search`.
3. Run `uv sync` within the text_2_sql directory to install dependencies.
4. Configure the .env file based on the provided sample.
5. Generate a data dictionary for your target server using the instructions in `data_dictionary`.
6. Upload these data dictionaries to the relevant containers in your storage account and wait for them to be automatically indexed (see the upload sketch after this list).
7. Navigate to the `autogen` directory to view the AutoGen implementation. Follow the steps in `Iteration 5 - Agentic Vector Based Text2SQL.ipynb` to get started.
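A minimal sketch of step 6, assuming the data dictionary files were generated locally as JSON and that the target container matches whatever the deployed indexer watches; both the local directory and the container name below are illustrative assumptions.

```python
import os
from pathlib import Path

from azure.storage.blob import BlobServiceClient

# Assumed names -- replace with your storage account connection string and the
# container your AI Search indexer is configured to watch.
CONNECTION_STRING = os.environ["STORAGE_ACCOUNT_CONNECTION_STRING"]
CONTAINER_NAME = "text-2-sql-schema-store"  # illustrative container name
DATA_DICTIONARY_DIR = Path("data_dictionary/generated_samples")  # illustrative local path

container_client = BlobServiceClient.from_connection_string(
    CONNECTION_STRING
).get_container_client(CONTAINER_NAME)

# Upload every generated data dictionary JSON file; the indexer picks them up
# automatically on its next scheduled run.
for json_file in DATA_DICTIONARY_DIR.glob("*.json"):
    with json_file.open("rb") as data:
        container_client.upload_blob(name=json_file.name, data=data, overwrite=True)
        print(f"Uploaded {json_file.name}")
```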

text_2_sql/README.md

Lines changed: 20 additions & 7 deletions
@@ -4,7 +4,7 @@ This portion of the repo contains code to implement a multi-shot approach to Tex
 The sample provided works with Azure SQL Server, although it has been easily adapted to other SQL sources such as Snowflake.

-> [!IMPORTANT]
+> [!NOTE]
 >
 > - Previous versions of this approach have now been moved to `previous_iterations/semantic_kernel`. These will not be updated.
@@ -14,6 +14,9 @@ The following diagram shows a workflow for how the Text2SQL plugin would be inco
 ![High level workflow for a plugin driven RAG application](../images/Plugin%20Based%20RAG%20Flow.png "High Level Workflow")

+> [!NOTE]
+> See `GETTING_STARTED.md` for a step-by-step guide on how to use the accelerator.
+
 ## Why Text2SQL instead of indexing the database contents?

 Generating SQL queries and executing them to provide context for the RAG application provided several benefits in the use case this was designed for.
@@ -57,6 +60,10 @@ As the query cache is shared between users (no data is stored in the cache), a n
 ![Vector Based with Query Cache Logical Flow.](./images/Agentic%20Text2SQL%20Query%20Cache.png "Agentic Vector Based with Query Cache Logical Flow")

+#### Parallel execution
+
+After the first agent has rewritten and decomposed the user input, we execute each of the individual questions in parallel to generate an answer as quickly as possible (see the sketch after this hunk).
+
 ### Caching Strategy

 The cache strategy implementation is a simple way to prove that the system works. You can adopt several different strategies for cache population. Below are some of the strategies that could be used:
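A minimal sketch of that parallel step, assuming the decomposed questions arrive as a list of strings and that each one is answered by an async helper; the helper below is an illustrative stand-in, not the repository's actual function.

```python
import asyncio


async def answer_sub_question(question: str) -> str:
    """Illustrative stand-in for the per-question Text2SQL flow
    (schema lookup, SQL generation, execution)."""
    await asyncio.sleep(0.1)  # placeholder for the real asynchronous work
    return f"answer for: {question}"


async def answer_user_question(decomposed_questions: list[str]) -> list[str]:
    # Run every decomposed question concurrently and collect the results
    # in their original order.
    return await asyncio.gather(
        *(answer_sub_question(q) for q in decomposed_questions)
    )


if __name__ == "__main__":
    sub_questions = [
        "What is the top performing product by quantity of units sold?",
        "Which product model does it belong to?",
    ]
    print(asyncio.run(answer_user_question(sub_questions)))
```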
@@ -68,6 +75,10 @@ The cache strategy implementation is a simple way to prove that the system works
 ## Sample Output

+> [!NOTE]
+>
+> - Full payloads for inputs / outputs can be found in `text_2_sql_core/src/text_2_sql_core/payloads/interaction_payloads.py`.
+
 ### What is the top performing product by quantity of units sold?

 #### SQL Query Generated
@@ -81,14 +92,12 @@ The cache strategy implementation is a simple way to prove that the system works
   "answer": "The top-performing product by quantity of units sold is the **Classic Vest, S** from the **Classic Vest** product model, with a total of 87 units sold [1][2].",
   "sources": [
     {
-      "title": "Sales Order Detail",
-      "chunk": "| ProductID | TotalUnitsSold |\n|-----------|----------------|\n| 864 | 87 |\n",
-      "reference": "SELECT TOP 1 ProductID, SUM(OrderQty) AS TotalUnitsSold FROM SalesLT.SalesOrderDetail GROUP BY ProductID ORDER BY TotalUnitsSold DESC;"
+      "sql_rows": "| ProductID | TotalUnitsSold |\n|-----------|----------------|\n| 864 | 87 |\n",
+      "sql_query": "SELECT TOP 1 ProductID, SUM(OrderQty) AS TotalUnitsSold FROM SalesLT.SalesOrderDetail GROUP BY ProductID ORDER BY TotalUnitsSold DESC;"
     },
     {
-      "title": "Product and Description",
-      "chunk": "| Name | ProductModel |\n|----------------|---------------|\n| Classic Vest, S| Classic Vest |\n",
-      "reference": "SELECT Name, ProductModel FROM SalesLT.vProductAndDescription WHERE ProductID = 864;"
+      "sql_rows": "| Name | ProductModel |\n|----------------|---------------|\n| Classic Vest, S| Classic Vest |\n",
+      "sql_query": "SELECT Name, ProductModel FROM SalesLT.vProductAndDescription WHERE ProductID = 864;"
     }
   ]
 }
@@ -110,6 +119,10 @@ The top-performing product by quantity of units sold is the **Classic Vest, S**
 |----------------|---------------|
 | Classic Vest, S| Classic Vest |

+## Disambiguation Requests
+
+If the LLM is unable to understand or answer the question asked, it can ask the user follow-up questions via a DisambiguationRequest. In cases where multiple columns may be the correct one, or where the user may be referring to several different filter values, the LLM can produce a series of options for the end user to select from (see the sketch after this hunk).
+
 ## Data Dictionary

 ### entities.json
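As a purely hypothetical illustration of that exchange, a disambiguation payload might carry a clarifying question plus candidate options; the field names below are invented for the sketch, and the real schema lives in `interaction_payloads.py`.

```python
from dataclasses import dataclass, field


@dataclass
class DisambiguationRequest:
    """Illustrative shape only -- see interaction_payloads.py for the actual schema."""

    question: str  # clarifying question shown to the end user
    options: list[str] = field(default_factory=list)  # candidate columns or filter values


# Example: "region" could map to two different columns in the SalesLT sample
# schema, so the agent asks the user to pick one.
request = DisambiguationRequest(
    question="Which field should 'region' map to?",
    options=["SalesLT.Address.StateProvince", "SalesLT.Address.CountryRegion"],
)
print(request)
```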

text_2_sql/autogen/README.md

Lines changed: 1 addition & 2 deletions
@@ -163,8 +163,7 @@ The system produces standardized JSON output through the Answer and Sources Agen
   "sources": [
     {
       "sql_query": "The SQL query used",
-      "sql_rows": ["Array of result rows"],
-      "markdown_table": "Formatted markdown table of results"
+      "sql_rows": ["Array of result rows"]
     }
   ]
 }

text_2_sql/text_2_sql_core/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
## Text2SQL Core

This portion of the repository contains the core prompts, code and config used to power the Text2SQL agentic flow. As much of the code as possible is kept separate from the AutoGen implementation so that it can easily be rewritten for another framework in the future.
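A rough sketch of that separation, under the assumption that the core exposes plain async functions and data classes while a thin adapter wires them into whichever agent framework is in use; every name below is illustrative rather than the repository's actual API.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


# Framework-agnostic core (illustrative names only).
@dataclass
class SqlAnswer:
    sql_query: str
    sql_rows: str


async def generate_sql_answer(question: str) -> SqlAnswer:
    """Stub for the core Text2SQL flow: prompt rendering, SQL generation,
    execution against the database."""
    return SqlAnswer(sql_query="SELECT 1;", sql_rows="| 1 |")


# Thin adapter layer: wrap the core coroutine in whatever callable shape the
# chosen agent framework expects (here, a string-in / string-out tool).
def as_framework_tool(
    core_fn: Callable[[str], Awaitable[SqlAnswer]]
) -> Callable[[str], Awaitable[str]]:
    async def tool(question: str) -> str:
        answer = await core_fn(question)
        return f"{answer.sql_query}\n{answer.sql_rows}"

    return tool


if __name__ == "__main__":
    tool = as_framework_tool(generate_sql_answer)
    print(asyncio.run(tool("How many products are there?")))
```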
