
Commit 03e8d4b

Add example env file for text2sql (#142)
1 parent 986143c commit 03e8d4b

6 files changed (+74, −52 lines)

text_2_sql/.env.example

Lines changed: 36 additions & 0 deletions
```diff
@@ -0,0 +1,36 @@
+# Environment variables for Text2SQL
+IdentityType=<identityType> # system_assigned or user_assigned or key
+
+# Open AI Connection Details
+OpenAI__CompletionDeployment=<openAICompletionDeploymentId. Used for data dictionary creator>
+OpenAI__MiniCompletionDeployment=<openAIMiniCompletionDeploymentId. Used for agentic text2sql>
+OpenAI__Endpoint=<openAIEndpoint>
+OpenAI__ApiKey=<openAIKey if using non identity based connection>
+OpenAI__ApiVersion=<openAIApiVersion>
+
+# Azure AI Search Connection Details
+AIService__AzureSearchOptions__Endpoint=<AI search endpoint>
+AIService__AzureSearchOptions__Key=<AI search key if using non identity based connection>
+AIService__AzureSearchOptions__Text2SqlSchemaStore__Index=<Schema store index name. Default is created as "text-2-sql-schema-store-index">
+AIService__AzureSearchOptions__Text2SqlSchemaStore__SemanticConfig=<Schema store semantic config. Default is created as "text-2-sql-schema-store-semantic-config">
+AIService__AzureSearchOptions__Text2SqlQueryCache__Index=<Query cache index name. Default is created as "text-2-sql-query-cache-index">
+AIService__AzureSearchOptions__Text2SqlQueryCache__SemanticConfig=<Query cache semantic config. Default is created as "text-2-sql-query-cache-semantic-config">
+AIService__AzureSearchOptions__Text2SqlColumnValueStore__Index=<Column value store index name. Default is created as "text-2-sql-column-value-store-index">
+
+# All SQL Engine specific connection details
+Text2Sql__DatabaseName=<databaseName>
+
+# TSQL or PostgreSQL Specific Connection Details
+Text2Sql__DatabaseConnectionString=<databaseConnectionString>
+
+# Snowflake Specific Connection Details
+Text2Sql__Snowflake__User=<snowflakeUser if using Snowflake Data Source>
+Text2Sql__Snowflake__Password=<snowflakePassword if using Snowflake Data Source>
+Text2Sql__Snowflake__Account=<snowflakeAccount if using Snowflake Data Source>
+Text2Sql__Snowflake__Warehouse=<snowflakeWarehouse if using Snowflake Data Source>
+
+# Databricks Specific Connection Details
+Text2Sql__Databricks__Catalog=<databricksCatalog if using Databricks Data Source with Unity Catalog>
+Text2Sql__Databricks__ServerHostname=<databricksServerHostname if using Databricks Data Source with Unity Catalog>
+Text2Sql__Databricks__HttpPath=<databricksHttpPath if using Databricks Data Source with Unity Catalog>
+Text2Sql__Databricks__AccessToken=<databricksAccessToken if using Databricks Data Source with Unity Catalog>
```

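The double underscores in these keys express nested configuration sections (for example, `OpenAI__Endpoint` is the `Endpoint` setting of the `OpenAI` section). Below is a minimal sketch of consuming the file from Python, assuming the `python-dotenv` package is installed; the helper name is illustrative and not part of this repo:

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv is available

load_dotenv()  # read key=value pairs from .env into os.environ


def openai_auth_settings() -> dict:
    """Hypothetical helper: pick key-based or identity-based auth
    according to the IdentityType value from the .env file."""
    identity_type = os.environ["IdentityType"].lower()
    if identity_type == "key":
        return {"api_key": os.environ["OpenAI__ApiKey"]}
    if identity_type in ("system_assigned", "user_assigned"):
        # Identity-based connections resolve a token at request time,
        # e.g. with azure.identity's DefaultAzureCredential.
        return {"api_key": None}
    raise ValueError(f"Unsupported IdentityType: {identity_type}")
```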
text_2_sql/GETTING_STARTED.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -5,7 +5,7 @@ To get started, perform the following steps:
 1. Setup Azure OpenAI in your subscription with **gpt-4o-mini** & an embedding model, alongside a SQL Server sample database, AI Search and a storage account.
 2. Clone this repository and deploy the AI Search text2sql indexes from `deploy_ai_search`.
 3. Run `uv sync` within the text_2_sql directory to install dependencies.
-4. Configure the .env file based on the provided sample
-5. Generate a data dictionary for your target server using the instructions in `data_dictionary`.
-6. Upload these data dictionaries to the relevant contains in your storage account. Wait for them to be automatically indexed.
+4. Create your `.env` file based on the provided sample `.env.example`. Place this file in the same directory as the `.env.example`.
+5. Generate a data dictionary for your target server using the instructions in the **Running** section of `data_dictionary/README.md`.
+6. Upload these data dictionaries to the relevant containers in your storage account. Wait for them to be automatically indexed with the included skillsets.
 7. Navigate to `autogen` directory to view the AutoGen implementation. Follow the steps in `Iteration 5 - Agentic Vector Based Text2SQL.ipynb` to get started.
```

text_2_sql/README.md

Lines changed: 17 additions & 17 deletions
````diff
@@ -54,7 +54,20 @@ As the query cache is shared between users (no data is stored in the cache), a n
 
 ![Vector Based with Query Cache Logical Flow.](./images/Agentic%20Text2SQL%20Query%20Cache.png "Agentic Vector Based with Query Cache Logical Flow")
 
-#### Parallel execution
+## Agents
+
+This agentic system contains the following agents:
+
+- **Query Cache Agent:** Responsible for checking the cache for previously asked questions.
+- **Query Decomposition Agent:** Responsible for decomposing complex questions into sub-questions that can be answered with SQL.
+- **Schema Selection Agent:** Responsible for extracting key terms from the question and checking the index store for matching schemas.
+- **SQL Query Generation Agent:** Responsible for using the previously extracted schemas and generated SQL queries to answer the question. This agent can request more schemas if needed. This agent will run the query.
+- **SQL Query Verification Agent:** Responsible for verifying that the SQL query and its results will answer the question.
+- **Answer Generation Agent:** Responsible for taking the database results and generating the final answer for the user.
+
+The combination of these agents allows the system to answer complex questions, whilst staying under the token limits when including the database schemas. The query cache ensures that previously asked questions can be answered quickly to avoid degrading the user experience.
+
+### Parallel execution
 
 After the first agent has rewritten and decomposed the user input, we execute each of the individual questions in parallel for the quickest time to generate an answer.
 
@@ -189,22 +202,9 @@ Below is a sample entry for a view / table that we which to expose to the LLM. T
 }
 ```
 
-See `./data_dictionary` for more details on how the data dictionary is structured and ways to **automatically generate it**.
-
-## Agentic Vector Based Approach (Iteration 5)
-
-This approach builds on the the Vector Based SQL Plugin approach that was previously developed, but adds a agentic approach to the solution.
-
-This agentic system contains the following agents:
-
-- **Query Cache Agent:** Responsible for checking the cache for previously asked questions.
-- **Query Decomposition Agent:** Responsible for decomposing complex questions, into sub questions that can be answered with SQL.
-- **Schema Selection Agent:** Responsible for extracting key terms from the question and checking the index store for the queries.
-- **SQL Query Generation Agent:** Responsible for using the previously extracted schemas and generated SQL queries to answer the question. This agent can request more schemas if needed. This agent will run the query.
-- **SQL Query Verification Agent:** Responsible for verifying that the SQL query and results question will answer the question.
-- **Answer Generation Agent:** Responsible for taking the database results and generating the final answer for the user.
-
-The combination of this agent allows the system to answer complex questions, whilst staying under the token limits when including the database schemas. The query cache ensures that previously asked questions, can be answered quickly to avoid degrading user experience.
+> [!NOTE]
+>
+> - See `./data_dictionary` for more details on how the data dictionary is structured and ways to **automatically generate it**.
 
 ## Tips for good Text2SQL performance.
 
````

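The relocated **Agents** section describes decomposing a question and answering the sub-questions in parallel. Here is a rough sketch of that fan-out pattern with `asyncio`, using illustrative function names rather than the repo's actual implementation:

```python
import asyncio


async def answer_sub_question(sub_question: str) -> str:
    """Stand-in for the per-question pipeline (schema selection,
    SQL generation, verification); here it just simulates latency."""
    await asyncio.sleep(0.1)  # pretend LLM / database round trip
    return f"answer to: {sub_question}"


async def answer_all(sub_questions: list[str]) -> list[str]:
    # Fan out so total latency is bounded by the slowest sub-question,
    # not the sum of all of them.
    return await asyncio.gather(*(answer_sub_question(q) for q in sub_questions))


if __name__ == "__main__":
    print(asyncio.run(answer_all(["total sales in 2023", "top product by revenue"])))
```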
text_2_sql/autogen/pyproject.toml

Lines changed: 3 additions & 0 deletions
```diff
@@ -41,3 +41,6 @@ databricks = [
 postgresql = [
     "text_2_sql_core[postgresql]",
 ]
+sqlite = [
+    "text_2_sql_core[sqlite]",
+]
```

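The new `sqlite` extra simply forwards to `text_2_sql_core[sqlite]`. As a hedged sketch, an optional backend like this is typically guarded at import time (the `aiosqlite` driver name is an assumption, not taken from this repo):

```python
try:
    import aiosqlite  # assumption: the sqlite extra installs a driver like this
except ImportError as exc:  # give users an actionable hint
    raise ImportError(
        "SQLite support is an optional extra; install it with "
        "`uv sync --extra sqlite`."
    ) from exc
```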
text_2_sql/data_dictionary/.env

Lines changed: 0 additions & 17 deletions
This file was deleted.

text_2_sql/text_2_sql_core/src/text_2_sql_core/connectors/open_ai.py

Lines changed: 15 additions & 15 deletions
```diff
@@ -74,20 +74,20 @@ async def run_completion_request(
         else:
             return message.content
 
-    async def run_embedding_request(self, batch: list[str]):
-        token_provider, api_key = self.get_authentication_properties()
+    # async def run_embedding_request(self, batch: list[str]):
+    #     token_provider, api_key = self.get_authentication_properties()
 
-        model_deployment = os.environ["OpenAI__EmbeddingModel"]
-        async with AsyncAzureOpenAI(
-            azure_deployment=model_deployment,
-            api_version=os.environ["OpenAI__ApiVersion"],
-            azure_endpoint=os.environ["OpenAI__Endpoint"],
-            azure_ad_token_provider=token_provider,
-            api_key=api_key,
-        ) as open_ai_client:
-            embeddings = await open_ai_client.embeddings.create(
-                model=os.environ["OpenAI__EmbeddingModel"],
-                input=batch,
-            )
+    #     model_deployment = os.environ["OpenAI__EmbeddingModel"]
+    #     async with AsyncAzureOpenAI(
+    #         azure_deployment=model_deployment,
+    #         api_version=os.environ["OpenAI__ApiVersion"],
+    #         azure_endpoint=os.environ["OpenAI__Endpoint"],
+    #         azure_ad_token_provider=token_provider,
+    #         api_key=api_key,
+    #     ) as open_ai_client:
+    #         embeddings = await open_ai_client.embeddings.create(
+    #             model=os.environ["OpenAI__EmbeddingModel"],
+    #             input=batch,
+    #         )
 
-        return embeddings
+    #     return embeddings
```

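For reference, here is the disabled embedding call reassembled as a standalone coroutine, should it need to be restored. It mirrors the commented-out code above; the `token_provider`/`api_key` parameters stand in for the class's `get_authentication_properties()` helper, and exactly one of them should be set:

```python
import os

from openai import AsyncAzureOpenAI


async def run_embedding_request(batch: list[str], token_provider=None, api_key=None):
    """The embedding request removed in this commit, as a free function."""
    model_deployment = os.environ["OpenAI__EmbeddingModel"]
    async with AsyncAzureOpenAI(
        azure_deployment=model_deployment,
        api_version=os.environ["OpenAI__ApiVersion"],
        azure_endpoint=os.environ["OpenAI__Endpoint"],
        azure_ad_token_provider=token_provider,
        api_key=api_key,
    ) as open_ai_client:
        return await open_ai_client.embeddings.create(
            model=model_deployment,  # deployment id doubles as the model name here
            input=batch,
        )
```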
0 commit comments
