Commit 500c130

Update prompts and instruct that questions may be relevant in context (#143)
1 parent 03e8d4b commit 500c130

16 files changed (+299 -278 lines)

.pre-commit-config.yaml

Lines changed: 6 additions & 6 deletions
@@ -45,9 +45,9 @@ repos:
         args: [--fix, --ignore, UP007]
         exclude: samples

-  - repo: https://github.com/astral-sh/uv-pre-commit
-    # uv version.
-    rev: 0.5.5
-    hooks:
-      # Update the uv lockfile
-      - id: uv-lock
+  # - repo: https://github.com/astral-sh/uv-pre-commit
+  #   # uv version.
+  #   rev: 0.5.5
+  #   hooks:
+  #     # Update the uv lockfile
+  #     - id: uv-lock

text_2_sql/GETTING_STARTED.md

Lines changed: 3 additions & 1 deletion
@@ -5,7 +5,9 @@ To get started, perform the following steps:
 1. Setup Azure OpenAI in your subscription with **gpt-4o-mini** & an embedding model, alongside a SQL Server sample database, AI Search and a storage account.
 2. Clone this repository and deploy the AI Search text2sql indexes from `deploy_ai_search`.
 3. Run `uv sync` within the text_2_sql directory to install dependencies.
+   - Install the optional dependencies if you need a database connector other than TSQL: `uv sync --extra <DATABASE ENGINE>`
+   - See the supported connectors in `text_2_sql_core/src/text_2_sql_core/connectors`.
 4. Create your `.env` file based on the provided sample `.env.example`. Place this file in the same place as the `.env.example`.
 5. Generate a data dictionary for your target server using the instructions in the **Running** section of the `data_dictionary/README.md`.
-6. Upload these data dictionaries to the relevant containers in your storage account. Wait for them to be automatically indexed with the included skillsets.
+6. Upload the generated data dictionary files to the relevant containers in your storage account. Wait for them to be automatically indexed with the included skillsets.
 7. Navigate to the `autogen` directory to view the AutoGen implementation. Follow the steps in `Iteration 5 - Agentic Vector Based Text2SQL.ipynb` to get started.

text_2_sql/autogen/Iteration 5 - Agentic Vector Based Text2SQL.ipynb

Lines changed: 14 additions & 5 deletions
@@ -35,11 +35,13 @@
     "\n",
     "### Dependencies\n",
     "\n",
-    "To install dependencies for this demo:\n",
+    "To install dependencies for this demo, navigate to the autogen directory and run:\n",
     "\n",
-    "`uv sync --package autogen_text_2_sql`\n",
+    "`uv sync`\n",
     "\n",
-    "`uv add --editable text_2_sql_core`"
+    "If you need a connector other than TSQL:\n",
+    "\n",
+    "`uv sync --extra <DATABASE ENGINE>`"
    ]
   },
   {
@@ -87,6 +89,13 @@
     "agentic_text_2_sql = AutoGenText2Sql(use_case=\"Analysing sales data\")"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -100,7 +109,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "async for message in agentic_text_2_sql.process_user_message(UserMessagePayload(user_message=\"What is the total number of sales?\")):\n",
+    "async for message in agentic_text_2_sql.process_user_message(UserMessagePayload(user_message=\"what are the total sales\")):\n",
     "    logging.info(\"Received %s Message from Text2SQL System\", message)"
    ]
   },
@@ -128,7 +137,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-  "version": "3.12.7"
+  "version": "3.12.8"
  }
 },
 "nbformat": 4,
text_2_sql/autogen/pyproject.toml

Lines changed: 4 additions & 4 deletions
@@ -6,12 +6,12 @@ readme = "README.md"
 requires-python = ">=3.12"
 dependencies = [
     "aiostream>=0.6.4",
-    "autogen-agentchat==0.4.0.dev11",
-    "autogen-core==0.4.0.dev11",
-    "autogen-ext[azure,openai]==0.4.0.dev11",
+    "autogen-agentchat==0.4.2",
+    "autogen-core==0.4.2",
+    "autogen-ext[azure,openai]==0.4.2",
     "grpcio>=1.68.1",
     "pyyaml>=6.0.2",
-    "text_2_sql_core[snowflake,databricks]",
+    "text_2_sql_core",
     "sqlparse>=0.4.4",
     "nltk>=3.8.1",
 ]

text_2_sql/autogen/src/autogen_text_2_sql/creators/llm_agent_creator.py

Lines changed: 5 additions & 5 deletions
@@ -1,6 +1,6 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT License.
-from autogen_core.components.tools import FunctionToolAlias
+from autogen_core.tools import FunctionTool
 from autogen_agentchat.agents import AssistantAgent
 from text_2_sql_core.connectors.factory import ConnectorFactory
 from text_2_sql_core.prompts.load import load
@@ -33,20 +33,20 @@ def get_tool(cls, sql_helper, tool_name: str):
             tool_name (str): The name of the tool to retrieve.

         Returns:
-            FunctionToolAlias: The tool."""
+            FunctionTool: The tool."""

         if tool_name == "sql_query_execution_tool":
-            return FunctionToolAlias(
+            return FunctionTool(
                 sql_helper.query_execution_with_limit,
                 description="Runs an SQL query against the SQL Database to extract information",
             )
         elif tool_name == "sql_get_entity_schemas_tool":
-            return FunctionToolAlias(
+            return FunctionTool(
                 sql_helper.get_entity_schemas,
                 description="Gets the schema of a view or table in the SQL Database by selecting the most relevant entity based on the search term. Extract key terms from the user input and use these as the search term. Several entities may be returned. Only use when the provided schemas in the message history are not sufficient to answer the question.",
             )
         elif tool_name == "sql_get_column_values_tool":
-            return FunctionToolAlias(
+            return FunctionTool(
                 sql_helper.get_column_values,
                 description="Gets the values of a column in the SQL Database by selecting the most relevant entity based on the search term. Several entities may be returned. Use this to get the correct value to apply against a filter for a user's question.",
             )
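
For context on the rename above: in autogen 0.4.x the tool wrapper lives at `autogen_core.tools.FunctionTool` and wraps a typed (usually async) callable plus a description. A minimal sketch of the pattern follows; the helper below is a hypothetical stand-in for `sql_helper.query_execution_with_limit`, and the `run_json` invocation is the general tool API rather than anything this commit exercises directly.

```python
# A sketch of the FunctionTool usage pattern; query_execution_with_limit here
# is a hypothetical stand-in for the real SQL helper method.
import asyncio

from autogen_core import CancellationToken
from autogen_core.tools import FunctionTool


async def query_execution_with_limit(query: str) -> str:
    """Hypothetical stand-in: pretend to run an SQL query with a row limit."""
    return f"(pretend result rows for: {query})"


sql_query_execution_tool = FunctionTool(
    query_execution_with_limit,
    description="Runs an SQL query against the SQL Database to extract information",
)


async def main() -> None:
    # Tools are invoked with a dict of typed arguments and a cancellation token.
    result = await sql_query_execution_tool.run_json(
        {"query": "SELECT COUNT(*) FROM sales"}, CancellationToken()
    )
    print(sql_query_execution_tool.return_value_as_string(result))


asyncio.run(main())
```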

text_2_sql/autogen/src/autogen_text_2_sql/custom_agents/parallel_query_solving_agent.py

Lines changed: 4 additions & 4 deletions
@@ -5,10 +5,10 @@
 from autogen_agentchat.agents import BaseChatAgent
 from autogen_agentchat.base import Response
 from autogen_agentchat.messages import (
-    AgentMessage,
+    AgentEvent,
     ChatMessage,
     TextMessage,
-    ToolCallResultMessage,
+    ToolCallExecutionEvent,
 )
 from autogen_core import CancellationToken
 import json
@@ -86,7 +86,7 @@ def parse_inner_message(self, message):

     async def on_messages_stream(
         self, messages: Sequence[ChatMessage], cancellation_token: CancellationToken
-    ) -> AsyncGenerator[AgentMessage | Response, None]:
+    ) -> AsyncGenerator[AgentEvent | Response, None]:
         last_response = messages[-1].content
         parameter_input = messages[0].content
         try:
@@ -118,7 +118,7 @@ async def consume_inner_messages_from_agentic_flow(
             logging.info(f"Checking Inner Message: {inner_message}")

             try:
-                if isinstance(inner_message, ToolCallResultMessage):
+                if isinstance(inner_message, ToolCallExecutionEvent):
                     for call_result in inner_message.content:
                         # Check for SQL query results
                         parsed_message = self.parse_inner_message(
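
The isinstance check above now targets `ToolCallExecutionEvent`, whose `content` is a list of function execution results with the raw tool output in `.content`. A minimal sketch of that filtering pattern follows; the event payload is fabricated for illustration, and the exact `FunctionExecutionResult` constructor fields are version-dependent, so treat them as assumptions pinned to 0.4.2.

```python
# A sketch of filtering tool results under the renamed 0.4.x types; the
# event constructed below is illustrative and its fields are assumptions.
from autogen_agentchat.messages import ToolCallExecutionEvent
from autogen_core.models import FunctionExecutionResult


def extract_tool_outputs(inner_message: object) -> list[str]:
    """Collect raw tool outputs, mirroring the agent's inner-message loop."""
    outputs: list[str] = []
    if isinstance(inner_message, ToolCallExecutionEvent):
        for call_result in inner_message.content:
            outputs.append(call_result.content)
    return outputs


event = ToolCallExecutionEvent(
    source="sql_query_executor",  # hypothetical agent name
    content=[
        FunctionExecutionResult(
            call_id="call_1",
            content='{"rows": [{"total_sales": 42}]}',  # fabricated payload
        )
    ],
)
print(extract_tool_outputs(event))  # ['{"rows": [{"total_sales": 42}]}']
```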

text_2_sql/autogen/src/autogen_text_2_sql/custom_agents/sql_query_cache_agent.py

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@

 from autogen_agentchat.agents import BaseChatAgent
 from autogen_agentchat.base import Response
-from autogen_agentchat.messages import AgentMessage, ChatMessage, TextMessage
+from autogen_agentchat.messages import AgentEvent, ChatMessage, TextMessage
 from autogen_core import CancellationToken
 from text_2_sql_core.custom_agents.sql_query_cache_agent import (
     SqlQueryCacheAgentCustomAgent,
@@ -39,7 +39,7 @@ async def on_messages(

     async def on_messages_stream(
         self, messages: Sequence[ChatMessage], cancellation_token: CancellationToken
-    ) -> AsyncGenerator[AgentMessage | Response, None]:
+    ) -> AsyncGenerator[AgentEvent | Response, None]:
         # Get the decomposed messages from the user_message_rewrite_agent
         try:
             request_details = json.loads(messages[0].content)

text_2_sql/autogen/src/autogen_text_2_sql/custom_agents/sql_schema_selection_agent.py

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@

 from autogen_agentchat.agents import BaseChatAgent
 from autogen_agentchat.base import Response
-from autogen_agentchat.messages import AgentMessage, ChatMessage, TextMessage
+from autogen_agentchat.messages import AgentEvent, ChatMessage, TextMessage
 from autogen_core import CancellationToken
 import json
 import logging
@@ -39,7 +39,7 @@ async def on_messages(

     async def on_messages_stream(
         self, messages: Sequence[ChatMessage], cancellation_token: CancellationToken
-    ) -> AsyncGenerator[AgentMessage | Response, None]:
+    ) -> AsyncGenerator[AgentEvent | Response, None]:
         # Try to parse as JSON first
         try:
             request_details = json.loads(messages[0].content)
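
Both custom agents above now type their streams as `AgentEvent | Response`. A minimal sketch of a `BaseChatAgent` subclass using the renamed types follows; the abstract-method set is taken from autogen-agentchat 0.4.x, and the `EchoAgent` itself is illustrative, not part of the repository.

```python
# A sketch of the updated stream signature; EchoAgent is illustrative only.
from typing import AsyncGenerator, Sequence

from autogen_agentchat.agents import BaseChatAgent
from autogen_agentchat.base import Response
from autogen_agentchat.messages import AgentEvent, ChatMessage, TextMessage
from autogen_core import CancellationToken


class EchoAgent(BaseChatAgent):
    def __init__(self) -> None:
        super().__init__("echo_agent", "Echoes the last message back.")

    @property
    def produced_message_types(self) -> Sequence[type[ChatMessage]]:
        return [TextMessage]

    async def on_messages(
        self, messages: Sequence[ChatMessage], cancellation_token: CancellationToken
    ) -> Response:
        # Drain the stream and return the final Response, as the repo's agents do.
        async for item in self.on_messages_stream(messages, cancellation_token):
            if isinstance(item, Response):
                return item
        raise RuntimeError("Stream ended without a Response")

    async def on_messages_stream(
        self, messages: Sequence[ChatMessage], cancellation_token: CancellationToken
    ) -> AsyncGenerator[AgentEvent | Response, None]:
        yield Response(
            chat_message=TextMessage(content=messages[-1].content, source=self.name)
        )

    async def on_reset(self, cancellation_token: CancellationToken) -> None:
        pass
```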

text_2_sql/data_dictionary/README.md

Lines changed: 14 additions & 21 deletions
@@ -207,10 +207,6 @@ This avoids having to index the fact tables, saving storage, and allows us to st

 ## Automatic Generation

-> [!IMPORTANT]
->
-> - The data dictionary generation scripts have been moved to `text_2_sql_core`. Documentation will be updated shortly.
-
 Manually creating the `entities.json` is a time-consuming exercise. To speed up generation, a mixture of SQL queries and an LLM can be used to generate an initial version. Existing comments and descriptions in the database can be combined with sample values to generate the necessary descriptions. Manual input can then be used to tweak it for the use case and any improvements.

 `./text_2_sql_core/data_dictionary/data_dictionary_creator.py` contains a utility class that handles the automatic generation and selection of schemas from the source SQL database. It must be subclassed to the appropriate engine to handle engine-specific queries and connection details.
@@ -222,28 +218,25 @@ The following Databases have pre-built scripts for them:
 - **Databricks:** `./text_2_sql_core/data_dictionary/databricks_data_dictionary_creator.py`
 - **Snowflake:** `./text_2_sql_core/data_dictionary/snowflake_data_dictionary_creator.py`
 - **TSQL:** `./text_2_sql_core/data_dictionary/tsql_data_dictionary_creator.py`
+- **PostgreSQL:** `./text_2_sql_core/data_dictionary/postgresql_data_dictionary_creator.py`

 If there is no pre-built script for your database engine, take one of the above as a starting point and adjust it.

 ## Running

-Fill out the `.env` template with connection details to your chosen database.
-
-Package and install the `text_2_sql_core` library. See [build](https://docs.astral.sh/uv/concepts/projects/build/) if you want to build as a wheel and install on an agent. Or you can run from within a `uv` environment.
-
-`data_dictionary <DATABASE ENGINE>`
-
-You can pass the following command line arguments:
-
-- `--output_directory` or `-o`: Optional directory that the script will write the output files to.
-- `--single_file` or `-s`: Optional flag that writes all schemas to a single file.
-- `--generate_definitions` or `-gen`: Optional flag that uses OpenAI to generate descriptions.
-
-If you need control over the following, run the file directly:
-
-- `entities`: A list of entities to extract. Defaults to None.
-- `excluded_entities`: A list of entities to exclude.
-- `excluded_schemas`: A list of schemas to exclude.
+1. Create your `.env` file based on the provided sample `.env.example`. Place this file in the same place as the `.env.example`.
+2. Package and install the `text_2_sql_core` library. See [build](https://docs.astral.sh/uv/concepts/projects/build/) if you want to build as a wheel and install on an agent. Or you can run from within a `uv` environment and skip packaging.
+   - Install the optional dependencies if you need a database connector other than TSQL: `uv sync --extra <DATABASE ENGINE>`
+3. Run `data_dictionary <DATABASE ENGINE>`
+   - You can pass the following command line arguments:
+     - `--output_directory` or `-o`: Optional directory that the script will write the output files to.
+     - `--single_file` or `-s`: Optional flag that writes all schemas to a single file.
+     - `--generate_definitions` or `-gen`: Optional flag that uses OpenAI to generate descriptions.
+   - If you need control over the following, run the file directly (see the sketch after this diff):
+     - `entities`: A list of entities to extract. Defaults to None.
+     - `excluded_entities`: A list of entities to exclude.
+     - `excluded_schemas`: A list of schemas to exclude.
+4. Upload the generated data dictionary files to the relevant containers in your storage account. Wait for them to be automatically indexed with the included skillsets.

 > [!IMPORTANT]
 >
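
For the "run the file directly" option in step 3, a hedged sketch of what that might look like for the TSQL script listed above; the class name, constructor arguments, and entry method are all assumptions inferred from the file names and option list, so check the script you are actually using.

```python
# A sketch only: TSQLDataDictionaryCreator, its constructor arguments, and
# create_data_dictionary() are assumed, not confirmed by this commit.
import asyncio

from text_2_sql_core.data_dictionary.tsql_data_dictionary_creator import (
    TSQLDataDictionaryCreator,
)


async def main() -> None:
    creator = TSQLDataDictionaryCreator(
        entities=None,                   # None extracts all entities
        excluded_entities=["ErrorLog"],  # hypothetical table to skip
        excluded_schemas=["staging"],    # hypothetical schema to skip
        output_directory="./output",     # mirrors --output_directory / -o
        single_file=False,               # mirrors --single_file / -s
        generate_definitions=True,       # mirrors --generate_definitions / -gen
    )
    await creator.create_data_dictionary()


asyncio.run(main())
```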

text_2_sql/text_2_sql_core/pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ authors = [
 ]
 requires-python = ">=3.12"
 dependencies = [
+    "aiohttp>=3.11.11",
     "aioodbc>=0.5.0",
     "azure-identity>=1.19.0",
     "azure-search>=1.0.0b2",
