Commit 3b87dac

Merge branch 'main' into feature/snowflake-data-dictionary-creator
2 parents aaa7592 + 2440e73 commit 3b87dac

29 files changed, +1969 -1140 lines changed

deploy_ai_search/.env

Lines changed: 2 additions & 2 deletions
@@ -11,9 +11,9 @@ AIService__AzureSearchOptions__Key=<searchServiceKey if not using identity>
 AIService__AzureSearchOptions__UsePrivateEndpoint=<true/false>
 AIService__AzureSearchOptions__Identity__FQName=<fully qualified name of the identity if using user assigned identity>
 StorageAccount__FQEndpoint=<Fully qualified endpoint in form ResourceId=resourceId if using identity based connections>
-StorageAccount__ConnectionString=<connectionString if using non managed identity>
+StorageAccount__ConnectionString=<connectionString if using non managed identity. In format: DefaultEndpointsProtocol=https;AccountName=<STG NAME>;AccountKey=<ACCOUNT KEY>;EndpointSuffix=core.windows.net>
 StorageAccount__RagDocuments__Container=<containerName>
-StorageAccount__Text2Sql__Container=<containerName>
+StorageAccount__Text2SqlSchemaStore__Container=<containerName>
 OpenAI__ApiKey=<openAIKey if using non managed identity>
 OpenAI__Endpoint=<openAIEndpoint>
 OpenAI__EmbeddingModel=<openAIEmbeddingModelName>
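
The connection string format documented in the new `.env` comment can be sanity-checked before deployment. The following is a minimal sketch, not part of this commit: the helper names `parse_storage_connection_string` and `validate_storage_connection_string` are hypothetical, and the required keys are taken only from the format shown above.

```python
# Hypothetical helpers (not from the repository) for checking the
# "Key=Value;Key=Value" connection string format shown in the .env comment.
REQUIRED_KEYS = (
    "DefaultEndpointsProtocol",
    "AccountName",
    "AccountKey",
    "EndpointSuffix",
)


def parse_storage_connection_string(connection_string: str) -> dict[str, str]:
    """Split a semicolon-delimited connection string into a dict."""
    parts: dict[str, str] = {}
    for segment in connection_string.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: account keys are base64 and may end in '='.
        key, separator, value = segment.partition("=")
        if not separator:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key.strip()] = value
    return parts


def validate_storage_connection_string(connection_string: str) -> dict[str, str]:
    """Parse the string and fail loudly if an expected key is missing."""
    parts = parse_storage_connection_string(connection_string)
    missing = [key for key in REQUIRED_KEYS if key not in parts]
    if missing:
        raise ValueError(f"Connection string is missing: {', '.join(missing)}")
    return parts
```

Splitting on the first `=` only is the one design point worth noting, since a naive `split("=")` would truncate keys that carry base64 padding.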

deploy_ai_search/README.md

Lines changed: 6 additions & 6 deletions
@@ -2,7 +2,7 @@

 The associated scripts in this portion of the repository contains pre-built scripts to deploy the skillset with Azure Document Intelligence.

-## Steps for Rag Documents Index Deployment
+## Steps for Rag Documents Index Deployment (For Unstructured RAG)

 1. Update `.env` file with the associated values. Not all values are required dependent on whether you are using System / User Assigned Identities or a Key based authentication.
 2. Adjust `rag_documents.py` with any changes to the index / indexer. The `get_skills()` method implements the skills pipeline. Make any adjustments here in the skills needed to enrich the data source.
@@ -13,23 +13,23 @@ The associated scripts in this portion of the repository contains pre-built scri
 - `rebuild`. Whether to delete and rebuild the index.
 - `suffix`. Optional parameter that will apply a suffix onto the deployed index and indexer. This is useful if you want deploy a test version, before overwriting the main version.

-## Steps for Text2SQL Index Deployment
+## Steps for Text2SQL Index Deployment (For Structured RAG)

-### Entity Schema Index
+### Schema Store Index

 1. Update `.env` file with the associated values. Not all values are required dependent on whether you are using System / User Assigned Identities or a Key based authentication.
-2. Adjust `text_2_sql.py` with any changes to the index / indexer. The `get_skills()` method implements the skills pipeline. Make any adjustments here in the skills needed to enrich the data source.
+2. Adjust `text_2_sql_schema_store.py` with any changes to the index / indexer. The `get_skills()` method implements the skills pipeline. Make any adjustments here in the skills needed to enrich the data source.
 3. Run `deploy.py` with the following args:

-- `index_type text_2_sql`. This selects the `Text2SQLAISearch` sub class.
+- `index_type text_2_sql_schema_store`. This selects the `Text2SQLSchemaStoreAISearch` sub class.
 - `rebuild`. Whether to delete and rebuild the index.
 - `suffix`. Optional parameter that will apply a suffix onto the deployed index and indexer. This is useful if you want deploy a test version, before overwriting the main version.
 - `single_data_dictionary`. Optional parameter that controls whether you will be uploading a single data dictionary, or a data dictionary file per entity. By default, this is set to False.

 ### Query Cache Index

 1. Update `.env` file with the associated values. Not all values are required dependent on whether you are using System / User Assigned Identities or a Key based authentication.
-2. Adjust `text_2_sql_query_cache.py` with any changes to the index. **There is no provided indexer or skillset for this cache, it is expected that application code will write directly to it.**
+2. Adjust `text_2_sql_query_cache.py` with any changes to the index. **There is no provided indexer or skillset for this cache, it is expected that application code will write directly to it. See the details in the Text2SQL README for different cache strategies.**
 3. Run `deploy.py` with the following args:

 - `index_type text_2_sql_query_cache`. This selects the `Text2SQLQueryCacheAISearch` sub class.
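
The README hunks above describe `deploy.py` arguments that select an index class. As a rough sketch of how such arguments could be parsed and dispatched, the block below uses `argparse`; the `--flag` spellings, the `store_true` defaults, and the `resolve_class_name` helper are assumptions for illustration, and the real `deploy.py` may declare them differently. Only the two Text2SQL index types visible in this diff are listed.

```python
import argparse

# Index types and class names as they appear in this commit's deploy.py hunk.
# The mapping itself is a hypothetical illustration, not the repository's code.
INDEX_CLASS_NAMES = {
    "text_2_sql_schema_store": "Text2SqlSchemaStoreAISearch",
    "text_2_sql_query_cache": "Text2SqlQueryCacheAISearch",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the arguments listed in the README (flag names assumed)."""
    parser = argparse.ArgumentParser(description="Deploy an AI Search index.")
    parser.add_argument("--index_type", required=True)
    parser.add_argument("--rebuild", action="store_true")
    parser.add_argument("--suffix", default=None)
    parser.add_argument("--single_data_dictionary", action="store_true")
    return parser


def resolve_class_name(index_type: str) -> str:
    """Map an index_type value to the class name it selects."""
    try:
        return INDEX_CLASS_NAMES[index_type]
    except KeyError:
        raise ValueError(f"Unknown index_type: {index_type!r}") from None
```

With this sketch, `build_parser().parse_args(["--index_type", "text_2_sql_schema_store", "--suffix", "test"])` would describe deploying a test copy of the schema store index without rebuilding it.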

deploy_ai_search/ai_search.py

Lines changed: 4 additions & 2 deletions
@@ -48,7 +48,8 @@ def __init__(
         """

         if not hasattr(self, "indexer_type"):
-            self.indexer_type = None  # Needed to help mypy understand that indexer_type is defined in the child class
+            # Needed to help mypy understand that indexer_type is defined in the child class
+            self.indexer_type = None
             raise ValueError("indexer_type is not defined in the child class.")

         if rebuild is not None:
@@ -126,13 +127,14 @@ def get_index_fields(self) -> list[SearchableField]:
         Returns:
             list[SearchableField]: The index fields"""

-    @abstractmethod
     def get_semantic_search(self) -> SemanticSearch:
         """Get the semantic search configuration for the indexer.

         Returns:
             SemanticSearch: The semantic search configuration"""

+        return None
+
     def get_skills(self) -> list:
         """Get the skillset for the indexer.
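
The hunk above turns `get_semantic_search` from an abstract method into an optional hook that returns `None` by default, which is what lets the query cache index drop its semantic configuration later in this commit. A minimal sketch of that pattern follows; `BaseIndex`, `CacheIndex`, `SchemaIndex`, and `build_definition` are hypothetical stand-ins, and plain dicts replace the Azure SDK types.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BaseIndex(ABC):
    """Hypothetical stand-in for the AISearch base class."""

    @abstractmethod
    def get_index_fields(self) -> list[str]:
        """Every subclass must declare its fields."""

    def get_semantic_search(self) -> Optional[dict]:
        """Optional hook: subclasses override only if they need semantic search."""
        return None

    def build_definition(self) -> dict:
        """Assemble an index definition, skipping the semantic part when absent."""
        definition: dict = {"fields": self.get_index_fields()}
        semantic = self.get_semantic_search()
        if semantic is not None:
            definition["semantic_search"] = semantic
        return definition


class CacheIndex(BaseIndex):
    """Relies on the default: no semantic configuration."""

    def get_index_fields(self) -> list[str]:
        return ["Id", "Question"]


class SchemaIndex(BaseIndex):
    """Overrides the hook to supply a configuration."""

    def get_index_fields(self) -> list[str]:
        return ["Id", "EntityName", "Definition"]

    def get_semantic_search(self) -> Optional[dict]:
        return {"title_field": "EntityName", "content_fields": ["Definition"]}
```

Any caller of the hook has to tolerate `None`, which is why the sketch checks the return value before adding it to the definition.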

deploy_ai_search/deploy.py

Lines changed: 3 additions & 3 deletions
@@ -2,7 +2,7 @@
 # Licensed under the MIT License.
 import argparse
 from rag_documents import RagDocumentsAISearch
-from text_2_sql import Text2SqlAISearch
+from text_2_sql_schema_store import Text2SqlSchemaStoreAISearch
 from text_2_sql_query_cache import Text2SqlQueryCacheAISearch
 import logging

@@ -20,8 +20,8 @@ def deploy_config(arguments: argparse.Namespace):
             rebuild=arguments.rebuild,
             enable_page_by_chunking=arguments.enable_page_chunking,
         )
-    elif arguments.index_type == "text_2_sql":
-        index_config = Text2SqlAISearch(
+    elif arguments.index_type == "text_2_sql_schema_store":
+        index_config = Text2SqlSchemaStoreAISearch(
             suffix=arguments.suffix,
             rebuild=arguments.rebuild,
             single_data_dictionary=arguments.single_data_dictionary,

deploy_ai_search/environment.py

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ class IndexerType(Enum):
     """The type of the indexer"""

     RAG_DOCUMENTS = "rag-documents"
-    TEXT_2_SQL = "text-2-sql"
+    TEXT_2_SQL_SCHEMA_STORE = "text-2-sql-schema-store"
     TEXT_2_SQL_QUERY_CACHE = "text-2-sql-query-cache"


deploy_ai_search/rag_documents.py

Lines changed: 1 addition & 1 deletion
@@ -281,7 +281,7 @@ def get_indexer(self) -> SearchIndexer:
         indexer_parameters = IndexingParameters(
             batch_size=batch_size,
             configuration=IndexingParametersConfiguration(
-                data_to_extract=BlobIndexerDataToExtract.ALL_METADATA,
+                data_to_extract=BlobIndexerDataToExtract.STORAGE_METADATA,
                 query_timeout=None,
                 execution_environment=execution_environment,
                 fail_on_unprocessable_document=False,

deploy_ai_search/text_2_sql_query_cache.py

Lines changed: 32 additions & 46 deletions
@@ -5,10 +5,6 @@
     SearchFieldDataType,
     SearchField,
     SearchableField,
-    SemanticField,
-    SemanticPrioritizedFields,
-    SemanticConfiguration,
-    SemanticSearch,
     SimpleField,
     ComplexField,
 )
@@ -52,42 +48,52 @@ def get_index_fields(self) -> list[SearchableField]:
                 vector_search_dimensions=self.environment.open_ai_embedding_dimensions,
                 vector_search_profile_name=self.vector_search_profile_name,
             ),
-            SearchableField(
-                name="Query", type=SearchFieldDataType.String, filterable=True
-            ),
             ComplexField(
-                name="Schemas",
+                name="SqlQueryDecomposition",
                 collection=True,
                 fields=[
                     SearchableField(
-                        name="Entity",
+                        name="SqlQuery",
                         type=SearchFieldDataType.String,
                         filterable=True,
                     ),
                     ComplexField(
-                        name="Columns",
+                        name="Schemas",
                         collection=True,
                         fields=[
                             SearchableField(
-                                name="Name", type=SearchFieldDataType.String
-                            ),
-                            SearchableField(
-                                name="Definition", type=SearchFieldDataType.String
-                            ),
-                            SearchableField(
-                                name="Type", type=SearchFieldDataType.String
-                            ),
-                            SearchableField(
-                                name="AllowedValues",
+                                name="Entity",
                                 type=SearchFieldDataType.String,
-                                collection=True,
-                                searchable=False,
+                                filterable=True,
                             ),
-                            SearchableField(
-                                name="SampleValues",
-                                type=SearchFieldDataType.String,
+                            ComplexField(
+                                name="Columns",
                                 collection=True,
-                                searchable=False,
+                                fields=[
+                                    SearchableField(
+                                        name="Name",
+                                        type=SearchFieldDataType.String,
+                                    ),
+                                    SearchableField(
+                                        name="Definition",
+                                        type=SearchFieldDataType.String,
+                                    ),
+                                    SearchableField(
+                                        name="DataType", type=SearchFieldDataType.String
+                                    ),
+                                    SearchableField(
+                                        name="AllowedValues",
+                                        type=SearchFieldDataType.String,
+                                        collection=True,
+                                        searchable=False,
+                                    ),
+                                    SearchableField(
+                                        name="SampleValues",
+                                        type=SearchFieldDataType.String,
+                                        collection=True,
+                                        searchable=False,
+                                    ),
+                                ],
                             ),
                         ],
                     ),
@@ -101,23 +107,3 @@ def get_index_fields(self) -> list[SearchableField]:
         ]

         return fields
-
-    def get_semantic_search(self) -> SemanticSearch:
-        """This function returns the semantic search configuration for sql index
-
-        Returns:
-            SemanticSearch: The semantic search configuration"""
-
-        semantic_config = SemanticConfiguration(
-            name=self.semantic_config_name,
-            prioritized_fields=SemanticPrioritizedFields(
-                title_field=SemanticField(field_name="Question"),
-                keywords_fields=[
-                    SemanticField(field_name="Query"),
-                ],
-            ),
-        )
-
-        semantic_search = SemanticSearch(configurations=[semantic_config])
-
-        return semantic_search

deploy_ai_search/text_2_sql.py renamed to deploy_ai_search/text_2_sql_schema_store.py

Lines changed: 76 additions & 12 deletions
@@ -26,7 +26,7 @@
 )


-class Text2SqlAISearch(AISearch):
+class Text2SqlSchemaStoreAISearch(AISearch):
     """This class is used to deploy the sql index."""

     def __init__(
@@ -41,7 +41,7 @@ def __init__(
             suffix (str, optional): The suffix for the indexer. Defaults to None. If an suffix is provided, it is assumed to be a test indexer.
             rebuild (bool, optional): Whether to rebuild the index. Defaults to False.
         """
-        self.indexer_type = IndexerType.TEXT_2_SQL
+        self.indexer_type = IndexerType.TEXT_2_SQL_SCHEMA_STORE
         super().__init__(suffix, rebuild)

         if single_data_dictionary:
@@ -62,34 +62,43 @@ def get_index_fields(self) -> list[SearchableField]:
                 key=True,
                 analyzer_name="keyword",
             ),
+            SearchableField(
+                name="EntityName", type=SearchFieldDataType.String, filterable=True
+            ),
             SearchableField(
                 name="Entity",
                 type=SearchFieldDataType.String,
                 analyzer_name="keyword",
             ),
             SearchableField(
-                name="EntityName", type=SearchFieldDataType.String, filterable=True
+                name="Database",
+                type=SearchFieldDataType.String,
             ),
             SearchableField(
-                name="Description",
+                name="Warehouse",
+                type=SearchFieldDataType.String,
+            ),
+            SearchableField(
+                name="Definition",
                 type=SearchFieldDataType.String,
                 sortable=False,
                 filterable=False,
                 facetable=False,
             ),
             SearchField(
-                name="DescriptionEmbedding",
+                name="DefinitionEmbedding",
                 type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                 vector_search_dimensions=self.environment.open_ai_embedding_dimensions,
                 vector_search_profile_name=self.vector_search_profile_name,
+                hidden=True,
             ),
             ComplexField(
                 name="Columns",
                 collection=True,
                 fields=[
                     SearchableField(name="Name", type=SearchFieldDataType.String),
                     SearchableField(name="Definition", type=SearchFieldDataType.String),
-                    SearchableField(name="Type", type=SearchFieldDataType.String),
+                    SearchableField(name="DataType", type=SearchFieldDataType.String),
                     SearchableField(
                         name="AllowedValues",
                         type=SearchFieldDataType.String,
@@ -111,6 +120,40 @@ def get_index_fields(self) -> list[SearchableField]:
                 hidden=True,
                 # This is needed to enable semantic searching against the column names as complex field types are not used.
             ),
+            SearchableField(
+                name="ColumnDefinitions",
+                type=SearchFieldDataType.String,
+                collection=True,
+                hidden=True,
+                # This is needed to enable semantic searching against the column names as complex field types are not used.
+            ),
+            ComplexField(
+                name="EntityRelationships",
+                collection=True,
+                fields=[
+                    SearchableField(
+                        name="ForeignEntity",
+                        type=SearchFieldDataType.String,
+                    ),
+                    ComplexField(
+                        name="ForeignKeys",
+                        collection=True,
+                        fields=[
+                            SearchableField(
+                                name="Column", type=SearchFieldDataType.String
+                            ),
+                            SearchableField(
+                                name="ForeignColumn", type=SearchFieldDataType.String
+                            ),
+                        ],
+                    ),
+                ],
+            ),
+            SearchableField(
+                name="CompleteEntityRelationshipsGraph",
+                type=SearchFieldDataType.String,
+                collection=True,
+            ),
             SimpleField(
                 name="DateLastModified",
                 type=SearchFieldDataType.DateTimeOffset,
@@ -131,7 +174,8 @@ def get_semantic_search(self) -> SemanticSearch:
             prioritized_fields=SemanticPrioritizedFields(
                 title_field=SemanticField(field_name="EntityName"),
                 content_fields=[
-                    SemanticField(field_name="Description"),
+                    SemanticField(field_name="Definition"),
+                    SemanticField(field_name="ColumnDefinitions"),
                 ],
                 keywords_fields=[
                     SemanticField(field_name="ColumnNames"),
@@ -151,7 +195,7 @@ def get_skills(self) -> list:
             list: The skillsets used in the indexer"""

         embedding_skill = self.get_vector_skill(
-            "/document", "/document/Description", target_name="DescriptionEmbedding"
+            "/document", "/document/Definition", target_name="DefinitionEmbedding"
         )

         skills = [embedding_skill]
@@ -222,12 +266,20 @@ def get_indexer(self) -> SearchIndexer:
                     target_field_name="EntityName",
                 ),
                 FieldMapping(
-                    source_field_name="/document/Description",
-                    target_field_name="Description",
+                    source_field_name="/document/Database",
+                    target_field_name="Database",
                 ),
                 FieldMapping(
-                    source_field_name="/document/DescriptionEmbedding",
-                    target_field_name="DescriptionEmbedding",
+                    source_field_name="/document/Warehouse",
+                    target_field_name="Warehouse",
+                ),
+                FieldMapping(
+                    source_field_name="/document/Definition",
+                    target_field_name="Definition",
+                ),
+                FieldMapping(
+                    source_field_name="/document/DefinitionEmbedding",
+                    target_field_name="DefinitionEmbedding",
                 ),
                 FieldMapping(
                     source_field_name="/document/Columns",
@@ -237,6 +289,18 @@ def get_indexer(self) -> SearchIndexer:
                     source_field_name="/document/Columns/*/Name",
                     target_field_name="ColumnNames",
                 ),
+                FieldMapping(
+                    source_field_name="/document/Columns/*/Definition",
+                    target_field_name="ColumnDefinitions",
+                ),
+                FieldMapping(
+                    source_field_name="/document/EntityRelationships",
+                    target_field_name="EntityRelationships",
+                ),
+                FieldMapping(
+                    source_field_name="/document/CompleteEntityRelationshipsGraph/*",
+                    target_field_name="CompleteEntityRelationshipsGraph",
+                ),
                 FieldMapping(
                     source_field_name="/document/DateLastModified",
                     target_field_name="DateLastModified",
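
The field mappings added above use `*` paths such as `/document/Columns/*/Definition` to flatten a nested collection into a top-level collection field (`ColumnDefinitions`). The sketch below only emulates that projection in plain Python to show the intended shape of the result; `project_path` and the sample document are hypothetical, and this is not how Azure AI Search evaluates enrichment paths internally.

```python
def project_path(document: dict, path: str) -> list:
    """Resolve an enrichment-style path against a dict, fanning out on '*'.

    Always returns a list, so a scalar path yields a one-element list.
    """
    segments = [segment for segment in path.split("/") if segment]
    if segments and segments[0] == "document":
        segments = segments[1:]  # the leading 'document' names the root itself

    current = [document]
    for segment in segments:
        next_values = []
        for value in current:
            if segment == "*":
                next_values.extend(value)  # fan out over each collection item
            else:
                next_values.append(value[segment])
        current = next_values
    return current


# Hypothetical schema store document, using field names from the index above.
sample_document = {
    "EntityName": "Sales Orders",
    "Columns": [
        {"Name": "OrderId", "Definition": "Unique order identifier", "DataType": "int"},
        {"Name": "OrderDate", "Definition": "Date the order was placed", "DataType": "date"},
    ],
}

column_names = project_path(sample_document, "/document/Columns/*/Name")
column_definitions = project_path(sample_document, "/document/Columns/*/Definition")
```

Here `column_names` holds one entry per column, which is the flattened form the hidden `ColumnNames` and `ColumnDefinitions` fields exist to expose to semantic search.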
