On-prem SQL database #2493

Open
FancyGG opened this issue Apr 16, 2025 · 5 comments

Comments

FancyGG commented Apr 16, 2025

How can we connect this solution to an on-prem SQL Server so that my data is pulled directly from there?

@PujaAmmineni

For connecting to your on-premises SQL Server, I'd suggest using Azure Data Factory with a self-hosted Integration Runtime (see the Bicep sketch after this list):

  1. Install Integration Runtime on your local network
  2. Set up your SQL Server connection (just need the connection string and credentials)
  3. Create a simple pipeline in Azure Data Factory to pull your data
  4. Set up how often you want it to sync
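As a rough sketch (not this repo's official infrastructure), and assuming an existing dataFactory resource in your Bicep, steps 1 and 2 correspond to a self-hosted Integration Runtime plus a SQL Server linked service that routes traffic through it. The resource names and the connection string below are illustrative placeholders:

// Sketch only: the self-hosted IR node is installed on a machine inside your network,
// and the linked service reaches the on-prem SQL Server through it via connectVia.
resource selfHostedIr 'Microsoft.DataFactory/factories/integrationRuntimes@2018-06-01' = {
  name: '${dataFactory.name}/OnPremIR'
  properties: {
    type: 'SelfHosted'
  }
}

resource sqlLinkedService 'Microsoft.DataFactory/factories/linkedservices@2018-06-01' = {
  name: '${dataFactory.name}/OnPremSqlServer'
  properties: {
    type: 'SqlServer'
    connectVia: {
      referenceName: 'OnPremIR'
      type: 'IntegrationRuntimeReference'
    }
    typeProperties: {
      // Placeholder; in practice keep the secret in Key Vault rather than inline.
      connectionString: 'Server=myserver;Database=mydb;User ID=myuser;Password=<secret>;'
    }
  }
}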

cforce commented May 3, 2025

@PujaAmmineni

> Create a simple pipeline in Azure Data Factory to pull your data

Can you give more details (e.g. Bicep code)? How does it work? For example, how does this pipeline produce documents that are then pushed to blob storage with integrated vectorization, or how is a function executed that uses the processdocs Python code?

@PujaAmmineni

@cforce

The pipeline in Azure Data Factory is designed to handle data processing automatically. Here's how it works: first, we set up the pipeline structure with this Bicep code:

  1. Pipeline Setup (Bicep Code)
// Assumes an existing 'dataFactory' resource plus the 'SourceDataset' and 'BlobDataset' datasets it references.
resource pipeline 'Microsoft.DataFactory/factories/pipelines@2018-06-01' = {
  name: '${dataFactory.name}/DataPipeline'
  properties: {
    activities: [
      {
        name: 'CopyFromSource'
        type: 'Copy'
        inputs: [{ referenceName: 'SourceDataset', type: 'DatasetReference' }]
        outputs: [{ referenceName: 'BlobDataset', type: 'DatasetReference' }]
        typeProperties: {
          source: { type: 'SqlServerSource' } // read from the on-prem SQL Server dataset
          sink: { type: 'BlobSink' }          // write to the Blob Storage dataset
        }
      }
    ]
  }
}
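For completeness, the SourceDataset and BlobDataset references above would need matching dataset resources. A hedged sketch, reusing the OnPremSqlServer linked service from the earlier sketch and a hypothetical BlobStorageLinkedService (the table and folder names are illustrative):

// Sketch only: the dataset names match the DatasetReference entries in the pipeline above.
resource sourceDataset 'Microsoft.DataFactory/factories/datasets@2018-06-01' = {
  name: '${dataFactory.name}/SourceDataset'
  properties: {
    type: 'SqlServerTable'
    linkedServiceName: { referenceName: 'OnPremSqlServer', type: 'LinkedServiceReference' }
    typeProperties: { tableName: 'dbo.Documents' } // illustrative table
  }
}

resource blobDataset 'Microsoft.DataFactory/factories/datasets@2018-06-01' = {
  name: '${dataFactory.name}/BlobDataset'
  properties: {
    type: 'AzureBlob'
    linkedServiceName: { referenceName: 'BlobStorageLinkedService', type: 'LinkedServiceReference' }
    typeProperties: { folderPath: 'content' } // illustrative container/folder
  }
}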
  2. Document Processing and Vectorization
    The pipeline follows this flow:
  • Extracts data from source
  • Copies to Azure Blob Storage
  • Triggers Python function for processing
  3. Function Execution
import json
import azure.functions as func

app = func.FunctionApp()

@app.route(route="process_data")
def process_data(req: func.HttpRequest):
    data = req.get_json()  # records handed over by the pipeline

    # Process documents with BERT vectorization
    # (TextVectorizer and process_text stand in for the actual embedding helpers)
    vectorizer = TextVectorizer()
    processed_data = []
    for record in data:
        record['vector'] = vectorizer.vectorize(process_text(record))
        processed_data.append(record)

    # Store processed results (blob_client stands in for an azure.storage.blob BlobClient)
    blob_client.upload_blob(json.dumps({'processed_data': processed_data}))

The pipeline runs on a scheduled recurrence, automatically handling the entire process from data extraction through vectorization to storage in blob storage. When new data arrives, the function processes it using BERT vectorization and stores both original and processed versions.
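The "scheduled recurrence" would itself be a trigger resource attached to the pipeline. A minimal sketch (the trigger name, start time, and daily interval are assumptions):

// Sketch of a schedule trigger that runs DataPipeline once a day.
resource syncTrigger 'Microsoft.DataFactory/factories/triggers@2018-06-01' = {
  name: '${dataFactory.name}/DailySync'
  properties: {
    type: 'ScheduleTrigger'
    typeProperties: {
      recurrence: {
        frequency: 'Day' // could also be 'Hour' or 'Minute' for more frequent syncs
        interval: 1
        startTime: '2025-05-01T00:00:00Z'
        timeZone: 'UTC'
      }
    }
    pipelines: [
      {
        pipelineReference: { referenceName: 'DataPipeline', type: 'PipelineReference' }
      }
    ]
  }
}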

FancyGG commented May 4, 2025

Why can't I use the connection string to connect directly to my on-prem server, rather than having the data in the cloud and setting up a pipeline to refresh it?

@PujaAmmineni

You can definitely connect directly with a SQL connection string, but I recommend Azure Data Factory because this OpenAI demo needs extra processing such as vectorization and indexing, which ADF automates securely without requiring you to manage firewall rules, VPNs, or custom pipelines.
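For reference, the "direct" option is only a few lines with pyodbc (a generic sketch, not part of this repo; the server, database, credentials, and table are placeholders, and the Microsoft ODBC Driver for SQL Server must be installed), but you would still have to do the chunking, embedding, and indexing that the demo's ingestion path otherwise handles:

import pyodbc

# Placeholder connection details for an on-prem SQL Server reachable from this machine.
conn = pyodbc.connect(
    'DRIVER={ODBC Driver 18 for SQL Server};'
    'SERVER=my-onprem-host;DATABASE=mydb;UID=myuser;PWD=<secret>;'
    'TrustServerCertificate=yes;'
)

cursor = conn.cursor()
cursor.execute('SELECT id, title, body FROM dbo.Documents')
rows = cursor.fetchall()  # these rows still need chunking, embedding, and indexing
conn.close()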
