On-prem SQL database #2493

Open
FancyGG opened this issue Apr 16, 2025 · 5 comments

Comments

FancyGG commented Apr 16, 2025

How can we connect this solution to an on-prem SQL Server so that my data is pulled directly from there?

@PujaAmmineni

For connecting to your on-premises SQL Server, I'd suggest using Azure Data Factory with a self-hosted Integration Runtime (see the Bicep sketch after this list):

  1. Install Integration Runtime on your local network
  2. Set up your SQL Server connection (just need the connection string and credentials)
  3. Create a simple pipeline in Azure Data Factory to pull your data
  4. Set up how often you want it to sync
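As a rough sketch (not this repo's official infrastructure), and assuming an existing dataFactory resource in your Bicep, steps 1 and 2 correspond to a self-hosted Integration Runtime plus a SQL Server linked service that routes traffic through it. The resource names and the connection string below are illustrative placeholders:

// Sketch only: the self-hosted IR node is installed on a machine inside your network,
// and the linked service reaches the on-prem SQL Server through it via connectVia.
resource selfHostedIr 'Microsoft.DataFactory/factories/integrationRuntimes@2018-06-01' = {
  name: '${dataFactory.name}/OnPremIR'
  properties: {
    type: 'SelfHosted'
  }
}

resource sqlLinkedService 'Microsoft.DataFactory/factories/linkedservices@2018-06-01' = {
  name: '${dataFactory.name}/OnPremSqlServer'
  properties: {
    type: 'SqlServer'
    connectVia: {
      referenceName: 'OnPremIR'
      type: 'IntegrationRuntimeReference'
    }
    typeProperties: {
      // Placeholder; in practice keep the secret in Key Vault rather than inline.
      connectionString: 'Server=myserver;Database=mydb;User ID=myuser;Password=<secret>;'
    }
  }
}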

cforce commented May 3, 2025

@PujaAmmineni

> Create a simple pipeline in Azure Data Factory to pull your data

Can you give more details (e.g. Bicep code)? How does it work? For example, how does this pipeline produce documents that are then pushed to blob storage with integrated vectorization, or how is a function executed that uses the processdocs Python code?

@PujaAmmineni

@cforce

The pipeline in Azure Data Factory is designed to handle data processing automatically. Here's how it works: first, we set up the pipeline structure with this Bicep code:

  1. Pipeline Setup (Bicep Code)
// Assumes an existing 'dataFactory' resource plus the 'SourceDataset' and 'BlobDataset' datasets it references.
resource pipeline 'Microsoft.DataFactory/factories/pipelines@2018-06-01' = {
  name: '${dataFactory.name}/DataPipeline'
  properties: {
    activities: [
      {
        name: 'CopyFromSource'
        type: 'Copy'
        inputs: [{ referenceName: 'SourceDataset', type: 'DatasetReference' }]
        outputs: [{ referenceName: 'BlobDataset', type: 'DatasetReference' }]
        typeProperties: {
          source: { type: 'SqlServerSource' } // read from the on-prem SQL Server dataset
          sink: { type: 'BlobSink' }          // write to the Blob Storage dataset
        }
      }
    ]
  }
}
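For completeness, the SourceDataset and BlobDataset references above would need matching dataset resources. A hedged sketch, reusing the OnPremSqlServer linked service from the earlier sketch and a hypothetical BlobStorageLinkedService (the table and folder names are illustrative):

// Sketch only: the dataset names match the DatasetReference entries in the pipeline above.
resource sourceDataset 'Microsoft.DataFactory/factories/datasets@2018-06-01' = {
  name: '${dataFactory.name}/SourceDataset'
  properties: {
    type: 'SqlServerTable'
    linkedServiceName: { referenceName: 'OnPremSqlServer', type: 'LinkedServiceReference' }
    typeProperties: { tableName: 'dbo.Documents' } // illustrative table
  }
}

resource blobDataset 'Microsoft.DataFactory/factories/datasets@2018-06-01' = {
  name: '${dataFactory.name}/BlobDataset'
  properties: {
    type: 'AzureBlob'
    linkedServiceName: { referenceName: 'BlobStorageLinkedService', type: 'LinkedServiceReference' }
    typeProperties: { folderPath: 'content' } // illustrative container/folder
  }
}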
  2. Document Processing and Vectorization
    The pipeline follows this flow:
  • Extracts data from source
  • Copies to Azure Blob Storage
  • Triggers Python function for processing
  3. Function Execution
import json
import azure.functions as func

app = func.FunctionApp()

@app.route(route="process_data")
def process_data(req: func.HttpRequest):
    data = req.get_json()  # records handed over by the pipeline

    # Process documents with BERT vectorization
    # (TextVectorizer and process_text stand in for the actual embedding helpers)
    vectorizer = TextVectorizer()
    processed_data = []
    for record in data:
        record['vector'] = vectorizer.vectorize(process_text(record))
        processed_data.append(record)

    # Store processed results (blob_client stands in for an azure.storage.blob BlobClient)
    blob_client.upload_blob(json.dumps({'processed_data': processed_data}))

The pipeline runs on a scheduled recurrence, automatically handling the entire process from data extraction through vectorization to storage in blob storage. When new data arrives, the function processes it using BERT vectorization and stores both original and processed versions.
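The "scheduled recurrence" would itself be a trigger resource attached to the pipeline. A minimal sketch (the trigger name, start time, and daily interval are assumptions):

// Sketch of a schedule trigger that runs DataPipeline once a day.
resource syncTrigger 'Microsoft.DataFactory/factories/triggers@2018-06-01' = {
  name: '${dataFactory.name}/DailySync'
  properties: {
    type: 'ScheduleTrigger'
    typeProperties: {
      recurrence: {
        frequency: 'Day' // could also be 'Hour' or 'Minute' for more frequent syncs
        interval: 1
        startTime: '2025-05-01T00:00:00Z'
        timeZone: 'UTC'
      }
    }
    pipelines: [
      {
        pipelineReference: { referenceName: 'DataPipeline', type: 'PipelineReference' }
      }
    ]
  }
}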

FancyGG commented May 4, 2025

Why can't I use the connection string to connect directly to my on-prem server, rather than having the data in the cloud and setting up a pipeline to refresh it?

@PujaAmmineni

You can definitely connect directly with a SQL connection string, but I recommend Azure Data Factory because this OpenAI demo needs extra processing such as vectorization and indexing, which ADF automates securely without requiring you to manage firewall rules, VPNs, or custom pipelines.
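For reference, the "direct" option is only a few lines with pyodbc (a generic sketch, not part of this repo; the server, database, credentials, and table are placeholders, and the Microsoft ODBC Driver for SQL Server must be installed), but you would still have to do the chunking, embedding, and indexing that the demo's ingestion path otherwise handles:

import pyodbc

# Placeholder connection details for an on-prem SQL Server reachable from this machine.
conn = pyodbc.connect(
    'DRIVER={ODBC Driver 18 for SQL Server};'
    'SERVER=my-onprem-host;DATABASE=mydb;UID=myuser;PWD=<secret>;'
    'TrustServerCertificate=yes;'
)

cursor = conn.cursor()
cursor.execute('SELECT id, title, body FROM dbo.Documents')
rows = cursor.fetchall()  # these rows still need chunking, embedding, and indexing
conn.close()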
