A complete event-driven data pipeline that processes JSON files uploaded to Azure Blob Storage and saves the data to Azure Cosmos DB using Azure Functions.
- Azure Blob Storage: Triggers when JSON files are uploaded to the "data" container
- Azure Functions: Processes the uploaded files and extracts data
- Azure Cosmos DB: Stores the processed data in a NoSQL database
- Terraform: Infrastructure as Code for Azure resource provisioning
Before running this project, ensure you have:
- Node.js (v18 or higher)
- Azure CLI
- Azure Functions Core Tools
- Terraform (optional, for infrastructure setup)
- An active Azure subscription
```bash
git clone https://github.com/gopalepic/Event-driven-pipeline.git
cd Event-driven-pipeline
```
- Navigate to the root directory and initialize Terraform:

```bash
terraform init
terraform plan
terraform apply
```
- Note down the output values:
  - Storage Account connection string
  - Cosmos DB endpoint and key
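If you don't see these values after `terraform apply`, `main.tf` may need `output` blocks along these lines (a sketch; the resource names `azurerm_storage_account.pipeline` and `azurerm_cosmosdb_account.pipeline` are placeholders, so match them to whatever `main.tf` actually declares):

```hcl
# Hypothetical output names; adjust resource references to match main.tf.
output "storage_connection_string" {
  value     = azurerm_storage_account.pipeline.primary_connection_string
  sensitive = true
}

output "cosmosdb_endpoint" {
  value = azurerm_cosmosdb_account.pipeline.endpoint
}

output "cosmosdb_primary_key" {
  value     = azurerm_cosmosdb_account.pipeline.primary_key
  sensitive = true
}
```

Sensitive outputs are hidden in the plan summary; print them with `terraform output -raw storage_connection_string`.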
- Navigate to the function app directory:

```bash
cd process_data
```

- Install dependencies:

```bash
npm install
```
- Create a `local.settings.json` file with your Azure credentials:
```json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "DefaultEndpointsProtocol=https;AccountName=YOUR_STORAGE_ACCOUNT;AccountKey=YOUR_STORAGE_KEY;EndpointSuffix=core.windows.net",
    "FUNCTIONS_WORKER_RUNTIME": "node",
    "COSMOSDB_ENDPOINT": "https://YOUR_COSMOS_ACCOUNT.documents.azure.com:443/",
    "COSMOSDB_KEY": "YOUR_COSMOS_PRIMARY_KEY"
  }
}
```
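The Functions host loads everything under `Values` into `process.env`, which is how the function code can pick these settings up at runtime. A minimal sketch of that lookup (`getCosmosConfig` is a hypothetical helper, not necessarily the name used in `ProcessBlob.ts`):

```typescript
// Sketch: read the Cosmos DB settings the Functions host injects from
// local.settings.json (locally) or App Settings (in Azure).
function getCosmosConfig(): { endpoint: string; key: string } {
  const endpoint = process.env.COSMOSDB_ENDPOINT;
  const key = process.env.COSMOSDB_KEY;
  if (!endpoint || !key) {
    // Fail fast with a clear message instead of a cryptic SDK error later.
    throw new Error("COSMOSDB_ENDPOINT and COSMOSDB_KEY must be set");
  }
  return { endpoint, key };
}
```

Failing fast here makes misconfiguration show up as one clear log line instead of a connection error deep inside the Cosmos SDK.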
- Build the TypeScript code:

```bash
npm run build
```

- Start the Azure Functions runtime:

```bash
func start
```
You should see output similar to:

```
Azure Functions Core Tools
Core Tools Version: 4.x.x
Function Runtime Version: 4.x.x

Functions:
        blobTrigger: blobTrigger

Host lock lease acquired by instance ID 'xxxxx'.
```
- Create a test JSON file (`test-data.json`):

```json
[
  {"id": "1", "value": "test data"},
  {"id": "2", "value": "more test data"},
  {"id": "3", "value": "even more test data"}
]
```
- Upload the file to your Azure Storage blob container named "data"
- Watch the function logs; you should see:

```
Processing blob: test-data.json
Saved item with id: 1
Saved item with id: 2
Saved item with id: 3
Data saved to Cosmos DB - 3 items processed
```
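The save loop that produces those log lines might look like the sketch below. `CosmosContainerLike` is a hypothetical minimal interface standing in for the real `@azure/cosmos` container, so the loop can be shown (and exercised) without the SDK; the actual code in `ProcessBlob.ts` may differ.

```typescript
// Minimal shape of the Cosmos container the loop needs (assumption, not the
// real @azure/cosmos types).
interface CosmosContainerLike {
  items: { upsert(item: { id: string }): Promise<unknown> };
}

// Upsert each item and log progress, mirroring the log output shown above.
async function saveItems(
  container: CosmosContainerLike,
  items: { id: string }[]
): Promise<number> {
  for (const item of items) {
    await container.items.upsert(item);
    console.log(`Saved item with id: ${item.id}`);
  }
  console.log(`Data saved to Cosmos DB - ${items.length} items processed`);
  return items.length;
}
```

Upserting (rather than creating) keeps re-uploads of the same blob idempotent: an item with an existing `id` is overwritten instead of raising a conflict.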
| Variable | Description | Example |
|---|---|---|
| `AzureWebJobsStorage` | Storage account connection string | `DefaultEndpointsProtocol=https;AccountName=...` |
| `COSMOSDB_ENDPOINT` | Cosmos DB endpoint URL | `https://myaccount.documents.azure.com:443/` |
| `COSMOSDB_KEY` | Cosmos DB primary key | `AccountKey=xxxxx` |
The function supports both:

- A single JSON object:

```json
{"id": "1", "value": "data"}
```

- An array of JSON objects:

```json
[{"id": "1", "value": "data"}, {"id": "2", "value": "more data"}]
```

Each object must have an `id` field for Cosmos DB partitioning.
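One way to handle both shapes is to normalize the payload into an array up front and validate `id` in one place. A sketch (`normalizeBlobPayload` is a hypothetical helper; the real validation lives in `ProcessBlob.ts`):

```typescript
type PipelineItem = { id: string; [key: string]: unknown };

// Parse the blob's JSON text and normalize it: a single object becomes a
// one-element array, an array is kept as-is. Rejects items without a
// string "id", which Cosmos DB needs for partitioning.
function normalizeBlobPayload(raw: string): PipelineItem[] {
  const parsed = JSON.parse(raw);
  const items = Array.isArray(parsed) ? parsed : [parsed];
  for (const item of items) {
    if (typeof item?.id !== "string") {
      throw new Error("Every item must have a string 'id' field");
    }
  }
  return items as PipelineItem[];
}
```

Normalizing first means the rest of the pipeline only ever deals with `PipelineItem[]`, regardless of which shape was uploaded.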
```
Event-driven-pipeline/
├── main.tf                        # Terraform infrastructure
├── process_data/
│   ├── src/
│   │   └── functions/
│   │       └── ProcessBlob.ts     # Main function code
│   ├── dist/                      # Compiled JavaScript
│   ├── package.json               # Dependencies
│   ├── host.json                  # Function app configuration
│   ├── local.settings.json        # Local environment variables
│   └── tsconfig.json              # TypeScript configuration
└── README.md                      # This file
```
- Switch to the problems branch:

```bash
git checkout problems
```

- Make your changes to `process_data/src/functions/ProcessBlob.ts`
- Build and test:

```bash
cd process_data
npm run build
func start
```

- Commit and push:

```bash
git add .
git commit -m "Your changes"
git push origin problems
```

- Create a Pull Request to merge into main
For local development, you can use Azurite, the Azure Storage emulator:

```bash
# Install Azurite (Azure Storage emulator)
npm install -g azurite

# Start Azurite
azurite --silent --location c:\azurite --debug c:\azurite\debug.log
```

Then update `local.settings.json`:

```json
{
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    ...
  }
}
```
- **Function not starting:**
  - Ensure you're in the `process_data` directory
  - Check that `host.json` and `local.settings.json` exist

- **Cosmos DB connection errors:**
  - Verify your `COSMOSDB_ENDPOINT` and `COSMOSDB_KEY` values
  - Ensure the database "pipeline-db" and container "data" exist

- **Blob trigger not firing:**
  - Check your storage account connection string
  - Ensure the "data" container exists
  - Verify blob files are being uploaded correctly

- **TypeScript compilation errors:**
  - Run `npm install` to ensure all dependencies are installed
  - Check the `tsconfig.json` configuration
Enable verbose logging by starting with:

```bash
func start --verbose
```
Monitor your pipeline through:
- Function App: View execution logs and metrics
- Storage Account: Monitor blob uploads and triggers
- Cosmos DB: Check data insertion and query performance
The function logs will show:
- Blob processing events
- Individual item saves to Cosmos DB
- Error messages and stack traces
For production:
- Use Azure Key Vault for secrets
- Enable managed identity
- Configure network security rules
- Set up monitoring and alerts
Never commit `local.settings.json` to version control. It's already included in `.gitignore`.
This project is licensed under the MIT License.
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
If you encounter any issues:
- Check the troubleshooting section above
- Review Azure Functions documentation
- Open an issue in this repository
Happy coding! π