The guardian of the tender pipeline! 🛡️ This repository contains the source code for the SQS Tender Deduplication Lambda, a critical component of the tender data processing pipeline. Its mission is to efficiently filter out duplicate and expired tender messages received from various scrapers, ensuring that only unique, valid tenders are passed to downstream services for AI enrichment.
- 🎯 Overview
- 🏗️ Architecture
- ✨ Core Features
- 🚀 Getting Started
- ⚙️ Configuration
- 📦 Deployment Guide
- 🧰 Troubleshooting Guide
- 🤝 Contributing
The function springs into action when messages arrive in a central SQS queue (tender-queue). It first validates incoming messages to reject tenders that are already closed (i.e., their closingDate is in the past). ⏰
For valid, open tenders, it then inspects the message to extract its source and tender number, performs a lightning-fast lookup against an in-memory cache of existing tenders from the primary RDS database, and routes the message to the appropriate destination queue. 🚄
- ✅ Unique, Valid Tenders are sent to the `AIQueue.fifo` for processing.
- ❌ Duplicate & Closed Tenders are sent to the `DuplicateQueue.fifo`. For closed tenders, a `failureReason` is added to the message body for crystal-clear traceability.
This process prevents costly and redundant processing by the downstream AI services and ensures rock-solid data integrity. 💎
The function operates within a secure, serverless architecture inside our primary VPC like a fortress! 🏰
- 📥 Ingest: Scrapers push raw tender JSON data into the `tender-queue`.
- ⚡ Trigger: SQS triggers the Deduplication Lambda with a batch of messages.
- 🗄️ Cache Population: On a cold start, the Lambda securely connects to the RDS (MS SQL Server) database via the VPC to populate its in-memory deduplication cache.
- 🔍 Validation: For each message, the Lambda performs a Tender Validation Check. It parses the `closingDate`, assumes unspecified date-times are SAST (South Africa Standard Time), converts them to UTC, and compares them against the current UTC date and time.
- 🔄 Deduplication: If the tender is valid (not closed), the Lambda performs a Deduplication Check against the in-memory `HashSet` using the tender's `source` and `tenderNumber`.
- 🎯 Routing: Based on the checks, the Lambda routes the message:
  - ✅ Unique & Valid → `AIQueue.fifo`
  - ❌ Duplicate or Closed → `DuplicateQueue.fifo`. If closed, a `failureReason` is added to the JSON body.
- ✅ Acknowledge: Finally, the Lambda deletes the processed messages from the `tender-queue`.
All communication with AWS services (SQS, RDS) is handled securely and privately via VPC Endpoints. 🔐
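The validation, deduplication, and routing steps above can be sketched as one pure decision function. This is a minimal illustration, not the repository's actual code: the `Route` enum, the `TenderRouter` class, the `source|tenderNumber` key format, the failure reason text, and the strict "closing date earlier than now" comparison are all assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names for illustration; the repository's real types may differ.
public enum Route { AiQueue, DuplicateQueue }

public static class TenderRouter
{
    // Decides the destination queue for one tender.
    // closingDateUtc: the tender's closing date, already converted to UTC.
    // knownTenders:   the in-memory cache of "source|tenderNumber" keys (assumed format).
    public static (Route Destination, string? FailureReason) Decide(
        string source,
        string tenderNumber,
        DateTime closingDateUtc,
        DateTime nowUtc,
        ISet<string> knownTenders)
    {
        // 1. Validation: closed tenders are rejected before the cache lookup.
        if (closingDateUtc < nowUtc)
            return (Route.DuplicateQueue, "Tender is closed"); // assumed wording

        // 2. Deduplication: a cache hit means the tender already exists.
        if (knownTenders.Contains($"{source}|{tenderNumber}"))
            return (Route.DuplicateQueue, null);

        // 3. Unique and still open.
        return (Route.AiQueue, null);
    }
}
```

Passing `nowUtc` in as a parameter, rather than reading the clock inside the function, keeps the decision easy to unit test.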
- ⏰ Tender Expiry Validation: A smart validation service runs before deduplication. It intelligently parses the `closingDate`, assumes unspecified times are SAST (South Africa Standard Time) and converts to UTC, then rejects any tender that is already closed. This saves precious processing resources on expired items!
- 🚀 High-Throughput Processing: Designed to handle thousands of messages per minute using a continuous polling strategy within the Lambda execution window.
- ⚡ Efficient Deduplication: Utilises a static in-memory `HashSet` for near-instantaneous O(1) duplicate lookups (for valid tenders), dramatically reducing database load.
- 🎯 Optimised Database Access: Queries the database only once per Lambda cold start to populate the cache, minimising database connections and read operations.
- 🪶 Lightweight Message Parsing: Avoids full JSON deserialisation by only parsing the necessary `source`, `tenderNumber`, and `closingDate` fields, improving performance and reducing memory usage.
- 📋 Traceable Rejection: Closed tenders and malformed JSON messages are routed to the `DuplicateQueue` with a `failureReason` added to the JSON body, providing crystal-clear traceability for monitoring.
- 🔒 Secure by Design: Operates entirely within a private VPC, with no public internet access. All communication with AWS services is handled via secure VPC endpoints.
- 🛡️ Robust Error Handling: Correctly routes malformed messages and handles failed SQS operations to prevent data loss.
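Two of the features above, lightweight parsing and the cold-start cache, can be sketched with `System.Text.Json` and a static `HashSet`. Treat this as one possible shape rather than the project's implementation: `TenderFields`, `TenderCache`, the `loadKeys` delegate, the `source|tenderNumber` key format, and the case-insensitive comparison are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper names; the repository's own classes may differ.
public static class TenderFields
{
    // Reads only the three fields the function needs. Returns false for
    // malformed JSON or when a field is missing, empty, or not a string.
    public static bool TryRead(string body, out string source,
                               out string tenderNumber, out string closingDate)
    {
        source = tenderNumber = closingDate = string.Empty;
        try
        {
            using var doc = JsonDocument.Parse(body);
            var root = doc.RootElement;
            return root.ValueKind == JsonValueKind.Object
                && TryGetString(root, "source", out source)
                && TryGetString(root, "tenderNumber", out tenderNumber)
                && TryGetString(root, "closingDate", out closingDate);
        }
        catch (JsonException)
        {
            return false;
        }
    }

    private static bool TryGetString(JsonElement root, string name, out string value)
    {
        value = string.Empty;
        if (!root.TryGetProperty(name, out var property)
            || property.ValueKind != JsonValueKind.String)
            return false;
        value = property.GetString() ?? string.Empty;
        return value.Length > 0;
    }
}

public static class TenderCache
{
    // Static state survives between invocations in a warm execution
    // environment, so loadKeys runs once per cold start.
    private static HashSet<string>? _keys;

    public static bool Contains(string source, string tenderNumber,
                                Func<IEnumerable<string>> loadKeys)
    {
        // Case-insensitive matching is an assumption of this sketch.
        _keys ??= new HashSet<string>(loadKeys(), StringComparer.OrdinalIgnoreCase);
        return _keys.Contains($"{source}|{tenderNumber}");
    }
}
```

In a real function, `loadKeys` would wrap the single database query that runs on cold start. A message for which `TryRead` returns false is the kind that would be routed to the duplicate queue with a `failureReason`.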
Ready to dive in? Follow these steps to set up the project for local development! 🎉
- .NET 8 SDK 💻
- AWS CLI configured with appropriate credentials 🔑
- Visual Studio 2022 or VS Code with C# extensions 🛠️
- 📁 Clone the repository:
git clone <your-repository-url>
cd Sqs-Deduplication-Lambda
- 📦 Restore Dependencies:
dotnet restore
- 🔐 Configure User Secrets: This project uses .NET's Secret Manager to handle the database connection string securely during local development.
dotnet user-secrets init
dotnet user-secrets set "DB_CONNECTION_STRING" "your-local-or-dev-db-connection-string"
💡 Note: For the function to access the RDS database from your local machine, your IP address may need to be whitelisted in the RDS instance's security group.
The Lambda function is configured via environment variables. These must be set in the Lambda function's configuration in AWS. 🔧
| Variable Name | Required | Description |
|---|---|---|
| `SOURCE_QUEUE_URL` | ✅ Yes | The URL of the source SQS queue (tender-queue). |
| `AI_QUEUE_URL` | ✅ Yes | The URL of the destination FIFO queue for unique tenders. |
| `DUPLICATE_QUEUE_URL` | ✅ Yes | The URL of the destination FIFO queue for duplicate and rejected tenders. |
| `DB_CONNECTION_STRING` | ✅ Yes | The full connection string for the RDS SQL Server database. |
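Because all four variables are required, it can help to fail fast at startup with a single message that names everything missing. The sketch below shows one way to do that. `LambdaSettings` and its `Load` method are hypothetical names, not part of this repository.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the variable names come from the table above.
public static class LambdaSettings
{
    public static readonly string[] RequiredNames =
    {
        "SOURCE_QUEUE_URL", "AI_QUEUE_URL", "DUPLICATE_QUEUE_URL", "DB_CONNECTION_STRING"
    };

    // Reads every required variable and throws once, listing all missing names.
    // getVariable is injected so the lookup can be replaced in tests.
    public static IReadOnlyDictionary<string, string> Load(Func<string, string?> getVariable)
    {
        var values = new Dictionary<string, string>();
        var missing = new List<string>();

        foreach (var name in RequiredNames)
        {
            var value = getVariable(name);
            if (string.IsNullOrWhiteSpace(value))
                missing.Add(name);
            else
                values[name] = value;
        }

        if (missing.Count > 0)
            throw new InvalidOperationException(
                "Missing environment variables: " + string.Join(", ", missing));

        return values;
    }
}
```

In the function itself this would be called as `LambdaSettings.Load(Environment.GetEnvironmentVariable)`.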
This section covers three deployment methods for the Tender Deduplication Lambda Function. Choose the method that best fits your workflow and infrastructure preferences.
Before deploying, ensure you have:
- AWS CLI configured with appropriate credentials 🔑
- .NET 8 SDK installed locally
- AWS SAM CLI installed (for SAM deployment)
- Access to AWS Lambda, SQS, RDS, and VPC services ☁️
- Visual Studio 2022 or VS Code with C# extensions (for AWS Toolkit deployment)
Deploy directly through Visual Studio using the AWS Toolkit extension.
- Install AWS Toolkit for Visual Studio 2022
- Configure AWS Profile with your credentials in Visual Studio
- Open Solution containing `TenderDeduplication.csproj`
- Right-click the project in Solution Explorer
- Select "Publish to AWS Lambda" from the context menu
- Configure Lambda Settings:
  - Function Name: `TenderDeduplicationLambda`
  - Runtime: `.NET 8`
  - Handler: `TenderDeduplication::TenderDeduplication.Function::FunctionHandler`
  - Memory: `512 MB`
  - Timeout: `300 seconds`
- Configure VPC Settings:
  - VPC: Select your existing VPC
  - Security Groups: `sg-0043b58a403174a59`
  - Subnets: `subnet-0f47b68400d516b1e`, `subnet-072a27234084339fc`
- Set Environment Variables:
AI_QUEUE_URL=https://sqs.us-east-1.amazonaws.com/211635102441/AIQueue.fifo
DB_CONNECTION_STRING=
DUPLICATE_QUEUE_URL=https://sqs.us-east-1.amazonaws.com/211635102441/DuplicatesQueue.fifo
SOURCE_QUEUE_URL=https://sqs.us-east-1.amazonaws.com/211635102441/TenderQueue.fifo
- Configure IAM Role with required permissions for SQS, RDS, VPC, and CloudWatch
- Set up SQS Trigger manually after deployment
- Test the function using the AWS Toolkit test feature
- Monitor logs through CloudWatch integration
- Verify SQS trigger configuration and batch processing
Use AWS SAM for infrastructure-as-code deployment with the provided template.
# Install AWS SAM CLI
pip install aws-sam-cli
# Install .NET 8 SDK
# Download from https://dotnet.microsoft.com/download/dotnet/8.0
# Verify installations
sam --version
dotnet --version
# Build the .NET 8 application
dotnet build -c Release
# Build the SAM application
sam build
# Deploy with guided configuration (first time)
sam deploy --guided
# Follow the prompts:
# Stack Name: tender-deduplication-stack
# AWS Region: us-east-1 (or your preferred region)
# Confirm changes before deploy: Y
# Allow SAM to create IAM roles: Y
# Save parameters to samconfig.toml: Y
The template already includes the required environment variables:
# Already configured in TenderDeduplicationLambda.yaml
Environment:
Variables:
AI_QUEUE_URL: https://sqs.us-east-1.amazonaws.com/211635102441/AIQueue.fifo
DB_CONNECTION_STRING:
DUPLICATE_QUEUE_URL: https://sqs.us-east-1.amazonaws.com/211635102441/DuplicatesQueue.fifo
SOURCE_QUEUE_URL: https://sqs.us-east-1.amazonaws.com/211635102441/TenderQueue.fifo
# Quick deployment after initial setup
dotnet build -c Release
sam build && sam deploy
# Test function locally (requires Docker)
sam local invoke TenderDeduplicationLambda
# Start local API for testing
sam local start-api
- ✅ Complete infrastructure management including SQS queues
- ✅ VPC and security group configuration included
- ✅ Environment variables defined in template
- ✅ IAM permissions automatically configured
- ✅ Easy rollback capabilities
- ✅ CloudFormation integration
- ✅ SQS trigger automatically configured
Automated deployment using GitHub Actions workflow for production environments.
- GitHub Repository Secrets:
AWS_ACCESS_KEY_ID: Your AWS access key
AWS_SECRET_ACCESS_KEY: Your AWS secret key
AWS_REGION: us-east-1 (or your target region)
- Pre-existing Lambda Function: The workflow updates an existing function, so deploy initially using Method 1 or 2.
- Create Release Branch:
# Create and switch to release branch
git checkout -b release
# Make your changes to the .NET code
# Commit changes
git add .
git commit -m "feat: update tender deduplication logic"
# Push to trigger deployment
git push origin release
- Automatic Deployment: The workflow will:
- Checkout the code
- Set up .NET 8 SDK
- Install AWS Lambda Tools
- Build and package the Lambda function
- Configure AWS credentials
- Update the existing Lambda function code
- Maintain existing configuration (environment variables, VPC settings, etc.)
You can also trigger deployment manually:
- Go to Actions tab in your GitHub repository
- Select "Deploy .NET Lambda to AWS" workflow
- Click "Run workflow"
- Choose the `release` branch
- Click "Run workflow" button
- ✅ Automated CI/CD pipeline
- ✅ Consistent deployment process
- ✅ Audit trail of deployments
- ✅ Easy rollback to previous commits
- ✅ No local environment dependencies
- ✅ Automatic .NET build and packaging
Regardless of deployment method, verify the following:
Ensure these environment variables are properly set:
# Verify environment variables via AWS CLI
aws lambda get-function-configuration \
--function-name TenderDeduplicationLambda \
--query 'Environment.Variables'
Expected output:
{
"AI_QUEUE_URL": "https://sqs.us-east-1.amazonaws.com/211635102441/AIQueue.fifo",
"DB_CONNECTION_STRING": "",
"DUPLICATE_QUEUE_URL": "https://sqs.us-east-1.amazonaws.com/211635102441/DuplicatesQueue.fifo",
"SOURCE_QUEUE_URL": "https://sqs.us-east-1.amazonaws.com/211635102441/TenderQueue.fifo"
}
Verify VPC settings for database and SQS access:
# Check VPC configuration
aws lambda get-function-configuration \
--function-name TenderDeduplicationLambda \
--query 'VpcConfig'
Ensure the SQS trigger is properly configured:
# List event source mappings
aws lambda list-event-source-mappings \
--function-name TenderDeduplicationLambda
# Verify batch size and queue configuration
aws lambda get-event-source-mapping \
--uuid [event-source-mapping-uuid]
Test database connectivity from the Lambda function:
# Invoke function to test database connection
# (AWS CLI v2 needs --cli-binary-format to accept a raw JSON payload)
aws lambda invoke \
--function-name TenderDeduplicationLambda \
--cli-binary-format raw-in-base64-out \
--payload '{"Records":[]}' \
response.json
After deployment, test the function thoroughly:
# Send test message to source queue
aws sqs send-message \
--queue-url https://sqs.us-east-1.amazonaws.com/211635102441/TenderQueue.fifo \
--message-body '{"source":"TestSource","tenderNumber":"TEST-001","closingDate":"2025-12-31T23:59:59"}' \
--message-group-id "TestGroup" \
--message-deduplication-id "test-$(date +%s)"
# Monitor function execution
aws logs tail /aws/lambda/TenderDeduplicationLambda --follow
- ✅ Function executes without errors
- ✅ CloudWatch logs show successful database connection
- ✅ Messages are properly routed to AI queue or duplicate queue
- ✅ No timeout or memory errors
- ✅ Proper deduplication logic working
- ✅ SQS batch processing functioning correctly
- Duration: Function execution time for batch processing
- Error Rate: Failed deduplication operations
- Memory Utilization: RAM usage during processing
- SQS Metrics: Message processing rates and dead letter queues
- Database Connection Health: RDS connection metrics
# View recent logs
aws logs tail /aws/lambda/TenderDeduplicationLambda --follow
# Search for deduplication statistics
aws logs filter-log-events \
--log-group-name /aws/lambda/TenderDeduplicationLambda \
--filter-pattern "Processed batch"
# Search for database connection issues
aws logs filter-log-events \
--log-group-name /aws/lambda/TenderDeduplicationLambda \
--filter-pattern "Database connection"
# Monitor SQS routing decisions
aws logs filter-log-events \
--log-group-name /aws/lambda/TenderDeduplicationLambda \
--filter-pattern "Routed to"
.NET 8 Runtime Issues
Issue: Function fails to start or throws runtime errors
Solution: Ensure proper .NET 8 configuration:
- Verify the handler path: `TenderDeduplication::TenderDeduplication.Function::FunctionHandler`
- Check that all NuGet packages are compatible with .NET 8
- Ensure the project targets the `net8.0` framework
- Verify all dependencies are included in the deployment package
Database Connection Failures
Issue: Cannot connect to RDS SQL Server from Lambda
Solution: Verify VPC and security configuration:
- Ensure Lambda is in the same VPC as RDS
- Check security groups allow traffic on port 1433
- Verify RDS is accessible from Lambda subnets
- Test connection string format and credentials
- Check if RDS is in a maintenance window
SQS Message Processing Issues
Issue: Messages not being processed or routed incorrectly
Solution: Debug SQS configuration:
- Verify SQS trigger is configured with correct batch size (10)
- Check message format matches expected JSON structure
- Ensure FIFO queue attributes are properly set
- Verify message group ID and deduplication ID logic
- Monitor dead letter queue for failed messages
VPC Networking Problems
Issue: Function times out or cannot access AWS services
Solution: Check VPC configuration:
- Ensure VPC endpoints exist for SQS service access
- Verify route tables and NAT gateway configuration
- Check that subnets have proper CIDR ranges
- Ensure DNS resolution is enabled in VPC settings
- Verify security group rules allow outbound HTTPS traffic
Memory and Performance Issues
Issue: Function runs out of memory or times out
Solution: Optimize function performance:
- Increase memory allocation (current: 512 MB)
- Optimize batch processing logic
- Review database query performance
- Consider implementing connection pooling
- Monitor cold start times and optimize accordingly
Environment Variables Missing
Issue: Function cannot access required configuration
Solution: Set environment variables using AWS CLI:
aws lambda update-function-configuration \
--function-name TenderDeduplicationLambda \
--environment '{"Variables":{
"AI_QUEUE_URL":"https://sqs.us-east-1.amazonaws.com/211635102441/AIQueue.fifo",
"DB_CONNECTION_STRING":"",
"DUPLICATE_QUEUE_URL":"https://sqs.us-east-1.amazonaws.com/211635102441/DuplicatesQueue.fifo",
"SOURCE_QUEUE_URL":"https://sqs.us-east-1.amazonaws.com/211635102441/TenderQueue.fifo"
}}'
Workflow Deployment Fails
Issue: GitHub Actions workflow errors
Solution:
- Check repository secrets are correctly configured
- Verify .NET 8 SDK is properly installed in workflow
- Ensure AWS Lambda Tools installation succeeds
- Check that TenderDeduplication.csproj exists in repository
- Verify target Lambda function exists in AWS
Choose the deployment method that best fits your development workflow and infrastructure requirements. SAM deployment is recommended for development environments, while workflow deployment excels for production systems requiring automated CI/CD pipelines.
Don't panic! This section documents common issues encountered during deployment and how to solve them. 🔧
🚨 Error: Connection Timed Out or Function Hangs When Sending SQS Messages
This is a complex VPC networking issue with several potential causes. Follow this checklist in order:
- ⏱️ Lambda Timeout Setting: AWS's default Lambda timeout is 3 seconds, and even a 30-second setting is often too short for a function performing network I/O.
  - 🔧 Fix: In the Lambda's Configuration > General configuration, increase the timeout to at least 3 minutes.
- 🚫 Missing VPC Endpoint: A Lambda in a private subnet with no NAT gateway cannot reach public AWS service endpoints.
  - 🔧 Fix: Create a VPC Interface Endpoint for SQS (`com.amazonaws.<region>.sqs`) and place it in the same private subnets as your Lambda.
- 🌐 VPC DNS Settings: The VPC must be configured to use the Amazon DNS server to resolve the endpoint's private name.
  - 🔧 Fix: In the VPC Dashboard, select your VPC, click Actions > Edit VPC settings, and ensure both "Enable DNS resolution" and "Enable DNS hostnames" are checked.
- 🔒 Endpoint Security Group: The endpoint's own security group must allow inbound traffic from the Lambda.
  - 🔧 Fix: Edit the inbound rules on the security group attached to the VPC Endpoint. Add a rule allowing HTTPS (Port 443) from the source security group ID of your Lambda function. This is the most commonly missed step!
⚠️ Error: DbContext Concurrency Exception (A second operation was started...)
🔍 Cause: The DbContext was registered as a singleton, but multiple parallel queries (Task.WhenAll) were attempting to use it simultaneously. DbContext is not thread-safe.
✅ Solution: Change the dependency injection registration in Function.cs from services.AddDbContext(...) to services.AddDbContextFactory(...). Inject the IDbContextFactory into the service and create a new DbContext instance for each database operation.
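The factory pattern described in this fix might look like the sketch below. It assumes the `Microsoft.EntityFrameworkCore.SqlServer` NuGet package is referenced. `Tender`, `TenderDbContext`, `Registration`, and `TenderLookupService` are illustrative names, not the repository's actual types.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical entity and context, for illustration only.
public class Tender
{
    public int Id { get; set; }
    public string Source { get; set; } = "";
    public string TenderNumber { get; set; } = "";
}

public class TenderDbContext : DbContext
{
    public TenderDbContext(DbContextOptions<TenderDbContext> options) : base(options) { }
    public DbSet<Tender> Tenders => Set<Tender>();
}

public static class Registration
{
    // Registers a factory instead of a shared context instance.
    public static void AddTenderDatabase(IServiceCollection services, string connectionString) =>
        services.AddDbContextFactory<TenderDbContext>(o => o.UseSqlServer(connectionString));
}

public class TenderLookupService
{
    private readonly IDbContextFactory<TenderDbContext> _factory;

    public TenderLookupService(IDbContextFactory<TenderDbContext> factory) => _factory = factory;

    public async Task<List<string>> LoadTenderNumbersAsync(string source)
    {
        // A fresh context per operation, so parallel calls never share one instance.
        await using var db = await _factory.CreateDbContextAsync();
        return await db.Tenders
            .Where(t => t.Source == source)
            .Select(t => t.TenderNumber)
            .ToListAsync();
    }
}
```

With this shape, several `LoadTenderNumbersAsync` calls can run under `Task.WhenAll`, because each call creates and disposes its own context.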
🚫 Error: Function Not Triggered by SQS After VPC Placement
🔍 Cause: The SQS trigger's permissions can become invalid after a Lambda is moved into a VPC.
✅ Solution: In the Lambda's Configuration > Triggers tab, delete the existing SQS trigger and immediately re-add it. This forces AWS to regenerate the correct resource-based invocation policy.
📨 Error: Sending to .fifo Queues Fails or Times Out
🔍 Cause: SQS FIFO queues require a MessageGroupId on every message, plus a MessageDeduplicationId unless content-based deduplication is enabled on the queue.
✅ Solution: The SqsService was updated to detect if a queue is FIFO. It now generates a unique MessageDeduplicationId (Guid.NewGuid()) and uses the tender source as the MessageGroupId.
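The FIFO detection described here can be sketched without the AWS SDK as a small pure helper. `FifoMessageAttributes` is a hypothetical name. The check relies on the rule that FIFO queue names end in `.fifo`. In the real `SqsService`, the two values would be assigned to the `MessageGroupId` and `MessageDeduplicationId` properties of the SDK's `SendMessageRequest`.

```csharp
using System;

// Hypothetical helper; not the repository's SqsService.
public static class FifoMessageAttributes
{
    // FIFO queue names, and therefore their URLs, end with ".fifo".
    public static bool IsFifo(string queueUrl) =>
        queueUrl.EndsWith(".fifo", StringComparison.Ordinal);

    // Returns (null, null) for standard queues, where this sketch sets neither attribute.
    public static (string? GroupId, string? DeduplicationId) For(string queueUrl, string tenderSource)
    {
        if (!IsFifo(queueUrl))
            return (null, null);

        // The tender source groups messages; a new GUID makes every send unique.
        return (tenderSource, Guid.NewGuid().ToString());
    }
}
```

A random `MessageDeduplicationId` means SQS itself never discards a message as a duplicate, which fits a design where deduplication is already done by the Lambda.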
📅 Error: Date Validation Issues with SAST Time Zone
🔍 Cause: The tender validation may incorrectly parse closing dates or fail to properly convert SAST to UTC.
✅ Solution: Ensure the validation service correctly handles date parsing:
- Unspecified date-times are assumed to be SAST (UTC+2) 🌍
- All comparisons are done in UTC ⏰
- Check that `TimeZoneInfo.FindSystemTimeZoneById("South Africa Standard Time")` resolves in the Lambda runtime 🔍
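Whether the Windows-style ID resolves depends on the operating system and the time zone data available to the runtime, so a defensive lookup can try the IANA ID `Africa/Johannesburg` as well. The sketch below is an assumption-laden illustration: `SastClock` is a hypothetical name, and the fixed UTC+2 fallback relies on SAST not observing daylight saving time.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; not the repository's validation service.
public static class SastClock
{
    // Windows and IANA identifiers for the same zone.
    private static readonly string[] ZoneIds =
        { "South Africa Standard Time", "Africa/Johannesburg" };

    public static TimeZoneInfo ResolveSast()
    {
        foreach (var id in ZoneIds)
        {
            try { return TimeZoneInfo.FindSystemTimeZoneById(id); }
            catch (TimeZoneNotFoundException) { }
            catch (InvalidTimeZoneException) { }
        }
        // Last resort: a fixed UTC+2 zone, valid because SAST has no daylight saving time.
        return TimeZoneInfo.CreateCustomTimeZone("SAST", TimeSpan.FromHours(2), "SAST", "SAST");
    }

    public static DateTime ToUtc(string closingDate)
    {
        // RoundtripKind keeps a trailing "Z" as UTC and leaves bare values as Unspecified.
        var parsed = DateTime.Parse(closingDate, CultureInfo.InvariantCulture,
                                    DateTimeStyles.RoundtripKind);
        return parsed.Kind switch
        {
            DateTimeKind.Utc => parsed,
            DateTimeKind.Local => parsed.ToUniversalTime(),
            // Unspecified date-times are assumed to be SAST.
            _ => TimeZoneInfo.ConvertTimeToUtc(parsed, ResolveSast())
        };
    }
}
```

For example, `SastClock.ToUtc("2025-12-31T23:59:59")` treats the value as SAST and returns 21:59:59 UTC on the same day.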
We welcome contributions with open arms! Please follow these steps: 🎉
- Create a new feature branch from `main` (e.g., `feature/add-new-source`). 🌿
- Make your changes. ✏️
- Commit your work and push it to the remote repository. 📤
- Open a Pull Request for review. 👀
Built with love, bread, and code by Bread Corporation 🦆❤️💻