-
Notifications
You must be signed in to change notification settings - Fork 65
Add the trigger tables creation on scale provider initialization #1186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@MaddyDev plz resolve merge conflicts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request adds asynchronous initialization to the SQL scaler provider to ensure trigger metadata tables are created reliably before scaling operations begin. Key changes include:
- Adding an async InitializeAsync and CreateAsync method to SqlScalerProvider.
- Updating the service registration in SqlBindingExtension to await async initialization.
- Refactoring methods in SqlTriggerUtils and SqlTriggerListener to improve schema, global state table, and leases table creation with proper telemetry.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
test/Unit/TriggerBinding/SqlTriggerListenerTests.cs | Added unit tests to validate listener behavior on error conditions. |
test/Integration/SqlTriggerBindingIntegrationTests.cs | Removed manual start/stop listener initialization in favor of host scale. |
src/TriggerBinding/SqlTriggerUtils.cs | Introduced static async methods for table and schema creation with telemetry. |
src/TriggerBinding/SqlTriggerListener.cs | Refactored to use new static methods for resource initialization. |
src/TriggerBinding/SqlScalerProvider.cs | Added async initialization support and factory creation method. |
src/Telemetry/Telemetry.cs | Updated telemetry event names to include InitializeScaleProvider. |
src/SqlBindingExtension.cs | Updated lazy async registration for scaler provider initialization. |
Comments suppressed due to low confidence (1)
src/TriggerBinding/SqlScalerProvider.cs:107
- The null value passed for the oldUserFunctionId parameter may lead to unexpected behavior during migration; please verify that this is intended and consider passing the proper deprecated value if available.
insertGlobalStateTableRowDurationMs = await InsertGlobalStateTableRowAsync(connection, transaction, userTableId, this._userTable, null, this._userFunctionId, this._logger, cancellationToken);
|
||
DROP TABLE {oldLeasesTableName}; | ||
END | ||
End |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The T-SQL block uses 'End' for closing a BEGIN block; please change it to uppercase 'END' to follow standard T-SQL syntax, which may prevent unexpected SQL errors.
End | |
END |
Copilot uses AI. Check for mistakes.
var scalerProviderTask = new Lazy<Task<SqlScalerProvider>>(() => | ||
SqlScalerProvider.CreateAsync(serviceProvider, triggerMetadata)); | ||
|
||
builder.Services.AddSingleton((Func<IServiceProvider, IScaleMonitorProvider>)(resolvedServiceProvider => | ||
{ | ||
serviceProvider = serviceProvider ?? resolvedServiceProvider; | ||
return scalerProvider.Value; | ||
}); | ||
builder.Services.AddSingleton((Func<IServiceProvider, ITargetScalerProvider>)delegate (IServiceProvider resolvedServiceProvider) | ||
// Wait for the async initialization to complete | ||
return scalerProviderTask.Value.GetAwaiter().GetResult(); | ||
})); | ||
|
||
builder.Services.AddSingleton((Func<IServiceProvider, ITargetScalerProvider>)(resolvedServiceProvider => | ||
{ | ||
serviceProvider = serviceProvider ?? resolvedServiceProvider; | ||
return scalerProvider.Value; | ||
}); | ||
// Wait for the async initialization to complete | ||
return scalerProviderTask.Value.GetAwaiter().GetResult(); | ||
})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Blocking on an asynchronous call using GetAwaiter().GetResult() can introduce potential deadlocks; consider restructuring the service registration to use an async factory or alternative async initialization pattern if possible.
Copilot uses AI. Check for mistakes.
this._telemetryProps.AddConnectionProps(connection, serverProperties); | ||
await VerifyDatabaseSupported(connection, this._logger, cancellationToken); | ||
|
||
int userTableId = await GetUserTableIdAsync(connection, this._userTable, this._logger, cancellationToken); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had discussed NOT doing this here and instead having the scaler return the total count of all changes if the tables don't exist. This would cause it to scale up, which would create the actual instance that would then create the tables as normal.
Is there a reason you decided not to do that? I'm not thrilled about having table creation happen in multiple places in our code. It makes it more likely that if we ever make any changes to how they're created that we'll miss updating one or the other and cause problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alrod Could you please explain why this is not possible? We discussed this and you suggested that this is the better route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it would also work.
You need to catch an exception here (schema does not exist, GlobalStateTable does not exists or GlobalStateTableRow does not exists):
https://github.com/Azure/azure-functions-sql-extension/blob/main/src/TriggerBinding/SqlTriggerTargetScaler.cs#L29
and return:
return new TargetScalerResult
{
TargetWorkerCount = 1
};
Then a worker will be assigned, and listener will create required stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small change I'd suggest making though is that we actually have the T-SQL query check if there's any changes on the table in those cases, and if there aren't we shouldn't scale up. No reason to spin stuff up unless there's actually changes IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it’s better for the SC to attempt creating the GlobalState, since we only try once.
Otherwise, we might end up with worker 1 being permanently assigned, as the SC will keep voting for it if GlobalState creation fails for any reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the worker gets in a bad state (e.g. if global state table creation fails in the worker it will throw an exception and exit) will the scaler make a new one? If so then keeping 1 worker around seems like the behavior we want then - since it'll just keep retrying to make the table until it succeeds - which we need for the trigger to work correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do not want to keep a worker around just waiting until the issue with creation is fixed as it would affect customer's bill.
instead, we can try to create GlobalState from SC not on ScaleProver creation but each time when SC tries to get a vote
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would a worker just be "waiting around"? If it fails to create the global state table it will just throw and the worker will exit.
Having two separate pieces of functionality trying to work together to create the tables is a recipe for disaster and adds extra complication. What happens if we need to update how the table is created? Now we have to try and coordinate updating both the scale controller and the users app at the same time. And it severely limits our ability to make changes to the state tables.
How do other extensions handle this - do they already have the scale controller create necessary resources such as state tables? I believe the MySQL extension is doing something similar so presumably is in the same situation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to scale out only once right after monitoring started for MSSQL trigger - yes, the bad worker will exit and will never add again until customer explicitly changed the function app.
EventHubs have the same scenario - creating checkpoint blob to store position client position in the stream. The blob is created on SDK level by EventProcessorClient
. There is no such thing as SQL SDK for monitoring changes, so the code lives in MSSQL azure functions extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I am ok with to scale out 1 instance but only once.
@@ -126,5 +129,263 @@ internal static string GetBracketedLeasesTableName(string userDefinedLeasesTable | |||
return string.IsNullOrEmpty(userDefinedLeasesTableName) ? string.Format(CultureInfo.InvariantCulture, LeasesTableNameFormat, $"{userFunctionId}_{userTableId}") : | |||
string.Format(CultureInfo.InvariantCulture, UserDefinedLeasesTableNameFormat, $"{userDefinedLeasesTableName.AsBracketQuotedString()}"); | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also want to add a method to SqlTriggerUtils
which will create everything in once?
using (SqlTransaction transaction = connection.BeginTransaction(System.Data.IsolationLevel.RepeatableRead))
{
createdSchemaDurationMs = await this.CreateSchemaAsync(connection, transaction, cancellationToken);
createGlobalStateTableDurationMs = await this.CreateGlobalStateTableAsync(connection, transaction, cancellationToken);
insertGlobalStateTableRowDurationMs = await this.InsertGlobalStateTableRowAsync(connection, transaction, userTableId, cancellationToken);
createLeasesTableDurationMs = await this.CreateLeasesTableAsync(connection, transaction, bracketedLeasesTableName, primaryKeyColumns, cancellationToken);
transaction.Commit();
}
Then we call a method from SqlTriggerListener/Start
and SqlTriggerMetricsProvider/GetUnprocessedChangeCountAsync
in case if GlobaState does not exist. Changes in SqlScalerProvider
and SqlBindingExtensions
are not needed in this case
Description
Added Async Initialization Support for SQL Scaler Provider
This pull request enhances the
SqlScalerProvider
by adding proper asynchronous initialization capabilities to ensure trigger metadata tables are available for any scaling SKU, even when the host does not start the listener.Changes
Added
InitializeAsync
method toSqlScalerProvider
Implemented
CreateAsync
static factory methodModified service registration in
SqlBindingExtension
AddSqlScaleForTrigger
to use the new async initialization patternLazy<Task<SqlScalerProvider>>
to ensure initialization happens only onceBenefits
This implementation ensures the SQL binding's scaling capabilities work reliably across all Azure Functions hosting scenarios.
Code Changes
az_func.GlobalState
table must be compatible with all prior versions of the extensionILogger
instance to log relevant information, especially information useful for debugging or troubleshootingasync
andawait
for all long-running operationsCancellationToken
Dependencies
dotnet restore --force-evaluate
to update the lock files and ensure that there are NO major versions updates in either src/packages.lock.json or Worker.Extensions.Sql/src/packages.lock.json. If there are, contact the dev team for instructions.Documentation